Core GEO

Updated April 15, 2026

AI Indexing

The process by which AI retrieval systems discover, fetch, parse, and embed a website so they can cite it later.

Also known as: AI crawling, AI retrieval indexing

AI Indexing is a four-stage pipeline. Discovery happens through sitemaps, llms.txt, and inbound links. Fetching is done by named bots (GPTBot, ClaudeBot, PerplexityBot) plus a wider pool of retrieval-augmented crawlers. Parsing extracts JSON-LD, text, and structured blocks. Embedding stores vectorised chunks ready for semantic retrieval at query time.
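To make the parsing stage concrete, here is a minimal sketch of how a retriever might pull JSON-LD out of a fetched page, using only the Python standard library. The class name `JsonLdExtractor`, the helper `extract_jsonld`, and the sample page are hypothetical; real retrieval systems do not publish their parsers, so this is an illustration of the idea rather than any vendor's implementation.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []  # successfully parsed JSON-LD objects, in document order

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content can arrive in several pieces, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            raw = "".join(self._buffer).strip()
            try:
                self.blocks.append(json.loads(raw))
            except json.JSONDecodeError:
                # Malformed JSON-LD is silently dropped in this sketch.
                pass


def extract_jsonld(html):
    """Return every well-formed JSON-LD object found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Hypothetical page used only for illustration.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Mug"}
</script>
</head><body><p>Example Mug, 350 ml.</p></body></html>
"""

if __name__ == "__main__":
    for block in extract_jsonld(SAMPLE_HTML):
        print(block["@type"], block["name"])
```

A block that fails to parse contributes nothing in this sketch, which is one reason malformed structured data may go unnoticed by a parser of this kind.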

A store can rank well in Google and still be absent from AI indexes, because the two rely on different pipelines. The practical fix is to publish an llms.txt, emit clean JSON-LD on every page, and serve a products.ndjson feed so that retrievers can ingest the catalogue without scraping.
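As a rough sketch of two of these steps, the code below builds a schema.org Product object for a page's JSON-LD and serialises a catalogue as newline-delimited JSON. The record fields (`name`, `sku`, `handle`, `price`, `currency`), the `/products/` URL pattern, and the `shop.example` domain are assumptions made for illustration; the feed has no schema defined in this entry, so adapt the fields to your own catalogue.

```python
import json


def product_jsonld(product, base_url):
    """Build a schema.org Product object for one catalogue item.

    The input field names are hypothetical; only the schema.org
    property names (name, sku, url, offers, price, priceCurrency) are standard.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["name"],
        "sku": product["sku"],
        "url": f"{base_url}/products/{product['handle']}",
        "offers": {
            "@type": "Offer",
            "price": product["price"],
            "priceCurrency": product["currency"],
        },
    }


def to_ndjson(records):
    """Serialise records as NDJSON: one compact JSON object per line."""
    return "".join(
        json.dumps(record, ensure_ascii=False, separators=(",", ":")) + "\n"
        for record in records
    )


def from_ndjson(text):
    """Parse NDJSON back into a list of objects, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# Hypothetical catalogue used only for illustration.
CATALOGUE = [
    {"name": "Example Mug", "sku": "MUG-001", "handle": "example-mug",
     "price": "12.50", "currency": "EUR"},
    {"name": "Example Teapot", "sku": "POT-002", "handle": "example-teapot",
     "price": "34.00", "currency": "EUR"},
]

if __name__ == "__main__":
    # JSON-LD for a single product page.
    print(json.dumps(product_jsonld(CATALOGUE[0], "https://shop.example"), indent=2))
    # Contents of a products.ndjson feed.
    print(to_ndjson(CATALOGUE), end="")
```

Because `json.dumps` escapes newlines inside string values, each record stays on a single line, so a consumer can read the feed line by line without loading the whole file.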

Related terms

Core GEO

llms.txt

A proposed convention at /llms.txt that declares which pages of a site AI assistants should prioritise.

Core GEO

NDJSON Product Feed

A newline-delimited JSON file that lists every product with full attributes, one JSON object per line. It serves as the canonical ingest format for AI retrievers.

AI Crawlers

GPTBot

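OpenAI's web crawler, which fetches pages that may be used as training data for its models. OpenAI lists separate user agents (OAI-SearchBot, ChatGPT-User) for search and user-initiated retrieval.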

Shopify Signals

Semantic Chunking

The process a retrieval system uses to break a page into passages for embedding and retrieval.
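The passage-splitting step described under Semantic Chunking can be illustrated with a deliberately naive sketch. It groups consecutive paragraphs into passages under a word budget, treating paragraph boundaries as a stand-in for semantic boundaries. The function name `chunk_passages` and the `max_words` default are assumptions; production systems may split on headings, sentences, or embedding similarity instead, and their exact strategies are generally not public.

```python
def chunk_passages(text, max_words=120):
    """Group consecutive paragraphs into passages of at most max_words words.

    Paragraphs are separated by blank lines. A single paragraph longer than
    max_words is kept whole as its own passage rather than being cut mid-text.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    passages = []
    current = []  # paragraphs collected for the passage being built
    count = 0     # word count of the passage being built
    for para in paragraphs:
        n = len(para.split())
        # Close the current passage if adding this paragraph would exceed the budget.
        if current and count + n > max_words:
            passages.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        passages.append("\n\n".join(current))
    return passages


if __name__ == "__main__":
    page = "Free shipping over 50 EUR.\n\nReturns accepted within 30 days.\n\nMugs are dishwasher safe."
    for i, passage in enumerate(chunk_passages(page, max_words=10)):
        print(i, repr(passage))
```

Keeping each paragraph intact means a passage never starts or ends mid-sentence, at the cost of uneven passage lengths.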