AI indexing is a four-stage pipeline: discovery, fetching, parsing, and embedding. Discovery happens through sitemaps, llms.txt, and inbound links. Fetching is done by named bots (GPTBot, ClaudeBot, PerplexityBot) plus a wider pool of crawlers that feed retrieval-augmented systems. Parsing extracts JSON-LD, body text, and structured blocks. Embedding stores vectorised chunks, ready for semantic retrieval at query time.
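To make the parsing stage concrete, here is a minimal Python sketch of how a crawler might pull JSON-LD out of a fetched page. It uses only the standard library. The names JsonLdExtractor and extract_json_ld are hypothetical, and real crawlers do not publish their parsers, so treat this as an illustration of the idea, not a description of any particular bot.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the decoded contents of <script type="application/ld+json"> tags.

    Hypothetical helper for illustration; not any real crawler's parser.
    """

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies arrive here as raw text while we are inside a JSON-LD tag.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                # Assumption: a parser drops malformed blocks silently,
                # so invalid JSON-LD contributes nothing to the index.
                pass


def extract_json_ld(html):
    """Return every valid JSON-LD object found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Calling extract_json_ld on a page's HTML returns a list of dictionaries, one per valid block. In this sketch a block with a syntax error yields nothing at all, which is one reason to validate markup before publishing.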
A store can rank well in Google and still be absent from AI indexes, because the two are built by different pipelines. The practical fix is to publish an llms.txt, emit clean JSON-LD on every page, and serve a products.ndjson feed so retrievers can ingest the catalogue without scraping.
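As a rough sketch of the JSON-LD and feed parts of that fix, the Python below builds a schema.org Product block for a page and serialises a catalogue as newline-delimited JSON. The input field names (sku, name, price, currency) and the function names are assumptions made for illustration. There is no single agreed schema for a products.ndjson feed, so the fields a given retriever expects may differ.

```python
import json


def product_json_ld(product):
    """Map a catalogue record (hypothetical field names) to a schema.org Product."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["name"],
        "sku": product["sku"],
        "offers": {
            "@type": "Offer",
            "price": product["price"],
            "priceCurrency": product["currency"],
        },
    }


def json_ld_script_tag(product):
    """Render the JSON-LD as a script tag ready to embed in the page."""
    payload = json.dumps(product_json_ld(product), ensure_ascii=False)
    # Escape "</" so a value containing "</script>" cannot close the tag early.
    # "\/" is a valid JSON escape, so the payload still parses to the same data.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


def to_ndjson(products):
    """Serialise records as NDJSON: one compact JSON object per line."""
    return "".join(
        json.dumps(p, ensure_ascii=False, separators=(",", ":")) + "\n"
        for p in products
    )
```

A build step could write the output of to_ndjson to products.ndjson and inject json_ld_script_tag into each product page. Keeping one object per line lets a consumer stream the feed record by record instead of loading the whole catalogue at once.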