AI Indexing Checklist for Shopify

Section 1 — Crawler access (9 items)

If the crawler can't reach the page, nothing else on the checklist matters. Start here.

The nine items in this section verify that every major AI crawler can fetch your store, receive a 200 response, and see the full rendered HTML. Skipping this section is the most common cause of 'I did everything right and ChatGPT still ignores us': in many of those cases the crawler never reached the page.

GPTBot returns 200 on / within 500 ms.
ChatGPT-User returns 200 on a product page.
ClaudeBot returns 200 on / within 500 ms.
Claude-User returns 200 on a product page.
PerplexityBot returns 200 on /collections/{handle}.
Googlebot returns 200 on /pages/about and robots.txt does not disallow Google-Extended (a robots.txt control token, not a separate crawler).
Meta-ExternalAgent returns 200 on /.
robots.txt explicitly allows all seven bots.
No Cloudflare AI-scrapers rule is blocking them.
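
The robots.txt item can be checked offline. What follows is a minimal sketch using Python's standard urllib.robotparser. It covers only the robots.txt check, because the 200-response and latency items need live requests against your store. The function name audit_robots and the SAMPLE rules are illustrative assumptions, not part of any Shopify tooling.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the checklist items above.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-User",
    "PerplexityBot", "Google-Extended", "Meta-ExternalAgent",
]

def audit_robots(robots_txt: str, path: str = "/") -> dict:
    """For each bot, report whether `path` is allowed and whether
    the bot is named on its own User-agent line."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Collect every token that appears on a "User-agent:" line.
    named = set()
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "user-agent":
            named.add(value.split("#")[0].strip().lower())

    return {
        bot: {
            "allowed": parser.can_fetch(bot, path),
            "explicit": bot.lower() in named,
        }
        for bot in AI_BOTS
    }

# Hypothetical robots.txt used only to exercise the function.
SAMPLE = """\
User-agent: *
Disallow: /cart

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Disallow: /
"""

if __name__ == "__main__":
    for bot, status in audit_robots(SAMPLE).items():
        print(bot, status)
```

A bot that is allowed only through the wildcard group shows up as allowed but not explicit, which is the case the 'explicitly allows' item is meant to catch.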

Section 2 — Schema.org coverage (11 items)

This is the structured data that retrievers actually parse.

Organization schema on home page with @id + sameAs links.
WebSite schema with potentialAction = SearchAction.
Product schema on every product page (including variant SKU-level offers).
AggregateRating present on products with reviews.
FAQPage schema on every product page (≥3 Q&A pairs).
BreadcrumbList on every non-home page.
Article or BlogPosting on every blog post.
DefinedTerm on every glossary term.
HowTo on every step-by-step guide.
LocalBusiness on /pages/stores if brick-and-mortar.
All JSON-LD validates in Google's Rich Results Test.
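
To spot-check coverage before running Google's Rich Results Test, a page's HTML can be scanned for the schema.org types it declares. The sketch below uses only the Python standard library. The names schema_types and missing_types and the sample markup are assumptions made for illustration. It reports which types are present, not whether each one is valid.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # unparseable JSON-LD is skipped, so its types count as missing

def schema_types(html: str) -> set:
    """Return every @type value found anywhere in the page's JSON-LD."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    found = set()

    def walk(node):
        if isinstance(node, dict):
            declared = node.get("@type")
            if isinstance(declared, str):
                found.add(declared)
            elif isinstance(declared, list):
                found.update(t for t in declared if isinstance(t, str))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for block in extractor.blocks:
        walk(block)
    return found

def missing_types(html: str, required) -> set:
    """Return the required types that the page does not declare."""
    return set(required) - schema_types(html)

# Hypothetical product page fragment.
SAMPLE_HTML = """
<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Mug",
 "offers": {"@type": "Offer", "sku": "MUG-1", "price": "12.00", "priceCurrency": "USD"}}
</script></head><body></body></html>
"""

if __name__ == "__main__":
    print(missing_types(SAMPLE_HTML, {"Product", "FAQPage", "BreadcrumbList"}))
```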

Section 3 — Answer-block quality (8 items)

Retrievers cite self-contained paragraphs. These 8 checks tighten your copy.

Every product page has a 'Who is this for?' block (≤60 words).
Every product page has a 'How is this different from X?' block.
Every product page has a 'What's the catch?' / honest limits block.
Every collection page has a 'How to choose' paragraph.
Every FAQ answer is self-contained (no 'see above' / 'as mentioned').
Every answer is 40-80 words — longer answers get truncated, shorter ones get skipped.
Product names in body copy always include the brand prefix.
No marketing jargon without a definition nearby (retrievers penalise it).
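
Two of these checks, the 40-80 word range and the ban on back-references, are mechanical enough to lint. Here is a rough sketch, assuming words are whitespace-separated tokens and that the two phrases quoted above are the only back-references to flag. check_answer is a hypothetical helper name.

```python
# Phrases named in the checklist; extend the tuple for your own copy.
BACK_REFERENCES = ("see above", "as mentioned")

def check_answer(text: str, min_words: int = 40, max_words: int = 80) -> list:
    """Return a list of problems found in one answer block (empty if none)."""
    problems = []

    # Word count by whitespace splitting, which is an approximation.
    words = len(text.split())
    if words < min_words:
        problems.append(f"too short: {words} words (minimum {min_words})")
    elif words > max_words:
        problems.append(f"too long: {words} words (maximum {max_words})")

    # Case-insensitive search for phrases that point outside the block.
    lowered = text.lower()
    for phrase in BACK_REFERENCES:
        if phrase in lowered:
            problems.append(f"back-reference: '{phrase}'")

    return problems

if __name__ == "__main__":
    print(check_answer("As mentioned, this mug is dishwasher safe."))
```

Running it over every FAQ answer in a theme export gives a quick list of blocks to rewrite. The remaining checks in this section still need a human read.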

Section 4 — Entity clarity (6 items)

Retrievers build an internal graph of brands. These checks stop you getting conflated with competitors.

Organization schema has @id anchored to the home page URL.
sameAs links point to LinkedIn, Crunchbase, Instagram, YouTube (or equivalents).
Founder Person schema present and linked via Organization.founder.
About page includes a 'What is {brand}?' quotable block.
Press mentions / reviews are linked with rel='noopener' rather than rel='nofollow', so crawlers can still follow them.
No other business operates under the same trading name in your country.
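
The first three items can be covered by a single JSON-LD object. Below is a minimal sketch that builds one in Python. The '#organization' and '#founder' fragments are a common convention assumed here, not something schema.org requires. The brand, URLs, and founder name are placeholders.

```python
import json

def organization_jsonld(home_url: str, name: str, same_as: list, founder_name: str) -> dict:
    """Build an Organization node whose @id is anchored to the home page URL."""
    home = home_url.rstrip("/") + "/"
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "@id": home + "#organization",  # assumed fragment convention
        "name": name,
        "url": home,
        "sameAs": list(same_as),
        "founder": {
            "@type": "Person",
            "@id": home + "#founder",  # assumed fragment convention
            "name": founder_name,
        },
    }

if __name__ == "__main__":
    # Placeholder values; swap in your own brand and profile URLs.
    node = organization_jsonld(
        "https://example.com",
        "Example Brand",
        [
            "https://www.linkedin.com/company/example-brand",
            "https://www.instagram.com/examplebrand",
        ],
        "Jane Doe",
    )
    print(json.dumps(node, indent=2))
```

The printed JSON is what would go inside a script tag of type application/ld+json on the home page.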

Section 5 — AI-native feeds (6 items)

llms.txt and the NDJSON product feed are the two highest-leverage items on the entire checklist.

llms.txt present at /llms.txt and /.well-known/llms.txt.
llms-full.txt present and < 10 MB.
products.ndjson present with one product per line.
ai-sitemap.xml present with AI-priority annotations.
All four files regenerate within 30 seconds of a product publish.
llms.txt matches one of the seven Cookbook recipes (or a documented hybrid).
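
'One product per line' means each line of products.ndjson is a complete JSON object with no raw line breaks inside it. Here is a small sketch of writing and validating such a feed. The field names (handle, title, price, url, description) are hypothetical, since the checklist does not define a schema for the feed.

```python
import json

def to_ndjson(products: list) -> str:
    """Serialise products as NDJSON: one compact JSON object per line.

    json.dumps escapes newlines inside string values, so a multi-line
    description cannot break the one-record-per-line layout.
    """
    return "".join(json.dumps(p, separators=(",", ":")) + "\n" for p in products)

def validate_ndjson(text: str) -> int:
    """Return the record count; raise ValueError on a blank or non-object line."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline
    for number, line in enumerate(lines, start=1):
        record = json.loads(line)  # JSONDecodeError is a subclass of ValueError
        if not isinstance(record, dict):
            raise ValueError(f"line {number}: expected a JSON object")
    return len(lines)

if __name__ == "__main__":
    # Hypothetical products with assumed field names.
    feed = to_ndjson([
        {"handle": "example-mug", "title": "Example Brand Mug", "price": "12.00",
         "url": "https://example.com/products/example-mug",
         "description": "Stoneware mug.\nDishwasher safe."},
        {"handle": "example-tee", "title": "Example Brand Tee", "price": "24.00",
         "url": "https://example.com/products/example-tee",
         "description": "Organic cotton tee."},
    ])
    print(validate_ndjson(feed), "records")
```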

Section 6 — Measurement (7 items)

Close the loop: every change you make should be attributable.

Share of AI Voice tracked weekly across ≥20 buyer-intent prompts.
ChatGPT Shopping citation count tracked separately.
Perplexity citation count tracked separately.
Claude citation count tracked separately.
Referral traffic from chatgpt.com, perplexity.ai, and claude.ai tagged in analytics.
AI-readiness score baseline recorded.
A weekly 'GEO review' calendar event exists.
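
Two of these items reduce to small calculations. The sketch below tags a referrer URL by AI source and computes Share of AI Voice. The checklist does not define that metric, so it is assumed here to be the fraction of tracked prompts whose answer cited your brand. The function names and label strings are illustrative.

```python
from typing import Optional
from urllib.parse import urlparse

# Referrer domains mapped to analytics labels (labels are assumptions).
AI_REFERRERS = {
    "chatgpt.com": "chatgpt",
    "perplexity.ai": "perplexity",
    "claude.ai": "claude",
}

def tag_referrer(referrer_url: str) -> Optional[str]:
    """Return the AI source label for a referrer URL, or None if it is not one."""
    host = (urlparse(referrer_url).hostname or "").lower()
    for domain, label in AI_REFERRERS.items():
        # Match the domain itself or any subdomain, but not lookalikes.
        if host == domain or host.endswith("." + domain):
            return label
    return None

def share_of_ai_voice(cited_by_prompt: dict) -> float:
    """Fraction of tracked prompts whose answer cited the brand.

    `cited_by_prompt` maps each prompt to True/False. This definition
    is an assumption; adjust it if you weight prompts or engines.
    """
    if not cited_by_prompt:
        return 0.0
    return sum(1 for cited in cited_by_prompt.values() if cited) / len(cited_by_prompt)

if __name__ == "__main__":
    print(tag_referrer("https://www.perplexity.ai/search?q=best+mug"))
    print(share_of_ai_voice({"best stoneware mug": True, "mug for espresso": False}))
```

Recording the weekly figure per engine keeps the three citation counts above comparable over time.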