How do you actually use this checklist?
48 items across six layers. Mark each done / partial / missing. Aim for 70% done plus every crawler-access item green before you judge results.
This is not a 120-item dump. It is the shortest list that covers every signal an AI engine actually looks at when it decides whether to quote a Shopify store. Work through it top to bottom — the early layers compound the later ones, not the other way round. Resist the urge to jump to schema before the feed is clean, or to blog posts before product pages have answers in them.
- Total items: 48
- Minimum viable: 34 items
- Typical time to first citations: 21-45 days
- Typical time to steady-state: 60-90 days
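The readiness rule above (70% done plus every crawler-access item green) can be sketched as a small shell check. The function name `checklist_ready` and its argument order are hypothetical, chosen only for this illustration.

```shell
# checklist_ready DONE TOTAL CRAWLER_GREEN CRAWLER_TOTAL
# Hypothetical helper: prints "ready" when at least 70% of items are done
# and every crawler-access item is green, otherwise "not ready".
checklist_ready() {
  local done_items=$1 total=$2 crawler_green=$3 crawler_total=$4
  # Integer maths: done/total >= 70% is the same as done*100 >= total*70.
  if [ $((done_items * 100)) -ge $((total * 70)) ] &&
     [ "$crawler_green" -eq "$crawler_total" ]; then
    echo "ready"
  else
    echo "not ready"
  fi
}

checklist_ready 34 48 8 8   # 34 of 48 clears the 70% bar
checklist_ready 33 48 8 8   # 33 of 48 falls just short
checklist_ready 40 48 7 8   # one crawler-access item still failing
```

This is also where the 34-item minimum comes from: 70% of 48 is 33.6, which rounds up to 34.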
1. Feed layer — is your catalogue machine-readable?
AI engines do not crawl storefronts one URL at a time like 2015 Googlebot. They pull structured feeds and decide what to cite from there.
Get this layer wrong and everything downstream is wasted. A healthy feed is the cheapest single lift in the entire checklist — usually a weekend of focused work — and it is the signal almost every AI engine reads first when deciding whether your catalogue is worth indexing at all.
- 1. Google Merchant Center feed is connected, healthy, and updated at least daily (more than 95% of items approved).
- 2. All 42+ required attributes populated per item: gtin / mpn / brand / gender / age_group / size / color / material / condition / availability / price / sale_price / item_group_id.
- 3. Shopify Markets or equivalent is sending localised feeds per target country (price + currency + availability per country).
- 4. Product images are on a public CDN with stable URLs (no signed, expiring, or bot-blocked URLs).
- 5. A public products.ndjson or JSON-LD ItemList feed is exposed at a discoverable URL and linked from llms.txt.
- 6. Out-of-stock items report availability=out_of_stock rather than being silently removed; AI engines remember removed SKUs as 'discontinued'.
- 7. item_group_id correctly groups variants so engines cite 'the Aero running shoe in black, size 10' not 'a shoe'.
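One way to spot under-populated attributes is to measure fill rate per column in a feed export. The sketch below assumes a tab-separated export with a header row. The helper name `attr_coverage` and the sample rows are made up for illustration, and real feeds may use a different format.

```shell
# attr_coverage FILE COLUMN
# Prints the percentage (rounded down) of items whose COLUMN is non-empty in a
# tab-separated feed with a header row, or "n/a" if the column is missing.
attr_coverage() {
  awk -F '\t' -v col="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
    c && $c != "" { filled++ }
    { total++ }
    END { if (!c || !total) print "n/a"; else print int(filled * 100 / total) }
  ' "$1"
}

# Hypothetical four-item feed with two gaps (one gtin, one brand).
feed=$(mktemp)
printf 'id\tgtin\tbrand\tavailability\n'              >  "$feed"
printf 'sku-1\t0123456789012\tAero\tin_stock\n'       >> "$feed"
printf 'sku-2\t\tAero\tout_of_stock\n'                >> "$feed"
printf 'sku-3\t0123456789029\tAero\tin_stock\n'       >> "$feed"
printf 'sku-4\t0123456789036\t\tin_stock\n'           >> "$feed"

for attr in gtin brand availability; do
  printf '%s: %s%%\n' "$attr" "$(attr_coverage "$feed" "$attr")"
done
```

Any attribute that reports under 95 here is a candidate for the feed clean-up pass.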
2. Schema layer — does every page earn a structured answer?
Schema tells AI engines what a page means, not just what it says. Without Product schema, a product page is a PDF to a crawler.
With complete Product schema, the same page becomes a candidate for citation. The delta between the two is usually an afternoon of work and a tenfold jump in citation eligibility.
- 1. Product JSON-LD on every product page with name / description / sku / gtin13 / brand / image (array of three or more) / offers (price, priceCurrency, availability, priceValidUntil).
- 2. AggregateRating and Review JSON-LD on product pages that have real reviews (minimum 5; do not emit for pages with fewer).
- 3. BreadcrumbList JSON-LD on every collection, product, and content page.
- 4. Organization JSON-LD on the homepage with logo / sameAs links to socials / contactPoint.
- 5. FAQPage JSON-LD on product and collection pages where you have genuine Q&A content (not keyword-stuffing).
- 6. HowTo JSON-LD on every content page that teaches a process (setup guides, care instructions, usage tutorials).
- 7. Article JSON-LD on every blog post with author.Person schema linking to a real author page with a real bio.
- 8. All schema validates in Rich Results Test and Schema.org validator with zero errors.
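As a rough illustration of the first item, the sketch below writes a sample Product JSON-LD document and checks that each listed property name appears in it. All values are placeholders. The helper `missing_product_fields` is hypothetical and only does a crude text match, so it complements the validators named above and does not replace them.

```shell
# Sample Product JSON-LD; every value here is a placeholder.
jsonld=$(mktemp)
cat > "$jsonld" <<'EOF'
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Aero running shoe",
  "description": "Neutral road running shoe for daily training.",
  "sku": "AERO-BLK-10",
  "gtin13": "0123456789012",
  "brand": { "@type": "Brand", "name": "Example Brand" },
  "image": [
    "https://cdn.example.com/aero-1.jpg",
    "https://cdn.example.com/aero-2.jpg",
    "https://cdn.example.com/aero-3.jpg"
  ],
  "offers": {
    "@type": "Offer",
    "price": "129.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock",
    "priceValidUntil": "2026-12-31"
  }
}
EOF

# missing_product_fields FILE
# Prints each expected property name that never appears as a JSON key in FILE.
# Text match only: it does not parse JSON or check nesting or value types.
missing_product_fields() {
  for key in name description sku gtin13 brand image offers \
             price priceCurrency availability priceValidUntil; do
    grep -q "\"$key\"[[:space:]]*:" "$1" || echo "$key"
  done
}

missing_product_fields "$jsonld"   # prints nothing when every key is present
```

On a real page the JSON-LD would first need to be extracted from its `<script type="application/ld+json">` tag before a check like this is run.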
“A merchant had 1,400 products. Six had complete Product schema, 1,394 had partial. The six appeared in ChatGPT answers. The 1,394 did not. Coincidence? We repeated the test across 11 merchants. Same pattern every time.”
3. Content layer — are your answers worth quoting?
AI engines do not cite pages. They cite sentences. Each page needs at least one sentence complete enough to be worth pulling out.
The practical test for any page on the site: if an AI engine pulled one sentence from it verbatim, would that sentence actually answer a question a shopper asked? If not, the page is decorative to a retrieval system.
- 1. Every product page has a 'Who is this for / not for' paragraph in the first 200 words.
- 2. Every product page has specifications as both prose and a table or key-value block.
- 3. Every product page has a genuine comparison section ('vs alternative X: we are better at A, they are better at B').
- 4. Every collection page opens with a 60-120 word intro explaining the use-case the collection solves.
- 5. Every blog post has a TL;DR of exactly three bullets at the top.
- 6. Every blog post ends with a 'sources and further reading' block citing at least two external authorities.
- 7. Long-tail conversational keywords are integrated naturally: 'best running shoe for overpronation and narrow feet' rather than 'running shoes narrow feet'.
- 8. Author.Person pages exist for every writer with bio, credentials, and at least two external proof links (LinkedIn, published work, credentials).
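The two countable rules in this list (a 60-120 word collection intro and a three-bullet TL;DR) lend themselves to a quick automated check. The helpers below are hypothetical names for this sketch, and they assume the text has already been extracted from the page as plain text.

```shell
# word_count_in_range MIN MAX
# Reads text on stdin and reports whether its word count falls within MIN..MAX.
word_count_in_range() {
  local n
  n=$(wc -w | tr -d '[:space:]')
  if [ "$n" -ge "$1" ] && [ "$n" -le "$2" ]; then
    echo "ok ($n words)"
  else
    echo "out of range ($n words)"
  fi
}

# bullet_count
# Reads text on stdin and prints how many lines start with "- ".
bullet_count() {
  grep -c '^- ' || true
}

# A collection intro that is far too short for the 60-120 word rule.
printf '%s\n' 'Trail shoes built for wet, rocky ground.' | word_count_in_range 60 120

# A TL;DR block; the checklist expects this count to be exactly 3.
printf '%s\n' '- point one' '- point two' '- point three' | bullet_count
```

Checks like these only catch length and shape. Whether a sentence is worth quoting still needs a human read.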
5. Crawler access layer — can AI bots actually read you?
Half the audits we run on stalled stores are fixed by two lines in robots.txt. Everything above this layer is wasted if the bots are blocked.
Shopify's default firewall posture and Cloudflare's default bot-fighting rules both block AI crawlers more often than merchants realise. This is the one layer where doing nothing is the worst option — silence reads as 'blocked' to most retrievers.
- 1. robots.txt explicitly allows GPTBot, PerplexityBot, ClaudeBot, Google-Extended, CCBot, and Applebot-Extended (or explicitly blocks them if that is your policy; silence is the worst option).
- 2. Cloudflare WAF or equivalent is not blocking AI user-agents by default (check Bot Fight Mode settings).
- 3. Shopify's built-in bot protection is not over-rate-limiting crawlers (check 429 responses in logs).
- 4. llms.txt is published at the root and linked from the homepage footer, pointing to products.ndjson, sitemap, and top 20 content URLs.
- 5. llms-full.txt contains a curated single-document bundle of the most citable content (about, top products, policies, FAQs).
- 6. ai-sitemap.xml is published with lastmod timestamps and submitted nowhere; AI engines discover it from llms.txt.
- 7. All canonical URLs resolve with 200 OK for AI user-agents (test with curl -A 'GPTBot'). No silent 403s.
- 8. JavaScript-rendered content has a server-rendered fallback, since most AI crawlers do not execute JS.
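For the first item, an explicit allow stanza per crawler token might look like the output of the sketch below. The function name `print_robots_allow` is hypothetical, and the rules are a minimal starting point under the assumption that you want to allow all six tokens. On Shopify the result would typically be merged into the theme's robots.txt.liquid template instead of being uploaded as a file.

```shell
# print_robots_allow
# Emits one "User-agent / Allow" stanza for each AI crawler token named in the
# checklist. Swap "Allow: /" for "Disallow: /" if blocking is your policy.
print_robots_allow() {
  for bot in GPTBot PerplexityBot ClaudeBot Google-Extended CCBot Applebot-Extended; do
    printf 'User-agent: %s\nAllow: /\n\n' "$bot"
  done
}

print_robots_allow
```

Either choice, allow or disallow, satisfies the item. What it rules out is leaving the tokens unmentioned.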
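# Check that each AI crawler can read your homepage.
# Google-Extended and Applebot-Extended are robots.txt control tokens rather than
# crawling user-agents, so their rows only show how your edge treats those strings.
for ua in 'GPTBot/1.0' 'PerplexityBot/1.0' 'ClaudeBot/1.0' 'Google-Extended' 'Applebot-Extended'; do
  printf '\n=== %s ===\n' "$ua"   # printf, because echo does not expand \n in every shell
  curl -s -o /dev/null -w '%{http_code}\n' -A "$ua" https://your-store.com/
done
6. Measurement layer — do you know what is actually moving?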
AI-search measurement in 2026 is not GA4 — it is edge logs, weekly prompt panels, and per-engine share-of-voice tracked separately.
You cannot improve what you do not measure, and GA4 alone will underreport AI-referred traffic by a factor of two to five because AI referrers strip most tracking parameters. Build the measurement stack once, then let it run quietly.
- 1. Referrer logs are captured at the edge (Cloudflare logs, Shopify server logs) and parsed daily for chatgpt.com, perplexity.ai, claude.ai, gemini.google.com, copilot.microsoft.com, you.com, grok.x.ai referrers.
- 2. A weekly 'brand mention' scan runs across at least three AI engines with a fixed prompt battery (e.g. 'best X for Y').
- 3. Share-of-voice is tracked per engine per category: not just 'are we mentioned' but 'what percent of relevant answers cite us'.
- 4. Conversion path from AI referrer to order is tagged in analytics with UTMs auto-appended by edge workers.
- 5. GEO Score (or equivalent health dashboard) is reviewed weekly, not monthly.
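The daily referrer parse in the first item can be sketched as a per-host count. The sketch assumes the referrer URLs have already been pulled out of the edge log, one per line. The function name `count_ai_referrers` and the sample lines are hypothetical, and real log formats will need their own extraction step first.

```shell
# count_ai_referrers FILE
# FILE holds one referrer URL per line. Prints "host count" for each AI
# referrer host named in the checklist, ignoring a leading "www.".
count_ai_referrers() {
  awk -F/ '
    BEGIN {
      n = split("chatgpt.com perplexity.ai claude.ai gemini.google.com " \
                "copilot.microsoft.com you.com grok.x.ai", hosts, " ")
      for (i = 1; i <= n; i++) want[hosts[i]] = 1
    }
    {
      host = $3                    # scheme://host/path: host is the third "/" field
      sub(/^www\./, "", host)
      if (host in want) count[host]++
    }
    END { for (i = 1; i <= n; i++) printf "%s %d\n", hosts[i], count[hosts[i]] }
  ' "$1"
}

# Hypothetical sample of extracted referrers.
log=$(mktemp)
printf '%s\n' \
  'https://chatgpt.com/' \
  'https://www.perplexity.ai/search?q=x' \
  'https://chatgpt.com/c/abc' \
  'https://www.google.com/' \
  'https://claude.ai/chat' > "$log"

count_ai_referrers "$log"
```

Hosts with no hits are still printed with a count of 0, which keeps the daily output the same shape and makes week-over-week comparison easier.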
3.2x: median conversion-rate lift of AI-referred traffic over organic Google traffic (Surfient 2026 cohort; smaller volume, far higher intent).
7. What does the honest 60-day plan look like?
If you cannot ship everything, ship in this order. This is the sequence the best-performing merchants in our cohort actually followed.
The temptation is to batch everything for a 'big reveal'. Do not. AI engines re-crawl weekly or faster — every improvement compounds in days, not quarters. Shipping layer by layer beats shipping a complete audit three months late.
- 1. Week 1: Fix crawler access (robots.txt, WAF, llms.txt). Run bot-user-agent curl tests.
- 2. Week 2: Clean the feed. Fix any attribute under 95% populated. Reconnect Merchant Center if disapproved.
- 3. Weeks 3-4: Ship Product + Breadcrumb + Organization schema site-wide. Validate zero errors.
- 4. Weeks 5-6: Rewrite the top 20 product pages for citability (specs + 'who it is for' + comparison block).
- 5. Weeks 7-8: Publish 4-6 genuinely useful content pages answering specific conversational queries with HowTo schema.
- 6. Week 9 onwards: Authority work (podcast pitches, author bio pages, review-platform presence) runs forever in the background.