The Shopify AI SEO / GEO checklist for 2026

One page, one checklist, in the order the signals actually move. Work through this top to bottom and a Shopify store typically goes from invisible inside AI engines to consistently cited in 60 to 90 days.

Hiren Bhuva with Harry Parker

Co-founder, Onviqa Inc.

13 min

How do you actually use this checklist?

48 items across six layers. Mark each done / partial / missing. Aim for 70% done plus every crawler-access item green before you judge results.

This is not a 120-item dump. It is the shortest list that covers every signal an AI engine actually looks at when it decides whether to quote a Shopify store. Work through it top to bottom — the early layers compound the later ones, not the other way round. Resist the urge to jump to schema before the feed is clean, or to blog posts before product pages have answers in them.

Total items: 48
Minimum viable: 34 items
Typical time to first citations: 21-45 days
Typical time to steady-state: 60-90 days
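
The readiness rule above is plain arithmetic: 70% of 48 items rounds up to 34, and every crawler-access item must also be done. A minimal sketch of that check, assuming a hypothetical status file with one `layer,status` row per checklist item (the file format and the `checklist_ready` name are illustrative and not part of any existing tool):

```shell
# Hypothetical status file: one "layer,status" row per checklist item,
# where status is done, partial, or missing.
checklist_ready() {
  awk -F, '
    { total++ }
    $2 == "done" { ok++ }
    $1 == "crawler" && $2 != "done" { blocked++ }
    END {
      if (total == 0) { print "no items"; exit 1 }
      # Items needed for 70%, rounded up (34 when total is 48).
      needed = int((total * 7 + 9) / 10)
      printf "%d/%d done, need %d, crawler gaps %d\n", ok, total, needed, blocked
      # Exit 0 only when the 70% bar is met and no crawler item is open.
      exit !(ok >= needed && blocked == 0)
    }
  ' "$1"
}
```

Calling `checklist_ready status.csv` prints a one-line summary and returns success only when both conditions hold, so it can gate a weekly review script.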
Figure: Default 30-day rollout we use with Shopify merchants (audit, ship, measure, iterate). Approx. 30 calendar days.
  Week 1: Audit and baseline (llms.txt, NDJSON, JSON-LD)
  Week 2: Ship structure (metafields, FAQPage, schema)
  Week 3: Engines and measurement (per-engine tracking live)
  Week 4: Iterate to first citations (refine low-signal pages)

1. Feed layer — is your catalogue machine-readable?

AI engines do not crawl storefronts one URL at a time like 2015 Googlebot. They pull structured feeds and decide what to cite from there.

Get this layer wrong and everything downstream is wasted. A healthy feed is the cheapest single lift in the entire checklist — usually a weekend of focused work — and it is the signal almost every AI engine reads first when deciding whether your catalogue is worth indexing at all.

  1. Google Merchant Center feed is connected, healthy, and updated at least daily (more than 95% of items approved).
  2. All 42+ required attributes populated per item, including gtin / mpn / brand / gender / age_group / size / color / material / condition / availability / price / sale_price / item_group_id.
  3. Shopify Markets or equivalent is sending localised feeds per target country (price + currency + availability per country).
  4. Product images are on a public CDN with stable URLs (no signed, expiring, or bot-blocked URLs).
  5. A public products.ndjson or JSON-LD ItemList feed is exposed at a discoverable URL and linked from llms.txt.
  6. Out-of-stock items report availability=out_of_stock rather than being silently removed, because AI engines remember removed SKUs as 'discontinued'.
  7. item_group_id correctly groups variants so engines cite 'the Aero running shoe in black, size 10' not 'a shoe'.
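
One way to see which attributes are under-populated is to compute a fill rate per column. The sketch below assumes you have exported the feed as a tab-separated file whose first line is a header row of attribute names; the `feed_fill_rates` name and the file layout are assumptions for illustration.

```shell
# Print the percentage of rows with a non-empty value for each column
# of a tab-separated feed export whose first line is a header row.
feed_fill_rates() {
  awk -F'\t' '
    NR == 1 { ncols = NF; for (i = 1; i <= NF; i++) name[i] = $i; next }
    { rows++; for (i = 1; i <= ncols; i++) if ($i != "") filled[i]++ }
    END {
      for (i = 1; i <= ncols; i++)
        printf "%s\t%.1f\n", name[i], (rows ? 100 * filled[i] / rows : 0)
    }
  ' "$1"
}
```

Running `feed_fill_rates feed.tsv` prints one `attribute<TAB>percent` line per column; anything under 95 is a candidate for cleanup. It only detects empty cells, not wrong values.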

2. Schema layer — does every page earn a structured answer?

Schema tells AI engines what a page means, not just what it says. Without Product schema, a product page is a PDF to a crawler.

A product with complete Product schema is a candidate for citation; one without it is not. The delta between the two is usually an afternoon of work and a tenfold jump in citation eligibility.

  1. Product JSON-LD on every product page with name / description / sku / gtin13 / brand / image (array of three or more) / offers (price, priceCurrency, availability, priceValidUntil).
  2. AggregateRating and Review JSON-LD on product pages that have real reviews (minimum 5; do not emit for pages with fewer).
  3. BreadcrumbList JSON-LD on every collection, product, and content page.
  4. Organization JSON-LD on the homepage with logo / sameAs links to socials / contactPoint.
  5. FAQPage JSON-LD on product and collection pages where you have genuine Q&A content (not keyword-stuffing).
  6. HowTo JSON-LD on every content page that teaches a process (setup guides, care instructions, usage tutorials).
  7. Article JSON-LD on every blog post with author.Person schema linking to a real author page with a real bio.
  8. All schema validates in Rich Results Test and Schema.org validator with zero errors.
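
Before running the full validators, a quick text check can flag product pages that are obviously missing Product properties. This is a rough sketch, not a JSON parser or a validator: it assumes you have saved the page HTML to a local file, and the `missing_product_fields` name is hypothetical.

```shell
# Crude text check: report which Product properties from the list above
# never appear as a JSON key in a saved product page. Not a validator.
missing_product_fields() {
  local page=$1 field missing=""
  if ! grep -Eq '"@type"[[:space:]]*:[[:space:]]*"Product"' "$page"; then
    echo "no Product JSON-LD found"
    return 1
  fi
  for field in name description sku gtin13 brand image offers \
               price priceCurrency availability priceValidUntil; do
    grep -Eq "\"$field\"[[:space:]]*:" "$page" || missing="$missing $field"
  done
  if [ -z "$missing" ]; then
    echo "all listed properties present"
    return 0
  fi
  echo "missing:$missing"
  return 1
}
```

For example, `curl -s https://your-store.com/products/example -o page.html && missing_product_fields page.html` (the product URL is a placeholder). Because it only matches key names anywhere in the file, a pass here still needs confirming in the Rich Results Test.
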
A merchant had 1,400 products. Six had complete Product schema, 1,394 had partial. The six appeared in ChatGPT answers. The 1,394 did not. Coincidence? We repeated the test across 11 merchants. Same pattern every time.
Surfient audit notes, Q1 2026

3. Content layer — are your answers worth quoting?

AI engines do not cite pages. They cite sentences. Each page needs at least one sentence complete enough to be worth pulling out.

The practical test for any page on the site: if an AI engine pulled one sentence from it verbatim, would that sentence actually answer a question a shopper asked? If not, the page is decorative to a retrieval system.

  1. Every product page has a 'Who is this for / not for' paragraph in the first 200 words.
  2. Every product page has specifications as both prose and a table or key-value block.
  3. Every product page has a genuine comparison section ('vs alternative X: we are better at A, they are better at B').
  4. Every collection page opens with a 60-120 word intro explaining the use-case the collection solves.
  5. Every blog post has a TL;DR of exactly three bullets at the top.
  6. Every blog post ends with a 'sources and further reading' block citing at least two external authorities.
  7. Long-tail conversational keywords are integrated naturally: 'best running shoe for overpronation and narrow feet' rather than 'running shoes narrow feet'.
  8. Author.Person pages exist for every writer with bio, credentials, and at least two external proof links (LinkedIn, published work, credentials).
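
The length rule for collection intros is easy to check mechanically. A small sketch, assuming the intro text has already been extracted to plain text and is piped in on stdin (the `intro_length_ok` name is illustrative):

```shell
# Succeeds when the text on stdin is between 60 and 120 words long.
intro_length_ok() {
  local words
  words=$(wc -w | tr -d '[:space:]')
  echo "$words words"
  [ "$words" -ge 60 ] && [ "$words" -le 120 ]
}
```

Usage: `intro_length_ok < collection-intro.txt`. It counts whitespace-separated words only, so it says nothing about whether the intro actually explains the use-case.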

4. Authority layer — why would an AI engine trust you over the next brand?

AI engines rank on trust signals far more heavily than 2020 Google did. Thin authority means crawled, indexed, and never cited.

Authority is the slowest signal to build but the stickiest one you own. Feed and schema can be fixed in a weekend; authority takes months. Which is exactly why it has to start on day one even when you cannot measure it yet.

  1. At least three external publications link to the store with real editorial context (not paid link-ops).
  2. Reviews on at least two independent platforms (Trustpilot, Google Business, Judge.me) with more than 30 reviews each and an average of 4.4+.
  3. A real About page with founder bio, year founded, physical address (if applicable), and team photos.
  4. Contact page with phone, email, response-time commitment, and an actual returns or warranty policy linked.
  5. Shipping, returns, and privacy policies are written in plain English, not legal boilerplate, and each is a standalone page, not an accordion.
  6. Brand is mentioned on at least five non-paid third-party pages that AI engines can crawl (forum threads, YouTube descriptions, newsletter archives).
  7. At least one podcast appearance, interview, or published byline per quarter from a named team member.

5. Crawler access layer — can AI bots actually read you?

Half the audits we run on stalled stores are fixed by two lines in robots.txt. Everything above this layer is wasted if the bots are blocked.

Shopify's default firewall posture and Cloudflare's default bot-fighting rules both block AI crawlers more often than merchants realise. This is the one layer where doing nothing is the worst option — silence reads as 'blocked' to most retrievers.

  1. robots.txt explicitly allows GPTBot, PerplexityBot, ClaudeBot, Google-Extended, CCBot, and Applebot-Extended (or explicitly blocks them if that is your policy; silence is the worst option).
  2. Cloudflare WAF or equivalent is not blocking AI user-agents by default (check Bot Fight Mode settings).
  3. Shopify's built-in bot protection is not over-rate-limiting crawlers (check 429 responses in logs).
  4. llms.txt is published at the root and linked from the homepage footer, pointing to products.ndjson, sitemap, and top 20 content URLs.
  5. llms-full.txt contains a curated single-document bundle of the most citable content (about, top products, policies, FAQs).
  6. ai-sitemap.xml is published with lastmod timestamps and submitted nowhere; AI engines discover it from llms.txt.
  7. All canonical URLs resolve with 200 OK for AI user-agents (test with curl -A 'GPTBot'). No silent 403s.
  8. JavaScript-rendered content has a server-rendered fallback, because most AI crawlers do not execute JS.
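# Check that each AI crawler can read your homepage.
# Google-Extended and Applebot-Extended are robots.txt control tokens, not
# crawling user agents, so there is nothing to fetch as them.
for ua in 'GPTBot/1.0' 'PerplexityBot/1.0' 'ClaudeBot/1.0'; do
  # printf is used because plain echo does not expand \n in bash.
  printf '\n=== %s ===\n' "$ua"
  # Print only the HTTP status code; anything other than 200 needs a look.
  curl -s -o /dev/null -w '%{http_code}\n' -A "$ua" https://your-store.com/
done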
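
The robots.txt item can be checked offline as well. The sketch below assumes you have saved the file locally (for example with `curl -s https://your-store.com/robots.txt -o robots.txt`); the `robots_mentions` name is hypothetical. It only reports whether each token has its own `User-agent` line, not whether the rules under it allow or block.

```shell
# Report, for each AI crawler token, whether a saved robots.txt names it
# in a User-agent line. "silent" means no explicit allow or block exists.
robots_mentions() {
  local file=$1 token
  for token in GPTBot PerplexityBot ClaudeBot Google-Extended CCBot Applebot-Extended; do
    if grep -Eiq "^[[:space:]]*user-agent:[[:space:]]*${token}[[:space:]]*(#.*)?\$" "$file"; then
      echo "$token explicit"
    else
      echo "$token silent"
    fi
  done
}
```

A token reported as `silent` still falls under any `User-agent: *` group, so read the wildcard rules before concluding that a crawler is allowed.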

6. Measurement layer — do you know what is actually moving?

AI-search measurement in 2026 is not GA4 — it is edge logs, weekly prompt panels, and per-engine share-of-voice tracked separately.

You cannot improve what you do not measure, and GA4 alone will underreport AI-referred traffic by a factor of two to five because AI referrers strip most tracking parameters. Build the measurement stack once, then let it run quietly.

  1. Referrer logs are captured at the edge (Cloudflare logs, Shopify server logs) and parsed daily for chatgpt.com, perplexity.ai, claude.ai, gemini.google.com, copilot.microsoft.com, you.com, grok.x.ai referrers.
  2. A weekly 'brand mention' scan runs across at least three AI engines with a fixed prompt battery (e.g. 'best X for Y').
  3. Share-of-voice is tracked per engine per category: not just 'are we mentioned' but 'what percent of relevant answers cite us'.
  4. Conversion path from AI referrer to order is tagged in analytics with UTMs auto-appended by edge workers.
  5. GEO Score (or equivalent health dashboard) is reviewed weekly, not monthly.
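
The daily referrer parse can start as a one-function script. This is a minimal sketch that assumes each log line contains the full referrer URL somewhere in it; real Cloudflare or server log formats differ, so treat the pattern and the `ai_referrer_counts` name as illustrative.

```shell
# Count requests per AI referrer host in an access log read from stdin.
# Assumes the referrer URL appears somewhere on each log line.
ai_referrer_counts() {
  grep -Eo 'https?://(www\.)?(chatgpt\.com|perplexity\.ai|claude\.ai|gemini\.google\.com|copilot\.microsoft\.com|you\.com|grok\.x\.ai)' \
    | sed -E 's#^https?://(www\.)?##' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $2, $1 }'
}
```

Usage: `ai_referrer_counts < access.log` prints one `host count` line per engine, highest first. It matches a URL anywhere on the line, so a log format that also records outbound links would need a stricter pattern.
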

3.2x: median conversion-rate lift from AI-referred traffic vs organic Google traffic (Surfient 2026 cohort; smaller volume, far higher intent).

7. What does the honest 60-day plan look like?

If you cannot ship everything, ship in this order. This is the sequence the best-performing merchants in our cohort actually followed.

The temptation is to batch everything for a 'big reveal'. Do not. AI engines re-crawl weekly or faster — every improvement compounds in days, not quarters. Shipping layer by layer beats shipping a complete audit three months late.

  1. Week 1: Fix crawler access (robots.txt, WAF, llms.txt). Run bot-user-agent curl tests.
  2. Week 2: Clean the feed. Fix any attribute under 95% populated. Reconnect Merchant Center if disapproved.
  3. Weeks 3-4: Ship Product + Breadcrumb + Organization schema site-wide. Validate zero errors.
  4. Weeks 5-6: Rewrite the top 20 product pages for citability (specs + 'who it is for' + comparison block).
  5. Weeks 7-8: Publish 4-6 genuinely useful content pages answering specific conversational queries with HowTo schema.
  6. Week 9 onwards: Authority work (podcast pitches, author bio pages, review-platform presence) runs forever in the background.

Frequently asked questions

Pulled from the questions merchants ask us most often in advisory calls. Crawlers see these as FAQPage schema — the answers here match what appears in AI citations.

  • No. Our cohort data shows steady citations start at about 34 items done well — roughly 70%. The remaining 14 compound returns but are rarely the deciding factor. Start shipping; do not wait for a perfect scorecard.
