
Why your Shopify store isn't in ChatGPT

Nine times out of ten a Shopify store is absent from ChatGPT for one of three classes of reason: it isn't being crawled by the right bots, its signals don't survive the retrieval pass, or the narrative on the page isn't shaped like something a model will quote. Here is the 14-point check we run, the thresholds we use, and the 6 fixes that close the gap fastest.

Harry Parker
Co-founder, Onviqa Inc. · Surfient
TL;DR
  • Absent-from-ChatGPT stores fail on three categories, in this order: crawl access, schema signal, narrative shape. Close the crawl misses first or nothing else matters.
  • Six misses cover 90% of cases: no llms.txt, GPTBot blocked, no FAQPage schema, buried PDP lead, no honest comparison page, mismatched brand/domain name.
  • Measured on our panel, closing those six moves citation share from 0% to an average of 23% across four assistants in six weeks.

You rank on Google. Your reviews are 4.7. Your PDPs convert. You type your buying question into ChatGPT, it gives you three stores, and none of them are yours. This post is the diagnostic we run when a merchant sends us that screenshot.

Figure 1 — The 14-point diagnostic grouped by category (Crawl, Signal, Narrative). The first pass on a typical absent-from-ChatGPT store surfaces 6 misses, 4 at-risk, 4 passing.

Three categories, in this exact order

Stores fail at three layers: crawl, signal, narrative. The order isn't decorative. Crawl failures silence every other signal you publish — if GPTBot can't reach the page, your perfect JSON-LD and your witty PDP lead never make it into the retrieval index. The merchants who fix this fastest are the ones who stop triaging by what feels fun to work on and start triaging by which layer is actually blocking them.

Layer 1 — Crawl (five checks)

1. llms.txt at the root

A missing /llms.txt is the single most common miss we find. The file is closer to a sitemap written for assistants than to a robots.txt: it grants or blocks nothing, and instead offers a curated list of canonical URLs and section headings a model should reach for when answering questions about your domain. It's 3 KB of work and it typically earns citations within a week.
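As a minimal sketch, assuming the layout from the public llms.txt proposal (a title line, a one-line summary, then sections of Markdown links), the file can be generated from a short list of canonical URLs. The store name, URLs, and the `build_llms_txt` helper are hypothetical placeholders, not part of any Shopify API.

```python
def build_llms_txt(title, summary, sections):
    """Render an llms.txt body.

    sections maps a heading to a list of (label, url, note) tuples.
    The layout is an assumption based on the public llms.txt proposal.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        for label, url, note in links:
            lines.append(f"- [{label}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical store data, for illustration only.
llms_txt = build_llms_txt(
    "Example Desks",
    "One-sentence description of what the store sells and to whom.",
    {
        "Products": [
            ("Oak standing desk", "https://example.com/products/oak-desk", "bestseller PDP"),
        ],
        "Policies": [
            ("Shipping and returns", "https://example.com/policies/shipping", "delivery and returns"),
        ],
    },
)
print(llms_txt)
```

Serve the result as plain text at the site root. How each assistant weighs the file is not documented here, so treat it as a hint rather than a guarantee.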

2. GPTBot allowed in robots.txt

Many stores added User-agent: GPTBot · Disallow: / during the 2023 scraping panic and then forgot. Two and a half years later, the same robots.txt still blocks the crawler behind the assistant most of your shoppers are now asking. Check it. If it's blocked, delete the block, redeploy, and wait 72 hours.
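One way to run this check locally is Python's standard `urllib.robotparser`. The sketch below is illustrative: the two robots.txt bodies and the sample product URL are made up, and the agent list simply mirrors the crawlers named in this post.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt, agents, url="https://example.com/products/sample"):
    """Return the agents that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# A leftover block from 2023 (hypothetical file contents).
stale_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /cart
"""

# The same file with the GPTBot block deleted.
fixed_robots = """\
User-agent: *
Disallow: /cart
"""

AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]
print(blocked_agents(stale_robots, AGENTS))  # ['GPTBot']
print(blocked_agents(fixed_robots, AGENTS))  # []
```

Paste your live robots.txt in place of the sample strings. An empty list means none of the listed crawlers is blocked for that URL.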

3. Sitemap complete

Your sitemap.xml is the anchor every assistant's retriever looks at when the model cites you. Missing sitemaps, sitemaps with 404s, or sitemaps that stop at 500 URLs (Shopify's default paginates) are common sources of partial indexation. Submit a sitemap_index.xml that references every page you care about.
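A rough completeness check, assuming the standard sitemaps.org XML namespace: list the child sitemaps an index references, then confirm that the pages you care about appear in at least one of them. The XML snippets and URLs below are invented samples. In practice you would download the real files first.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(index_xml):
    """Return the child sitemap URLs listed in a sitemap index document."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]


def page_urls(sitemap_xml):
    """Return the page URLs listed in a single sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


def missing_pages(must_have, sitemap_docs):
    """Return the must-have URLs that no sitemap document lists."""
    listed = {url for doc in sitemap_docs for url in page_urls(doc)}
    return sorted(set(must_have) - listed)


# Hypothetical sample documents.
index_xml = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap_products_1.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap_collections_1.xml</loc></sitemap>
</sitemapindex>"""

products_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/products/oak-desk</loc></url>
</urlset>"""

wanted = [
    "https://example.com/products/oak-desk",
    "https://example.com/products/walnut-desk",
]
print(child_sitemaps(index_xml))
print(missing_pages(wanted, [products_xml]))
```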

4. Canonical tags

Duplicate PDPs from ?variant= query strings confuse assistants as much as they confuse Google. A valid rel="canonical" on every PDP that points to the clean URL is a free fix and it prevents the retriever from choosing a low-authority duplicate over your real product page.
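To illustrate the clean-URL rule, here is a small sketch that strips the variant parameter and renders the tag. The helper names are hypothetical, and on a real storefront the tag would normally be emitted by the theme rather than by a script like this.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url, drop_params=("variant",)):
    """Drop the listed query parameters and any fragment from url."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop_params
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Render the rel=canonical tag for the clean version of url."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


print(canonical_link_tag("https://example.com/products/oak-desk?variant=42"))
# <link rel="canonical" href="https://example.com/products/oak-desk">
```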

5. HTML-renderable pass

Assistants use a lighter-weight renderer than Chromium when they crawl. Content that only appears after client-side JavaScript hydration (a common Shopify pattern for review widgets, size guides, and sustainability blocks) may not make it into the retrieval snapshot. If your reviews are a Judge.me widget with no SSR fallback, the 184 reviews you're proud of are invisible to the model.
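A quick way to approximate what a non-JavaScript crawler sees is to extract the visible text from the raw server HTML and look for the content you expect. This sketch uses only the standard library and is an approximation, not a model of any specific assistant's renderer. The sample markup is invented.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, ignoring script, style and template contents."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(raw_html):
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical server response: the review count only exists inside a script.
server_html = """<main><h1>Oak standing desk</h1>
<div id="reviews"></div>
<script>window.reviews = {"count": 184};</script></main>"""

text = visible_text(server_html)
print("Oak standing desk" in text)  # True
print("184" in text)                # False
```

If a phrase you want quoted fails this check on the HTML your server actually returns, it needs a server-rendered fallback.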

Layer 2 — Signal (five checks)

6. Product + Offer JSON-LD

The baseline. If your PDPs don't emit a valid Product with a nested Offer and a price, assistants down-weight the page in their ranker. The full fix is: name, description, sku, gtin, brand, offers (price, priceCurrency, availability, url), aggregateRating, review.
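The field list above maps onto schema.org's Product, Offer, AggregateRating and Review types. The sketch below builds that structure as a Python dict and serialises it. Every value is a placeholder, and the helper names are hypothetical. Many Shopify themes emit part of this already, so compare against your theme's output before adding a second block.

```python
import json


def product_jsonld(name, description, sku, gtin, brand, price, currency, url,
                   rating_value, review_count, reviews):
    """Build a Product JSON-LD dict; reviews is a list of (author, stars, body)."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "sku": sku,
        "gtin": gtin,
        "brand": {"@type": "Brand", "name": brand},
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
            "availability": "https://schema.org/InStock",
            "url": url,
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating_value,
            "reviewCount": review_count,
        },
        "review": [
            {
                "@type": "Review",
                "author": {"@type": "Person", "name": author},
                "reviewRating": {"@type": "Rating", "ratingValue": stars},
                "reviewBody": body,
            }
            for author, stars, body in reviews
        ],
    }


def as_script_tag(data):
    """Wrap JSON-LD in a script tag, escaping '</' so it cannot close the tag."""
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


# Placeholder values, for illustration only.
product = product_jsonld(
    "Oak standing desk", "Placeholder description.", "DESK-OAK-140",
    "00012345678905", "Example Desks", "499.00", "USD",
    "https://example.com/products/oak-desk", "4.7", "184",
    [("A. Shopper", "5", "Placeholder review text.")],
)
print(as_script_tag(product))
```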

7. FAQPage schema

FAQPage is the single most-lifted schema type in assistant answers, because each Question + Answer pair is already sentence-shaped. Adding five questions to every collection page and every bestseller PDP, ideally ones a real shopper would ask, moves citation share fast.
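For reference, a FAQPage block is a list of Question entries, each with one acceptedAnswer. The sketch below builds it from plain question and answer pairs. The two pairs are invented examples, and the `faq_jsonld` helper is hypothetical.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage dict from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Invented questions; replace with ones real shoppers ask.
faq = faq_jsonld([
    ("How long does delivery take?", "Placeholder answer about delivery times."),
    ("Can I return the desk?", "Placeholder answer about the return window."),
])
print(json.dumps(faq, indent=2))
```

The questions and answers in the markup should also appear as visible text on the page, so the schema describes content a shopper can actually read.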

8. Review density

Models don't trust stores with 6 reviews and a 5.0 average. They quietly prefer stores with 150+ reviews in the 4.4–4.8 range. If you're below 50 reviews on a product you want cited, that's worth a dedicated email campaign before you tune anything else.

9. Freshness date

A dateModified older than 180 days signals staleness to retrievers that apply recency heuristics. If you haven't touched the page in a year, the PDP copy is probably also stale. Fix the copy first, then update the field.
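The 180-day threshold is easy to audit in bulk. A minimal sketch, assuming dateModified values are ISO 8601 strings whose first ten characters are the date:

```python
from datetime import date

STALE_AFTER_DAYS = 180  # threshold taken from the paragraph above


def is_stale(date_modified, today):
    """Return True when date_modified is more than STALE_AFTER_DAYS before today."""
    modified = date.fromisoformat(date_modified[:10])
    return (today - modified).days > STALE_AFTER_DAYS


# Illustrative dates only.
print(is_stale("2024-01-01T09:30:00Z", date(2024, 8, 1)))  # True (213 days)
print(is_stale("2024-06-01", date(2024, 8, 1)))            # False (61 days)
```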

10. Brand name matches domain

Publish a schema:Organization block whose name is your storefront brand and whose url is your domain. Stores whose brand renders as "Store 72" but whose domain is 72desks.com confuse the entity-resolution pass. Assistants can't confidently name a store they can't uniquely identify.
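The sketch below builds the Organization block and applies a deliberately crude alignment test: the brand name, lowercased and stripped of punctuation, should equal the first label of the domain. That matching rule is our own illustrative heuristic, not a documented assistant behaviour, and plenty of legitimate brands will fail it.

```python
import re
from urllib.parse import urlsplit


def organization_jsonld(name, url):
    """Build a minimal schema.org Organization dict."""
    return {"@context": "https://schema.org", "@type": "Organization", "name": name, "url": url}


def _slug(text):
    return re.sub(r"[^a-z0-9]", "", text.lower())


def name_matches_domain(name, url):
    """Heuristic: compare the slugged brand name with the domain's first label."""
    host = urlsplit(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    label = host.split(".")[0]
    return bool(_slug(name)) and _slug(name) == _slug(label)


# The mismatch described above, then an aligned name.
print(name_matches_domain("Store 72", "https://72desks.com"))   # False
print(name_matches_domain("72 Desks", "https://72desks.com"))   # True
print(organization_jsonld("72 Desks", "https://72desks.com"))
```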

Layer 3 — Narrative (four checks)

11. PDP lead under 55 words

The first paragraph of your PDP is the one the model reads. If it buries the differentiator under paragraphs of brand voice, the retriever won't promote it. Rewrite so the first 50–55 words state: what the product is, who it's for, and the one specific fact you want quoted.
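The word budget is simple to enforce before copy ships. A small sketch, assuming the PDP copy is plain text with paragraphs separated by blank lines. The sample copy is invented.

```python
def lead_word_count(pdp_text):
    """Count whitespace-separated words in the first non-empty paragraph."""
    first_paragraph = next((p for p in pdp_text.split("\n\n") if p.strip()), "")
    return len(first_paragraph.split())


def lead_fits(pdp_text, limit=55):
    """Return True when the lead exists and stays within the word limit."""
    return 0 < lead_word_count(pdp_text) <= limit


# Invented copy: what it is, who it's for, one quotable fact.
pdp = (
    "The Oak Standing Desk is a 140 cm sit-stand desk for home offices. It lifts 120 kg.\n\n"
    "Longer brand story goes here."
)
print(lead_word_count(pdp), lead_fits(pdp))
```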

12. Honest comparison page

Stores that publish a yourbrand-vs-competitor page with real trade-offs (not a puff piece) get cited on comparison prompts disproportionately. Assistants reward the source that sounds like it's actually weighing options, not the one that sounds like marketing.

13. Reddit presence

If your brand name doesn't show up in r/ threads your shoppers read, the retriever has no external corroboration and defaults to the brand that does. You don't need to astroturf — engaging honestly in two relevant subs a week for a quarter usually shows up in the training-data and retrieval mix.

14. Category-level llms hint

A single llms.txt at the root is the baseline; a second one at each category path (e.g. /collections/desks/llms.txt) that names your canonical PDPs for that category sharpens the retriever's ability to disambiguate when the shopper asks a category question.

Figure 2 — Citation share before and after closing the six misses. Six weeks, same store, no new paid spend. ChatGPT 0% → 28%, Perplexity 0% → 34%, AI Overviews 0% → 11%, Claude 0% → 19%.
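The 23% average quoted in the TL;DR follows directly from the four per-assistant figures in the caption, as this short check shows:

```python
from statistics import mean

# Citation share after six weeks, in percent, taken from Figure 2.
citation_share_after = {
    "ChatGPT": 28,
    "Perplexity": 34,
    "AI Overviews": 11,
    "Claude": 19,
}

average_share = mean(list(citation_share_after.values()))
print(average_share)  # (28 + 34 + 11 + 19) / 4 = 23
```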

What moves first

We've run this diagnostic on 1,200+ stores. The distribution is boring: the same six misses explain 9 out of 10 absences. If you do only these six, in this order, the needle moves inside a quarter.

  • Ship /llms.txt at the site root — 3KB, lists your 20 canonical URLs. Do this today.
  • Delete any Disallow for GPTBot, ClaudeBot, PerplexityBot from robots.txt. Redeploy, wait 72 hours.
  • Add FAQPage JSON-LD to every collection and bestseller PDP — five honest questions minimum.
  • Rewrite the first paragraph of your top 20 PDPs to 50–55 words, facts-first.
  • Publish at least one honest comparison page against your strongest competitor.
  • Align schema:Organization name with your storefront brand + domain. Redeploy.

How long until you're in the citation set?

Shortest path: 6 days (llms.txt + GPTBot unblock + PDP lead rewrite for one product, measured against a prompt set that's specific enough to your SKU). Typical: 4–6 weeks to reach a meaningful share of voice. The compounding effect matters: each fix is small, but retriever behaviour is non-linear. A store passing 4 of the 14 checks is functionally invisible. A store passing 10 of 14 gets picked regularly.

One last calibration

Being absent from ChatGPT is not a brand problem. It's an infrastructure problem presenting as a brand problem. Stores with weak brands get cited as easily as stores with strong brands, assuming both have done the 14 things above. The stores that are absent are almost always absent because nobody on the team has owned this work, not because the product or the marketing is wrong.


Harry Parker
Co-founder, Onviqa Inc. · Surfient

Harry has led SEO and e-commerce engineering for over 12 years and has been shipping Shopify software since Onviqa was founded in 2014. He writes about where commerce is headed when shoppers stop typing queries and start asking assistants.
