You rank on Google. Your reviews are 4.7. Your PDPs convert. You type your buying question into ChatGPT, it gives you three stores, and none of them are yours. This post is the diagnostic we run when a merchant sends us that screenshot.

Three categories, in this exact order
Stores fail at three layers: crawl, signal, narrative. The order isn't decorative. Crawl failures silence every other signal you publish — if GPTBot can't reach the page, your perfect JSON-LD and your witty PDP lead never make it into the retrieval index. The merchants who fix this fastest are the ones who stop triaging by what feels fun to work on and start triaging by which layer is actually blocking them.
Layer 1 — Crawl (five checks)
1. llms.txt at the root
A missing /llms.txt is the single most common miss we find. Think of the file as a companion to robots.txt: where robots.txt tells crawlers what they may fetch, llms.txt is a curated list of canonical URLs and section headings a model should reach for when answering questions about your domain. It's 3 KB of work and it typically earns citations within a week.
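A minimal sketch of what such a file could contain, assuming the Markdown layout described in the public llms.txt proposal (a title, a one-line summary, then sections of links). The store name, summary, and product paths here are hypothetical placeholders, not a required structure.

```python
def build_llms_txt(site_name: str, summary: str,
                   sections: dict[str, list[tuple[str, str]]]) -> str:
    """Render a small llms.txt body: H1 title, blockquote summary, link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        for title, url in links:
            lines.append(f"- [{title}]({url})")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical store and URLs, for illustration only.
example = build_llms_txt(
    "72 Desks",
    "Standing desks and accessories for home offices.",
    {
        "Products": [("Oak standing desk",
                      "https://72desks.com/products/oak-standing-desk")],
        "Policies": [("Shipping and returns",
                      "https://72desks.com/pages/shipping")],
    },
)
print(example)
```

Write the returned string to a file served at /llms.txt and keep the link list short enough to stay curated.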
2. GPTBot allowed in robots.txt
Many stores added User-agent: GPTBot · Disallow: / during the 2023 scraping panic and then forgot. Two and a half years later, the same robots.txt still blocks the crawler behind the assistant that most of your shoppers are now asking. Check it. If it's blocked, delete the block, redeploy, and wait 72 hours.
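One way to run this check is with Python's standard-library robots.txt parser. This is a sketch: it works on the text of your robots.txt (paste it in or fetch it yourself), and the list of user agents is simply the three crawlers named in this post.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def blocked_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI crawlers that this robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [ua for ua in AI_CRAWLERS if not parser.can_fetch(ua, url)]


# The leftover block described above: GPTBot is shut out, everyone else is allowed.
legacy_robots = """User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_crawlers(legacy_robots))
```

An empty list means none of the three crawlers is blocked for that URL. Run it against a PDP URL as well as the homepage, since a Disallow can target a path.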
3. Sitemap complete
Your sitemap.xml is the anchor every assistant's retriever looks at when the model cites you. Missing sitemaps, sitemaps with 404s, or sitemaps that stop at 500 URLs (Shopify's default setup paginates) are common sources of partial indexation. Submit a sitemap_index.xml that references every page you care about.
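A rough way to audit coverage is to parse the sitemap and compare it with the URLs you expect to see. The sketch below uses the standard sitemaps.org namespace. The sample XML and the must-have list are hypothetical, and following child sitemaps from an index is left out for brevity.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text: str) -> tuple[str, list[str]]:
    """Return the root kind ('sitemapindex' or 'urlset') and every <loc> value."""
    root = ET.fromstring(xml_text)
    kind = root.tag.rsplit("}", 1)[-1]
    locs = [el.text.strip() for el in root.findall(".//sm:loc", NS) if el.text]
    return kind, locs


def missing_urls(locs: list[str], must_have: list[str]) -> list[str]:
    """URLs you care about that the sitemap does not list."""
    listed = set(locs)
    return [url for url in must_have if url not in listed]


# Hypothetical sitemap index with two child sitemaps.
sample_index = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://72desks.com/sitemap_products_1.xml</loc></sitemap>
  <sitemap><loc>https://72desks.com/sitemap_pages_1.xml</loc></sitemap>
</sitemapindex>"""

kind, locs = sitemap_locs(sample_index)
print(kind, locs)
```

For an index, repeat the same call on each child sitemap and pass the combined list of page URLs to missing_urls.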
4. Canonical tags
Duplicate PDPs from ?variant= query strings confuse assistants as much as they confuse Google. A valid rel="canonical" on every PDP that points to the clean URL is a free fix and it prevents the retriever from choosing a low-authority duplicate over your real product page.
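The check can be sketched with the standard-library HTML parser: pull the first rel="canonical" link out of the page and compare it with the page URL minus its query string. Treating "clean URL" as "no query string and no fragment" is an assumption; adjust it if your store uses query parameters that matter.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if (tag == "link" and self.canonical is None
                and (attr.get("rel") or "").lower() == "canonical"):
            self.canonical = attr.get("href")


def clean_url(url: str) -> str:
    """Drop the query string and fragment (assumed definition of 'clean')."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_is_clean(page_url: str, html: str) -> bool:
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical == clean_url(page_url)


# Hypothetical PDP reached through a variant URL.
page = "https://72desks.com/products/oak-standing-desk?variant=123"
good = '<head><link rel="canonical" href="https://72desks.com/products/oak-standing-desk"></head>'
print(canonical_is_clean(page, good))
```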
5. HTML renderable pass
Assistant crawlers typically use a lighter-weight renderer than Chromium and may not execute JavaScript at all. Content that only appears after client-side JavaScript hydration (a common Shopify pattern for review widgets, size guides, and sustainability blocks) may not make it into the retrieval snapshot. If your reviews are a Judge.me widget with no SSR fallback, the 184 reviews you're proud of are invisible to the model.
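A quick approximation of this check: take the raw HTML your server returns (view source, not the rendered DOM), strip scripts and styles, and see whether a phrase you expect, such as a line from a review, is in the remaining text. The markup and the review text below are hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def phrase_in_raw_html(html: str, phrase: str) -> bool:
    parser = VisibleText()
    parser.feed(html)
    return phrase.lower() in " ".join(parser.chunks).lower()


# Hypothetical widget that only fills in after JavaScript runs.
client_only = '<div id="reviews"></div><script>loadReviews("Sturdy and quiet")</script>'
# Hypothetical server-rendered fallback.
server_rendered = '<div id="reviews"><p>Sturdy and quiet</p></div>'

print(phrase_in_raw_html(client_only, "Sturdy and quiet"))
print(phrase_in_raw_html(server_rendered, "Sturdy and quiet"))
```

This looks only at markup text. Reviews that exist solely as JSON-LD inside a script tag are skipped on purpose and are covered by the schema checks in Layer 2.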
Layer 2 — Signal (five checks)
6. Product + Offer JSON-LD
The baseline. If your PDPs don't emit a valid Product with a nested Offer and a price, assistants down-weight the page in their ranker. The full fix is: name, description, sku, gtin, brand, offers (price, priceCurrency, availability, url), aggregateRating, review.
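To see which of these fields a PDP is missing, you can load its JSON-LD and run a simple presence check. This sketch tests only that each field named above is present and non-empty. It does not validate values against schema.org, and the sample product data is hypothetical.

```python
REQUIRED_PRODUCT = ["name", "description", "sku", "gtin", "brand",
                    "offers", "aggregateRating", "review"]
REQUIRED_OFFER = ["price", "priceCurrency", "availability", "url"]


def missing_product_fields(data: dict) -> list[str]:
    """List the Product and Offer fields that are absent or empty."""
    missing = [key for key in REQUIRED_PRODUCT if not data.get(key)]
    offer = data.get("offers") or {}
    if isinstance(offer, list):  # offers may be a single Offer or a list of them
        offer = offer[0] if offer else {}
    missing += [f"offers.{key}" for key in REQUIRED_OFFER if not offer.get(key)]
    return missing


# Hypothetical PDP markup with no gtin and no review entries.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Oak Standing Desk",
    "description": "Height-adjustable oak desk for home offices.",
    "sku": "OSD-140",
    "brand": {"@type": "Brand", "name": "72 Desks"},
    "offers": {
        "@type": "Offer",
        "price": "499.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
        "url": "https://72desks.com/products/oak-standing-desk",
    },
    "aggregateRating": {"@type": "AggregateRating",
                        "ratingValue": "4.7", "reviewCount": "184"},
}

print(missing_product_fields(product))
```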
7. FAQPage schema
FAQPage is the single most-lifted schema type in assistant answers, because each Question + Answer pair is already sentence-shaped. Five questions on every collection page and every bestseller PDP, ideally ones a real shopper would ask, move citation share fast.
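A small sketch of generating that markup from question and answer pairs, using the schema.org FAQPage, Question, and Answer types. The question and answer shown are hypothetical; embed the output in a script tag of type application/ld+json.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Hypothetical shopper question.
faq = faq_jsonld([
    ("How much weight does the desk hold?",
     "The frame is rated for 120 kg, including the desktop."),
])
print(json.dumps(faq, indent=2))
```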
8. Review density
Models don't trust stores with 6 reviews and a 5.0 average. They quietly prefer stores with 150+ reviews in the 4.4–4.8 range. If you're below 50 reviews on a product you want cited, that's worth a dedicated email campaign before you tune anything else.
9. Freshness date
A dateModified older than 180 days signals staleness to retrievers that apply recency heuristics. If you haven't touched the page in a year, the PDP copy is probably also stale. Fix it and update the field.
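The 180-day rule is easy to script across a catalogue. A minimal sketch, assuming dateModified is an ISO 8601 string without a trailing "Z" (Python versions before 3.11 reject that suffix in fromisoformat):

```python
from datetime import date, datetime


def is_stale(date_modified: str, today: date, max_age_days: int = 180) -> bool:
    """True when dateModified is more than max_age_days before `today`."""
    modified = datetime.fromisoformat(date_modified).date()
    return (today - modified).days > max_age_days


# Example dates chosen for illustration.
print(is_stale("2024-01-01", today=date(2024, 12, 31)))
print(is_stale("2024-10-01T09:30:00+00:00", today=date(2024, 12, 31)))
```

Pass date.today() in a real run; the fixed dates here just keep the example reproducible.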
10. Brand name matches domain
schema:Organization with name = your storefront brand AND url = your domain. Stores whose brand renders as "Store 72" but whose domain is 72desks.com confuse the entity-resolution pass. Assistants can't confidently name a store they can't uniquely identify.
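A sketch of the consistency check, assuming you already know the brand name shown on the storefront and the store's domain. The comparison rules (case-insensitive name match, host match ignoring a leading "www.") and the brand "72 Desks" are illustrative assumptions.

```python
from urllib.parse import urlsplit


def organization_issues(org: dict, storefront_brand: str, domain: str) -> list[str]:
    """Compare Organization JSON-LD against the storefront brand and domain."""
    issues = []
    name = (org.get("name") or "").strip().lower()
    if name != storefront_brand.strip().lower():
        issues.append("name does not match storefront brand")
    host = (urlsplit(org.get("url") or "").hostname or "").removeprefix("www.")
    if host != domain.lower().removeprefix("www."):
        issues.append("url does not point at the store domain")
    return issues


# The mismatch described above: markup says "Store 72", storefront says otherwise.
org = {"@type": "Organization", "name": "Store 72", "url": "https://72desks.com"}
print(organization_issues(org, storefront_brand="72 Desks", domain="72desks.com"))
```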
Layer 3 — Narrative (four checks)
11. PDP lead under 55 words
The first paragraph of your PDP is the one the model reads. If it buries the differentiator under paragraphs of brand voice, the retriever won't promote it. Rewrite so the first 50–55 words state: what the product is, who it's for, and the one specific fact you want quoted.
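Counting the lead is trivial to automate across your top PDPs. This sketch treats the first blank-line-separated block of the PDP copy as the lead and counts whitespace-separated words; both choices are assumptions about how your copy is stored. The sample copy is hypothetical.

```python
def lead_word_count(pdp_copy: str) -> int:
    """Word count of the first paragraph (text before the first blank line)."""
    first_paragraph = pdp_copy.strip().split("\n\n", 1)[0]
    return len(first_paragraph.split())


def lead_fits(pdp_copy: str, max_words: int = 55) -> bool:
    return lead_word_count(pdp_copy) <= max_words


# Hypothetical PDP copy: a short factual lead followed by brand story.
copy = (
    "The Oak Standing Desk is a height-adjustable desk for home offices.\n\n"
    "Our story began in a small workshop."
)
print(lead_word_count(copy), lead_fits(copy))
```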
12. Honest comparison page
Stores that publish a yourbrand-vs-competitor page with real trade-offs (not a puff piece) get cited on comparison prompts disproportionately. Assistants reward the source that sounds like it's actually weighing options, not the one that sounds like marketing.
13. Reddit presence
If your brand name doesn't show up in the Reddit threads your shoppers read, the retriever has no external corroboration and defaults to the brand that does. You don't need to astroturf. Engaging honestly in two relevant subreddits every week for a quarter usually shows up in the training-data and retrieval mix.
14. Category-level llms hint
A single llms.txt at the root is the baseline; a second one at each category path (e.g. /collections/desks/llms.txt) that names your canonical PDPs for that category sharpens the retriever's ability to disambiguate when the shopper asks a category question.

What moves first
We've run this diagnostic on 1,200+ stores. The distribution is boring: the same six misses explain 9 out of 10 absences. If you do only these six, in this order, the needle moves inside a quarter.
- Ship /llms.txt at the site root — 3KB, lists your 20 canonical URLs. Do this today.
- Delete any Disallow for GPTBot, ClaudeBot, PerplexityBot from robots.txt. Redeploy, wait 72 hours.
- Add FAQPage JSON-LD to every collection and bestseller PDP — five honest questions minimum.
- Rewrite the first paragraph of your top 20 PDPs to 50–55 words, facts-first.
- Publish at least one honest comparison page against your strongest competitor.
- Align schema:Organization name with your storefront brand + domain. Redeploy.
How long until you're in the citation set?
Shortest path: 6 days (llms.txt + GPTBot unblock + PDP lead rewrite for one product, running against a prompt set that's specific enough to your SKU). Typical: 4–6 weeks to reach a meaningful share of voice. The compounding effect matters — each fix is small, but the retriever behaviour is non-linear. A store on 4 of the 14 checks is functionally invisible. A store on 10 of 14 gets picked regularly.
One last calibration
Being absent from ChatGPT is not a brand problem. It's an infrastructure problem presenting as a brand problem. Stores with strong brands get cited as easily as stores with weak brands — assuming both have done the 14 things above. The stores that are absent are almost always absent because nobody on the team has owned this work, not because the product or the marketing is wrong.