Does ChatGPT read my Google Merchant Center feed?

Not directly. ChatGPT draws from the Agentic Commerce Protocol (ACP) feed for transactional prompts and from the Bing editorial index for informational prompts. Your Google Merchant Center feed flows into Gemini and Google AI Overviews, not ChatGPT. Keep the two feeds at parity: on each engine, the merchant with the cleaner feed wins.

Do all engines read llms.txt?

Most do, at varying weights. Perplexity and Claude explicitly document parsing llms.txt. OpenAI, Google, Microsoft, and xAI crawlers empirically fetch it when they discover a site but have not formally confirmed the weighting. The cost of shipping the file is low enough that every store should ship a well-curated version regardless of the weighting uncertainty.
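As a rough illustration of what a well-curated file can look like, the sketch below renders an llms.txt body using the layout from the public llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of links). The store name, URLs, and the `Link` and `build_llms_txt` helpers are hypothetical, and nothing here says how any engine weights the result.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    note: str = ""


def build_llms_txt(site_name: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            # The note after the colon is optional.
            suffix = f": {link.note}" if link.note else ""
            lines.append(f"- [{link.title}]({link.url}){suffix}")
        lines.append("")
    # Normalise to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical store and URLs, for illustration only.
EXAMPLE = build_llms_txt(
    "Example Store",
    "Independent retailer of trail-running gear.",
    {
        "Products": [
            Link("Trail shoes", "https://example.com/collections/trail-shoes", "full range with sizing"),
        ],
        "Guides": [Link("Shoe care", "https://example.com/guides/shoe-care")],
    },
)
```

Keeping the link list short and hand-picked is the point of the file; generating it from a small curated mapping like this makes it easy to review before publishing.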

How do I know which engine is citing which source for my brand?

Most engines surface the citation source in the UI: Perplexity, You.com, Copilot, and Grok all show explicit source chips. ChatGPT shows citations for some answers and not others. Gemini shows citations in AI Overviews but not in pure Gemini chat. Running a weekly prompt panel across all engines and logging the visible citations is the only reliable way to see the source mix your brand is actually earning.
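One way to turn a logged prompt panel into a source mix is a simple per-engine tally. This is a minimal sketch: the record fields (`engine`, `prompt`, `source`) and the sample rows are hypothetical, and the log is assumed to be filled in by hand or by whatever tooling you already use.

```python
from collections import Counter, defaultdict

# Each record is one visible citation logged during a weekly panel run.
# Field names and sample rows are hypothetical.
PANEL_LOG = [
    {"engine": "Perplexity", "prompt": "best trail shoes", "source": "reddit.com"},
    {"engine": "Perplexity", "prompt": "best trail shoes", "source": "example.com"},
    {"engine": "Perplexity", "prompt": "trail shoe sizing", "source": "reddit.com"},
    {"engine": "Copilot", "prompt": "best trail shoes", "source": "example.com"},
]


def source_mix(log):
    """Return {engine: {source: share}}, where shares per engine sum to 1."""
    counts = defaultdict(Counter)
    for row in log:
        counts[row["engine"]][row["source"]] += 1
    mix = {}
    for engine, counter in counts.items():
        total = sum(counter.values())
        mix[engine] = {src: n / total for src, n in counter.items()}
    return mix


MIX = source_mix(PANEL_LOG)
```

Comparing `MIX` week over week shows whether your own domain or third-party sources are gaining citation share on each engine.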

Can I pay for a priority feed into any AI engine?

Not as of April 2026. None of the mainstream engines offer paid-priority ranking inside organic AI answers. ChatGPT Shopping is free to merchants. Google AI Overviews rank organic sources, with ads appearing separately. If a vendor offers 'paid placement inside ChatGPT answers', assume it is either misleading marketing or a partnership that does not generalise.

How often do these pipelines refresh?

Varies widely. ACP inventory refreshes every 5 minutes. Google Merchant Center refreshes on a 15-minute to 24-hour cadence depending on plan. Bing Shopping is 1-4 hours. Direct crawls are every 3-10 days for most pages, faster for frequently updated ones. Third-party signals compound over weeks or months — do not expect a Reddit post to change your Perplexity citation share overnight.
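To make the cadences concrete, the sketch below takes the slowest figure quoted for each pipeline and reports which ones may still be serving old data a given time after a change. The helper names are hypothetical, and the windows are only the upper bounds stated above, not measured values.

```python
from datetime import datetime, timedelta

# Worst-case refresh windows, taken from the cadences quoted above.
WORST_CASE_REFRESH = {
    "ACP inventory": timedelta(minutes=5),
    "Google Merchant Center": timedelta(hours=24),
    "Bing Shopping": timedelta(hours=4),
    "Direct crawl": timedelta(days=10),
}


def latest_visible(changed_at: datetime) -> dict[str, datetime]:
    """Latest time each pipeline should reflect a change made at changed_at."""
    return {name: changed_at + window for name, window in WORST_CASE_REFRESH.items()}


def pending(changed_at: datetime, now: datetime) -> list[str]:
    """Pipelines that may still be serving the old data at time `now`."""
    return [name for name, due in latest_visible(changed_at).items() if now < due]


# Example: a price change at 09:00, checked six hours later.
CHANGE = datetime(2026, 4, 21, 9, 0)
STILL_STALE = pending(CHANGE, CHANGE + timedelta(hours=6))
```

Six hours after the change, only the Google Merchant Center feed and the direct crawl can still be stale under these assumptions, which is a useful sanity check before concluding that an engine is ignoring an update.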

What is the single highest-leverage source family to fix first?

For most stores: the direct-control layer, specifically schema.org markup plus the three feeds (ACP, Google Merchant Center, Bing Shopping). Schema is read by every engine; the three feeds collectively cover five of the six mainstream engines. Fixing those two buckets typically closes 60-70% of an engine-visibility gap before any content work starts.


Where AI engines get product data from

Every AI answer about a product has a provenance. Tracing that provenance is how merchants decide which data pipelines to prioritise. This is the map — nine sources, six engines, and the merchant actions that control each one.

Nora Kimura with Hiren Bhuva

AI Retrieval Researcher

11 min · Updated April 21, 2026


The three pipeline families every AI engine mixes

Licensed feeds, direct web crawl, third-party aggregators. Every engine uses all three — the mix determines where you need to focus.

There is no single source of truth inside an AI engine's retrieval stack. Every engine blends three families of data, and the relative weights of those families are what make optimising for one engine different from optimising for another. Understanding the three families before you look at specific engines is the fastest way to build an accurate mental model.

Licensed feeds: Structured catalog data delivered through a formal partnership. Examples: the Agentic Commerce Protocol (ACP) feed, Google Merchant Center, Bing Shopping feed, Amazon's catalog API. Tightly controlled, high fidelity.
Direct web crawl: Classic web crawl of your public pages. Reads HTML, schema.org markup, llms.txt, ai-sitemap.xml. Every engine runs its own crawler (GPTBot, ClaudeBot, PerplexityBot, Googlebot governed by the Google-Extended robots.txt token, Bingbot, YouBot, xAI's crawler).
Third-party aggregators: Content about your products published elsewhere. Reddit threads, Trustpilot reviews, Wikipedia entries, independent review sites, YouTube reviews, X posts. Used for corroboration and sentiment.
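Before relying on the direct-crawl family, it is worth confirming that your robots.txt does not block the crawlers named above. The sketch below uses Python's standard `urllib.robotparser` on an in-memory robots.txt; the sample rules and product URL are hypothetical, and the xAI crawler is left out because the text does not give its user-agent token.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the list above. Google-Extended is a robots.txt
# control token rather than a separate crawler, but it is matched the same way.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "Bingbot", "YouBot"]


def crawl_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Report, per AI user agent, whether robots.txt allows fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


# Hypothetical robots.txt: GPTBot is blocked site-wide, everyone else
# is only kept out of /cart.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /cart
"""

ACCESS = crawl_access(SAMPLE_ROBOTS, "https://example.com/products/trail-shoe")
```

With the sample rules, `ACCESS` marks GPTBot as blocked and the other agents as allowed. To check a live site, fetch its robots.txt yourself and pass the text in.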

9 distinct product-data sources we track across the six mainstream AI engines

Surfient retrieval research panel, April 2026 — derived from 2,400 tracked citations across ChatGPT, Gemini, Perplexity, Claude, Copilot, You.com.

Figure (step flow): The four-step arc this guide walks through. Each numbered card maps to a section below.

Engine-by-engine: where each one actually pulls from

The six mainstream engines have meaningfully different source mixes. Optimising only for ChatGPT's mix leaves Gemini and Claude partially unaddressed.

Every engine publicly discloses part of its retrieval approach, but none discloses the full mix. The engine-by-engine picture below is assembled from public documentation, engineering talks by OpenAI, Google, and Anthropic, and empirical attribution research across thousands of tracked citations.

ChatGPT (OpenAI)

Primary source: Agentic Commerce Protocol feed for transactional prompts.
Secondary source: Bing editorial index for informational and comparison prompts.
Tertiary source: Direct GPTBot crawl for long-form content and buyer guides.
Third-party weight: Medium — Reddit and Trustpilot referenced but not dominant.

Gemini and Google AI Overviews

Primary source: Google Merchant Center product feed.
Secondary source: Google web index — your organic rankings carry over into AI Overviews.
Tertiary source: Shopping tab results and Google Shopping graph entities.
Third-party weight: High — Google's Knowledge Graph pulls from Wikipedia, Wikidata, and licensed data partners extensively.

Perplexity

Primary source: Direct PerplexityBot crawl of your site — HTML, schema, llms.txt.
Secondary source: Perplexity's own web index, built from crawl + curated editorial sources.
Tertiary source: Shopify Catalog API integration for enrolled merchants.
Third-party weight: Very high — Reddit, Trustpilot, independent reviews, and YouTube are weighted heavily.

Claude (Anthropic)

Primary source: Direct ClaudeBot crawl of your public pages.
Secondary source: Anthropic's web-retrieval layer when the assistant is allowed web tools.
Tertiary source: No dedicated commerce feed integration as of April 2026.
Third-party weight: Medium — Claude reads Reddit and Trustpilot when encountered but does not weight them as aggressively as Perplexity.

Copilot (Microsoft)

Primary source: Bing Shopping feed and Bing editorial index.
Secondary source: Bingbot direct crawl of your storefront.
Tertiary source: Microsoft Shopping Graph for categorised product data.
Third-party weight: Medium-low — Reddit is read, creator content is referenced less often than on ChatGPT or Perplexity.

You.com

Primary source: Direct YouBot crawl with heavy weighting on passage-level extraction.
Secondary source: Curated editorial sources and fresh-news index.
Tertiary source: No dedicated commerce feed; relies on schema and HTML content.
Third-party weight: High — cross-source corroboration is central to the citation ranker.

What merchants control directly versus indirectly

Six of the nine source families are directly controllable. Three are only influenced. Knowing the difference changes how you invest time.

Merchant influence is not evenly distributed across the nine source families. Some you edit from Shopify Admin in minutes; others you shape over months through community and PR work. Ranking your investments against this taxonomy is the single best way to avoid wasting a quarter on the wrong levers.

Direct control (ship this week)

Agentic Commerce Protocol feed — driven by your Shopify product data. Auto-enrolled but only as good as your titles, GTINs, availability, and images.
Google Merchant Center feed — direct upload or auto-sync via the Google & YouTube app. Most Gemini and Google AI Overviews citations depend on this feed being clean.
Bing Shopping feed — upload in Microsoft Merchant Center or via Shopify's Bing app. Feeds ChatGPT's informational pathway and Copilot entirely.
On-site content — product descriptions, FAQ sections, blog posts. You write these, the crawlers read them.
Schema.org markup — Product, FAQPage, AggregateRating, BreadcrumbList. Pure technical work — ship once, benefit everywhere.
llms.txt and ai-sitemap.xml — curated signal files every engine reads to some extent.
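Since the feeds are only as good as the titles, GTINs, availability, and images behind them, a quick pre-upload audit can catch the obvious gaps. The sketch below checks for those four fields and validates the standard GTIN check digit. The field names (`title`, `gtin`, `availability`, `image_url`) are hypothetical and do not correspond to any particular feed specification.

```python
REQUIRED = ("title", "gtin", "availability", "image_url")


def gtin_is_valid(gtin: str) -> bool:
    """Validate a GTIN-8/12/13/14 using the standard mod-10 check digit."""
    if not (gtin.isascii() and gtin.isdigit()) or len(gtin) not in (8, 12, 13, 14):
        return False
    body, check = gtin[:-1], int(gtin[-1])
    # Working right to left over the body, weights alternate 3, 1, 3, 1, ...
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def audit_product(product: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks clean."""
    problems = [f"missing {field}" for field in REQUIRED if not product.get(field)]
    gtin = product.get("gtin")
    if gtin and not gtin_is_valid(gtin):
        problems.append("gtin fails check digit")
    return problems


# Hypothetical record: no image, and the GTIN's last digit is wrong.
PROBLEMS = audit_product({
    "title": "Trail Shoe",
    "gtin": "4006381333932",
    "availability": "in_stock",
    "image_url": "",
})
```

Running the same audit over the export you send to each feed is one way to keep the feeds at parity.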

Indirect influence (ship quarterly, not weekly)

Reddit / community threads — you cannot write these; you can only encourage authentic discussion by participating, answering questions, and shipping products that generate organic conversation.
Trustpilot / independent reviews — solicit reviews through post-purchase flows, respond to every review (positive and negative), maintain a complete profile.
Editorial and creator coverage — traditional PR and influencer relationships. Slow to compound but disproportionately powerful for Gemini's Knowledge Graph and Perplexity's corroboration signals.

Why schema.org is the universal layer that feeds every engine

Schema is the one signal every engine reads, regardless of which pipeline family it prefers. Complete schema is the highest-ROI cross-engine move.

If you only have time to ship one thing across all six engines, ship complete schema.org markup. Product schema on every PDP, FAQPage schema on product and buyer-guide pages, BreadcrumbList on every non-root page, Organization on the root. Every engine reads this markup in some form — ChatGPT via Bing, Gemini via Google index, Perplexity and Claude via direct crawl, Copilot via Bing, You.com via its own crawler. A store with complete schema benefits in six places simultaneously from one build.

The minimum viable schema stack for commerce

Product — name, description, image, brand, sku, gtin13, offers (price, availability, priceCurrency), aggregateRating (when reviews exist).
FAQPage — 6-8 question-answer pairs per major PDP and per buyer guide.
BreadcrumbList — renders your navigation path for trust and context.
Organization — name, url, sameAs (LinkedIn, Crunchbase, Wikipedia if applicable), contactPoint, logo. E-E-A-T foundation.
HowTo — when you publish procedural content (setup guides, care instructions, style tutorials).
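The Product and FAQPage entries above can be generated from product data and embedded as JSON-LD. The following is a minimal sketch, not a complete implementation: the input field names, the sample product, and the helper functions are hypothetical, while the output property names are the schema.org ones listed above.

```python
import json


def product_jsonld(p: dict) -> dict:
    """Build schema.org Product markup from a flat product record."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": p["name"],
        "description": p["description"],
        "image": p["image"],
        "brand": {"@type": "Brand", "name": p["brand"]},
        "sku": p["sku"],
        "gtin13": p["gtin13"],
        "offers": {
            "@type": "Offer",
            "price": p["price"],
            "priceCurrency": p["currency"],
            "availability": "https://schema.org/InStock"
            if p["in_stock"]
            else "https://schema.org/OutOfStock",
        },
    }
    # Only emit aggregateRating when reviews exist.
    if p.get("review_count"):
        data["aggregateRating"] = {
            "@type": "AggregateRating",
            "ratingValue": p["rating"],
            "reviewCount": p["review_count"],
        }
    return data


def faq_jsonld(pairs) -> dict:
    """Build schema.org FAQPage markup from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {"@type": "Question", "name": q, "acceptedAnswer": {"@type": "Answer", "text": a}}
            for q, a in pairs
        ],
    }


def script_tag(data: dict) -> str:
    """Wrap JSON-LD in a script tag, escaping '</' so it cannot close the tag early."""
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'


# Hypothetical product, for illustration only.
PRODUCT = product_jsonld({
    "name": "Trail Shoe", "description": "Lightweight trail-running shoe.",
    "image": "https://example.com/img/trail-shoe.jpg", "brand": "Example",
    "sku": "TS-001", "gtin13": "4006381333931", "price": "129.00",
    "currency": "USD", "in_stock": True, "rating": 4.6, "review_count": 12,
})
FAQ = faq_jsonld([("Is it waterproof?", "No, it is water-resistant.")])
TAG = script_tag(PRODUCT)
```

The resulting tag goes in the page HTML for each PDP. Validate the output with a structured-data testing tool before shipping, since required and recommended properties vary by consumer.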

Third-party sources and why merchants should not chase them directly

Reddit, Trustpilot, Wikipedia, and creator content all feed AI retrievers. Gaming them is a losing strategy; shipping products worth talking about is the durable play.

Third-party sources are the part of the map merchants have the least control over and the most temptation to game. Fake Reddit posts, pay-for-Trustpilot-review operators, mass-edited Wikipedia entries — all of these are known anti-patterns and all of them are detected and penalised by modern retrieval quality layers. The durable play is the opposite: ship products worth discussing, then make participation in authentic conversation part of your operational rhythm.

1. Reddit. Create an authenticated company account, disclose affiliation, answer questions in category subreddits, never vote-brigade. Reddit's moderation of inauthentic commercial activity is strong; AI retrievers detect the same patterns.
2. Trustpilot. Ship a post-purchase review-request email that does not gate positive reviews. Respond to 100% of reviews within 14 days. Maintain a complete profile with photos and contact details.
3. Wikipedia. Almost nothing for most stores, because Wikipedia notability thresholds are high. If your brand crosses the threshold (major press coverage, significant revenue, category-defining moment), work with a specialist writer who understands Wikipedia's conflict-of-interest policies.
4. Creator and editorial content. Traditional PR, but prioritise creators whose audience overlaps your buyer. A single genuinely enthusiastic creator review is worth more than 10 paid placements for AI corroboration.

“The worst thing you can do to your AI visibility is fake the corroboration layer. The second worst is ignore it entirely. The path in between is slow, authentic participation — and that is what actually compounds.”

— Nora Kimura, AI Retrieval Researcher

Frequently asked questions

Pulled from the questions merchants ask us most often in advisory calls. Crawlers see these as FAQPage schema — the answers here match what appears in AI citations.

