Surfient module · Distribution

Publish llms.txt, ai-sitemap.xml, and products.ndjson — always fresh, always valid

The plumbing AI engines probe for, generated for every Shopify store, refreshed whenever your catalog changes, and hosted on your own domain — not ours.

Surfient generates and hosts the four files AI engines actually fetch: llms.txt, llms-full.txt, ai-sitemap.xml, and products.ndjson — on your own domain, at conventional paths.
Files regenerate automatically on every product, collection, or policy update via Shopify webhooks — so the feed never lags the storefront by more than a few minutes.
Every generated file is validated against the published spec before it ships — no silent breakage when the ndjson format tightens or llms.txt gains a new section.

Generate my AI-Ready files · See an example llms.txt

AI-Ready Files

Regenerated on every publish

5 / 5 live

surfient · regen

$ regen llms.txt · 8.2 KB · ok
$ regen llms-full.txt · 142 KB · ok
$ regen ai-sitemap.xml · 54 KB · ok
$ regen products.ndjson · 2.1 MB · ok
$ regen rss.xml · 12 KB · ok
▸ 5/5 published · 1.2s · next regen on publish event

The problem

Your storefront isn't the problem — your feeds are

Modern AI engines don't crawl the web the way Google did. They probe for a handful of conventional endpoints, ingest the structured data, and build their answer from that. A store without those endpoints is invisible the way a restaurant without a Google Business profile is invisible on Maps.

4
files AI engines probe for before they'll trust a site as a structured source
llms.txt, llms-full.txt, ai-sitemap.xml, and a JSON or NDJSON product feed at a predictable path.
0.7%
of Shopify stores publish even one of those four files today
We crawled 410,000 Shopify-hosted stores in January 2026. Three thousand had an llms.txt. Almost none had a working NDJSON feed.
11 min
median staleness for stores using a manual feed export
Merchants who do publish a feed typically regenerate it daily or weekly. AI indexes inherit that drift, and your bestsellers fall out of sync.

How it works

Four files, zero drift

We generate each file from live Shopify data, regenerate on every webhook event, and serve them from your domain so AI engines don't have to follow a redirect chain.

Generate llms.txt
Surfient composes an llms.txt that describes your store's purpose, highlights your best-of catalog, lists your policies and contact endpoints, and points to the structured feeds below. The file follows the llms.txt 0.3 spec with the commerce extension, so ChatGPT, Claude, and Perplexity can parse it without guesswork.
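To make the shape of the file concrete, here is a minimal Python sketch that renders an llms.txt in the basic llmstxt.org layout: an H1 store name, a blockquote summary, then H2 sections of Markdown links. It does not implement the commerce extension mentioned above, and the function name, store name, and example.com URLs are hypothetical.

```python
def compose_llms_txt(name, summary, sections):
    """Render an llms.txt body (hypothetical helper, basic llmstxt.org layout only).

    sections maps an H2 heading to a list of (title, url, note) tuples; note may be "".
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, note in links:
            entry = f"- [{title}]({url})"
            if note:
                # Optional description after the link, as in the llmstxt.org examples
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(compose_llms_txt(
        "Example Outdoor Co.",
        "Independent Shopify store selling wool outerwear and camping gear.",
        {
            "Feeds": [
                ("Product feed", "https://example.com/products.ndjson", "one product per line"),
                ("AI sitemap", "https://example.com/ai-sitemap.xml", ""),
            ],
            "Policies": [
                ("Shipping", "https://example.com/policies/shipping-policy", ""),
            ],
        },
    ))
```

Listing the feeds in their own section gives an engine a single, predictable entry point to the structured files.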
Generate ai-sitemap.xml
A model-optimised sitemap that lists products, collections, articles, and policies with last-modified timestamps, canonical URLs, and an x-surfient-citability score. AI crawlers using it can pick the freshest, most citable pages first and skip the thin ones.
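A minimal Python sketch of how such a sitemap could be built with the standard library's `xml.etree.ElementTree`. The `<loc>` and `<lastmod>` elements come from the sitemaps.org protocol; placing x-surfient-citability in its own XML namespace is an assumption (the page does not say how the element is declared), and the namespace URI below is a placeholder.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Placeholder namespace for the citability hint; the real URI is not documented here.
EXT_NS = "https://example.com/ns/surfient"

# Serialise the sitemap namespace as the default and the extension under the "s" prefix
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("s", EXT_NS)


def build_ai_sitemap(entries):
    """Build an ai-sitemap.xml string from (loc, lastmod, citability) tuples.

    lastmod is a W3C datetime string such as "2026-01-15"; citability must lie in [0, 1].
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod, score in entries:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"citability must be between 0 and 1, got {score}")
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
        # The hint lives in a separate namespace from the standard sitemap elements
        ET.SubElement(url, f"{{{EXT_NS}}}x-surfient-citability").text = f"{score:.2f}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```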
Generate products.ndjson
One product per line, each line a valid JSON object with title, handle, description, up to 8 media URLs, spec table, pricing, availability, review aggregate, and canonical URL. Because NDJSON can be parsed one line at a time, an engine can stream a 50,000-SKU catalog instead of loading the whole file into memory.
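A minimal Python sketch of both sides of the format: writing one compact JSON object per line, and reading the feed lazily so memory use stays flat regardless of catalog size. `to_ndjson` and `iter_ndjson` are illustrative names, and the fields in the usage are a hypothetical subset of those listed above.

```python
import json
from typing import Iterable, Iterator


def to_ndjson(products: Iterable[dict]) -> str:
    """Serialise products as NDJSON: one compact JSON object per line, newline-terminated."""
    # json.dumps escapes newlines inside strings, so every record stays on one line.
    # The default ensure_ascii=True also escapes non-ASCII characters such as U+2028,
    # which some line splitters would otherwise treat as a line break.
    return "".join(json.dumps(p, separators=(",", ":")) + "\n" for p in products)


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Parse NDJSON lazily from any iterable of lines, e.g. an open text file."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines such as a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: {exc.msg}") from exc
```

In practice `iter_ndjson(open("products.ndjson", encoding="utf-8"))` holds only one line in memory at a time, which is the property that makes the format stream-friendly.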
Host on your domain + refresh on webhooks
Every file is served from your own domain at /llms.txt, /llms-full.txt, /ai-sitemap.xml, and /products.ndjson. We subscribe to the Shopify products/update, products/delete, collections/update, and themes/publish webhooks; the moment something changes, the affected file is rebuilt and its cached copy invalidated.
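A hedged Python sketch of the receiving side: verify the webhook, then map its topic to the files that need rebuilding. Shopify signs each webhook by computing an HMAC-SHA256 of the raw request body with the app's secret, sends the base64 digest in the `X-Shopify-Hmac-Sha256` header, and names the topic (for example `products/update`) in the `X-Shopify-Topic` header. The topic-to-file mapping and all function names below are illustrative assumptions, not Surfient's published behaviour.

```python
import base64
import hashlib
import hmac

# Illustrative topic -> files mapping (an assumption, not documented behaviour)
REGEN_PLAN = {
    "products/create": {"products.ndjson", "ai-sitemap.xml", "llms.txt"},
    "products/update": {"products.ndjson", "ai-sitemap.xml", "llms.txt"},
    "products/delete": {"products.ndjson", "ai-sitemap.xml", "llms.txt"},
    "collections/update": {"ai-sitemap.xml", "llms.txt"},
    "themes/publish": {"llms.txt", "ai-sitemap.xml"},
}


def verify_shopify_webhook(raw_body: bytes, hmac_header: str, secret: str) -> bool:
    """Check the X-Shopify-Hmac-Sha256 header against an HMAC-SHA256 of the raw body."""
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).digest()
    expected = base64.b64encode(digest)
    # compare_digest runs in constant time, so timing does not reveal partial matches
    return hmac.compare_digest(expected, hmac_header.encode("utf-8"))


def files_to_rebuild(topic: str) -> set:
    """Return the files a topic should regenerate; unknown topics trigger nothing."""
    return set(REGEN_PLAN.get(topic, ()))


def handle_webhook(topic: str, raw_body: bytes, hmac_header: str, secret: str) -> set:
    """Verify first (on the raw bytes, before any JSON parsing), then plan the rebuild."""
    if not verify_shopify_webhook(raw_body, hmac_header, secret):
        raise PermissionError("webhook signature mismatch")
    return files_to_rebuild(topic)
```

Verifying against the raw bytes matters: re-serialising parsed JSON can change whitespace or key order and break the signature.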

Inside the app

What you’ll see after install

Every number a Shopify merchant running Surfient AI-Ready Files tracks, at a glance and live from the Surfient admin: AI engine traffic split, revenue lift, and the exact state of your catalog across ChatGPT, Perplexity, Google AI Overviews, Claude, Gemini, and Copilot.

app.surfient.com/ai-ready-files

Live

AI-Ready Files · llms.txt + ai-sitemap

Updated just now

Crawl freshness

100%

+3% this week

Every AI engine pulled the latest llms-full.txt within the last hour.

Products in NDJSON

3,412 · +88

llms-full.txt size

1.2 MB · stable

Sitemap entries

4,168 · +92

Avg fetch latency

84 ms · -12 ms

AI engine traffic split · last 30 days

100% attributed

ChatGPT
1,924 fetches · +312
Perplexity
1,650 fetches · +241
Google AI Overviews
1,308 fetches · +184
Claude
962 fetches · +128
Gemini
619 fetches · +71
Copilot
412 fetches · +33

Fetches · last 30 days

6,875

across 6 engines

Cache hit rate

92%

Cloudflare R2

Manifest version

v412

auto-bumped

Last regen

3 min ago

on inventory delta

Capabilities

What the generator does for you

The surface area is deliberately narrow — publish the four files, validate them, keep them fresh. Everything below is in service of that.

llms.txt with the commerce extension
Not every llms.txt in the wild is valid. Surfient writes to the 0.3 spec plus the commerce extension so assistants treat your store as a product catalog source, not a generic blog. Includes store metadata, preferred contact, citation policy, and the list of feeds.
llms-full.txt with quotable brand facts
A longer companion file with brand-voice-approved facts an engine can quote verbatim. Policies, guarantees, shipping regions, craftsmanship claims — everything you want ChatGPT to know and nothing you don't. Edit the facts in the admin; the file regenerates in under a minute.
ai-sitemap.xml with citability hints
Standard sitemap fields plus x-surfient-citability, a float between 0 and 1 the audit engine computes for each URL. Crawlers that respect the hint (we've tested Perplexity, ClaudeBot, and our own partner crawlers) pull the highest-scoring pages first.
products.ndjson stream-friendly feed
We chose NDJSON because AI engines ingest feeds at streaming scale. Each line is independent, so a 50,000-SKU catalog is trivially chunked. Fields match the Product schema an engine would extract anyway, which means no field mapping on ingestion.
Webhook-driven freshness
Every write to a Shopify resource fires the matching Surfient regenerator. Product updated? products.ndjson rebuilds in under 60 seconds. Policy page edited? llms-full.txt refreshes. Delete a SKU? It's gone from every feed before your replatform team has reloaded the tab.
Spec validation on every build
We validate the generated llms.txt against the 0.3 schema, the sitemap XML against the sitemaps.org schema, and each NDJSON line against our Zod product schema. A build that would publish malformed output fails and pages a human; it never silently ships a broken file to an engine.
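The page describes the NDJSON check as a Zod schema (Zod is a TypeScript validation library). As a language-neutral illustration, here is a minimal Python stand-in that checks every line and refuses to publish if any line fails. The required fields and their types are a hypothetical subset of the product fields listed earlier; the real schema is not shown on this page.

```python
import json

# Hypothetical required fields and types, standing in for the real product schema
REQUIRED_FIELDS = {"title": str, "handle": str, "url": str, "available": bool}


def validate_ndjson(text: str) -> list:
    """Return human-readable errors; an empty list means every line passed."""
    errors = []
    # NDJSON is delimited by "\n"; strip() below also drops a trailing "\r"
    for line_no, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {line_no}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            errors.append(f"line {line_no}: expected a JSON object")
            continue
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in record:
                errors.append(f"line {line_no}: missing {field!r}")
            elif not isinstance(record[field], expected_type):
                errors.append(f"line {line_no}: {field!r} should be {expected_type.__name__}")
    return errors


def publish_if_valid(text: str, publish) -> None:
    """Fail the build loudly instead of shipping a malformed feed."""
    errors = validate_ndjson(text)
    if errors:
        raise RuntimeError("refusing to publish products.ndjson:\n" + "\n".join(errors))
    publish(text)
```

Collecting every error before failing, rather than stopping at the first, gives whoever gets paged the full picture in one report.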
Served from your domain, not ours
The files live at yourstore.com/llms.txt, not surfient.com/yourstore/llms.txt. AI engines treat them as first-party statements from your brand — no redirect chains, no trust transfer, no CORS gotchas when a model probes the endpoint.
Version history and rollback
Every regeneration is archived. If a bad product description leaks into llms-full.txt, you can diff against yesterday's build and roll back the file in a single click while you fix the source.
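A minimal sketch of the diff-then-roll-back workflow using Python's `difflib`. `BuildArchive` is a hypothetical in-memory stand-in for durable version storage. Here, rolling back re-publishes an old build as a new version instead of deleting history; that is one reasonable design, not necessarily Surfient's.

```python
import difflib


class BuildArchive:
    """Hypothetical in-memory build history for one generated file."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.builds: list = []

    def publish(self, content: str) -> int:
        """Archive a new build and return its 1-based version number."""
        self.builds.append(content)
        return len(self.builds)

    def current(self) -> str:
        return self.builds[-1]

    def diff(self, old_version: int, new_version: int) -> str:
        """Unified diff between two archived versions."""
        return "".join(difflib.unified_diff(
            self.builds[old_version - 1].splitlines(keepends=True),
            self.builds[new_version - 1].splitlines(keepends=True),
            fromfile=f"{self.name}@v{old_version}",
            tofile=f"{self.name}@v{new_version}",
        ))

    def rollback(self, version: int) -> int:
        """Re-publish an earlier build as a new version, so history is never rewritten."""
        return self.publish(self.builds[version - 1])
```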

Proof

“We had an ai-sitemap within 40 minutes of installing. Two weeks later Perplexity was citing our gift-guide collection page by name. The only thing that had changed was that we existed in a format the model could read.”

Lars Bergström · Founder, Nordfell Goods

40 min

install to a live ai-sitemap.xml

Pairs well with

Surfient GEO Audit Engine

Surfient AI Fix Pack

Surfient AI Visibility Monitor

FAQ

Questions, answered straight

Why not just rely on Shopify's built-in sitemap?
Shopify's sitemap lists URLs, and nothing more. AI engines ingest URLs, but they also want structured data (price, availability, specs), a citation policy, and a hint about which URLs are most worth reading. ai-sitemap.xml and llms.txt together carry that extra signal. Shopify's sitemap stays exactly as-is; the AI-Ready files layer on top.
Will this conflict with my existing robots.txt or sitemap?
No. llms.txt and ai-sitemap.xml live at separate paths. We add one line to robots.txt referencing the new sitemap entry — the standard Sitemap: directive — and nothing more. Your existing Google, Bing, and social crawler rules stay untouched.
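The robots.txt change amounts to one idempotent append. A minimal Python sketch, assuming the file is handled as plain text; the function name and example.com URLs are hypothetical. The sketch matches the `Sitemap` field name case-insensitively and the URL exactly, and leaves every other line untouched.

```python
def add_sitemap_directive(robots_txt: str, sitemap_url: str) -> str:
    """Append 'Sitemap: <url>' unless an equivalent line is already present."""
    for line in robots_txt.splitlines():
        field, sep, value = line.partition(":")
        # Field name compared case-insensitively; the URL is compared exactly
        if sep and field.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt  # already present: no-op, so repeated runs are safe
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + f"Sitemap: {sitemap_url}\n"
```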
Do AI engines actually read these files today?
Yes. We've instrumented access logs across 180 Surfient customers and tracked named-agent fetches from GPTBot, ClaudeBot, PerplexityBot, and Google-Extended on /llms.txt, /ai-sitemap.xml, and /products.ndjson in the past 90 days. The median store sees 12 fetches a week from named AI crawlers on these paths alone.
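To sanity-check figures like these against your own server logs, here is a minimal Python sketch that counts named AI-crawler fetches of the AI-Ready paths. It assumes Combined Log Format lines and matches an illustrative subset of user-agent tokens. User-agent strings can be spoofed, so a production count would also check requesting IPs against vendor-published ranges or reverse DNS where available.

```python
import re
from collections import Counter

# Illustrative subset of crawler user-agent tokens; check each vendor's docs for current names
AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")
AI_PATHS = {"/llms.txt", "/llms-full.txt", "/ai-sitemap.xml", "/products.ndjson"}

# Combined Log Format tail: "GET /path HTTP/1.1" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def count_ai_fetches(lines) -> Counter:
    """Count fetches of the AI-Ready paths per named crawler."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        if path not in AI_PATHS:
            continue
        user_agent = match.group("ua")
        for agent in AI_AGENTS:
            if agent in user_agent:
                counts[agent] += 1
                break
    return counts
```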
How big can the products.ndjson feed get?
We've tested up to 250,000 SKUs. NDJSON streams well — each line is independently parseable, so an engine can ingest at whatever chunk size it likes. The generator gzips responses, so a 50,000-SKU catalog transfers at around 8 MB over the wire.
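Compression ratios depend heavily on the catalog, so one rough way to estimate your own transfer size is to gzip an NDJSON rendering of the feed and compare byte counts. A minimal Python sketch using the standard `gzip` module; the function name and the synthetic records are made up for illustration.

```python
import gzip
import json


def gzip_feed(products):
    """Return (raw_bytes, gzipped_bytes) for an NDJSON rendering of products."""
    raw = "".join(json.dumps(p, separators=(",", ":")) + "\n" for p in products).encode("utf-8")
    return raw, gzip.compress(raw)


if __name__ == "__main__":
    # Synthetic, highly repetitive records; real descriptions compress less predictably
    sample = [
        {"handle": f"sku-{i}", "title": f"Wool Beanie {i}", "price": "29.00", "available": i % 3 != 0}
        for i in range(5000)
    ]
    raw, packed = gzip_feed(sample)
    print(f"{len(raw):,} bytes raw -> {len(packed):,} bytes gzipped ({len(packed) / len(raw):.1%})")
```

When serving, the compressed bytes go out with a `Content-Encoding: gzip` header to clients that send `Accept-Encoding: gzip`.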
Can I customise what goes in llms-full.txt?
Yes. llms-full.txt is the one file you author directly, in Surfient's admin under Brand Facts. You write the citation-ready sentences; Surfient combines them with the site metadata. Changes publish within 60 seconds and are versioned.
What happens when a product is deleted from Shopify?
The products/delete webhook fires; Surfient removes the row from products.ndjson and the URL from ai-sitemap.xml, invalidates the CDN cache, and logs the removal in your version history. The next time an AI engine fetches the feed, the SKU is gone, and so is the risk of a model citing a 404.

Spin up your AI-Ready files in under an hour

We'll generate your llms.txt, llms-full.txt, ai-sitemap.xml, and products.ndjson from your live Shopify catalog — validated, hosted on your domain, refreshed whenever your store changes.

Generate my files · See pricing
