
robots.txt for AI bots on Shopify

Every major AI engine publishes a named crawler. Your robots.txt.liquid controls which of them can read your catalog — and the default Shopify robots.txt does not mention any of them.

Samir Bhattacharya with Hiren Bhuva

Shopify GEO Engineer

9 min

The current list of AI bots that matter for Shopify

Retrieval bots, ingestion bots, and interactive bots: each answers to its own user-agent token, and each matters for a different GEO surface.

AI companies publish multiple bots, each with a specific purpose, and your robots.txt should handle them individually. The three functional categories are retrieval bots (crawl pages to build the search index an engine answers from), ingestion bots (crawl at scale to collect training data), and interactive bots (fetch a specific page on demand during a user's conversation). Mixing these up is the fastest way to accidentally block the bot that serves your citations.

GPTBot
OpenAI training crawler. Collects content that may be used to train OpenAI's models; ChatGPT Search visibility is governed separately by OAI-SearchBot. Allow if you accept training use; block if you opt out of training.
ChatGPT-User
OpenAI interactive — fetches pages during a ChatGPT conversation when the model decides to browse. Always allow if you want ChatGPT citations.
OAI-SearchBot
OpenAI search-index crawler — builds the ChatGPT Search index. Allow for visibility.
ClaudeBot
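Anthropic's crawler. Anthropic documents it as collecting web content that may contribute to model training, and lists separate Claude-SearchBot and Claude-User agents for search and user-initiated fetches. Allow if you want the broadest Claude coverage; block if you opt out of training.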
PerplexityBot
Perplexity retrieval. Feeds Perplexity answer generation. Allow.
Perplexity-User
Perplexity interactive — fetches specific pages during a user's search. Allow.
Google-Extended
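Google's AI control token, not a separate crawler. It has no user-agent string of its own; it tells Google whether content already crawled by Googlebot may be used for Gemini training and grounding. It does not affect Google Search or AI Overviews, which follow Googlebot rules. Allow for Gemini visibility.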
Googlebot
Google's classic search crawler, which also supplies the Search index that AI Overviews draws on. Keep it allowed, as it always has been.
Bingbot
Microsoft search + Copilot + ChatGPT Search upstream. Allow.
Applebot-Extended
Apple's AI training control token. It does not crawl pages itself; it governs whether content fetched by Applebot may be used to train Apple's models. Block if you opt out of Apple AI training; otherwise allow.
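To make the categories above concrete, here is a minimal Python sketch that maps a raw User-Agent header to one of the bot tokens in this list. The category assigned to each token is an assumption based on a reading of the vendors' published crawler documentation, so re-check it before relying on it. Google-Extended and Applebot-Extended are left out because they never appear as request user-agents. The function name `classify_user_agent` is hypothetical.

```python
from typing import Optional, Tuple

# Token -> functional category. These assignments are illustrative assumptions;
# confirm each one against the vendor's own crawler documentation.
BOT_CATEGORIES = {
    "GPTBot": "ingestion",
    "OAI-SearchBot": "retrieval",
    "ChatGPT-User": "interactive",
    "ClaudeBot": "ingestion",
    "PerplexityBot": "retrieval",
    "Perplexity-User": "interactive",
    "Googlebot": "retrieval",
    "Bingbot": "retrieval",
}


def classify_user_agent(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (token, category) for the first known bot token found, else None."""
    lowered = user_agent.lower()
    for token, category in BOT_CATEGORIES.items():
        # Bot tokens are matched case-insensitively as substrings of the header.
        if token.lower() in lowered:
            return token, category
    return None
```

A substring match is enough for triage, but it does not prove the request came from the vendor; anyone can send these strings, so treat the result as a label, not as verification.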
Figure (step flow): The four-step arc this guide walks through; each numbered step maps to a section below. 01 The current list of AI bots that matter for Shopify · 02 How to edit robots.txt on Shopify · 03 The silent killer: CDN-level bot blocks below robots.txt · 04 Should you allow or block? The decision matrix.

How to edit robots.txt on Shopify

Shopify generates robots.txt from robots.txt.liquid. You override or extend the default by editing that file — available on all paid plans.

Shopify auto-generates robots.txt from a Liquid template called robots.txt.liquid. On most themes this file does not exist by default — Shopify uses the built-in template — so the first step is to create it in your theme's templates folder. Once it exists, Shopify uses your version instead of the built-in one, and you can add or override directives freely.

Step 1: Create the robots.txt.liquid file

In Shopify Admin → Online Store → Themes → Actions → Edit code, navigate to the Templates folder. If robots.txt.liquid does not exist, click 'Add a new template' and select robots.txt — this creates the file with Shopify's default content. From here you can add or remove directives.

Step 2: The recommended AI-allowlist block

{% for group in robots.default_groups %}
{{- group.user_agent }}
{% for rule in group.rules -%}
{{ rule }}
{% endfor -%}
{%- if group.sitemap != blank -%}
{{ group.sitemap }}
{%- endif %}
{% endfor %}

# AI retrieval bots — explicit allow
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Bingbot
Allow: /

# Additional sitemaps
Sitemap: {{ shop.url }}/sitemap.xml
Sitemap: {{ shop.url }}/ai-sitemap.xml

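This block preserves Shopify's default directives (the initial loop that outputs the standard groups) and adds explicit Allow rules for every major AI bot plus the ai-sitemap.xml reference. Be aware that a crawler obeys only the most specific group that matches its user-agent, so a bot with its own named group stops reading the User-agent: * group, including the Disallow rules Shopify sets there for cart, checkout, and similar paths. If you want named bots to keep those exclusions, repeat the relevant Disallow lines inside each named group. The named directives still make your intent auditable and protect you from future changes to the Shopify defaults.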
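One way to sanity-check how a named group interacts with the wildcard group is to run the rendered robots.txt through a parser before you publish it. The sketch below uses Python's standard-library `urllib.robotparser` as a stand-in for a real crawler. The robots.txt text is a shortened, hypothetical example, not Shopify's actual default output, and real crawlers may differ from this parser in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Shortened, hypothetical robots.txt: one wildcard group plus one named group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
Disallow: /checkout

User-agent: GPTBot
Allow: /
"""


def can_fetch(robots_txt: str, user_agent: str, path: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Usage: compare a named bot with a bot that falls back to the wildcard group.
# can_fetch(ROBOTS_TXT, "GPTBot", "/cart")
# can_fetch(ROBOTS_TXT, "SomeOtherBot", "/cart")
```

In this example the named GPTBot group takes over from the wildcard group entirely, so GPTBot is allowed on /cart while an unnamed bot is not. That is the behavior to keep in mind when adding blanket Allow rules for named agents.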

The silent killer: CDN-level bot blocks below robots.txt

27% of Shopify stores have correct robots.txt but block AI bots at Cloudflare, Imperva, or the Shopify Bot Management layer. robots.txt is advisory; CDN blocks are enforced.

robots.txt is the polite part of bot control — an advisory document that well-behaved bots respect voluntarily. Below robots.txt, every store has one or more enforcement layers (Cloudflare Bot Management, Shopify's built-in bot protection, occasionally a third-party WAF) that return 403 or 429 regardless of what robots.txt says. In our audits, 27% of Shopify stores have a correct robots.txt that explicitly allows GPTBot while Cloudflare returns 403 to every GPTBot request. The bot honors your robots.txt; the CDN ignores it.

27% of Shopify stores with correct robots.txt have a CDN-level block silently denying AI bots

Surfient infrastructure audit of 1,207 Shopify stores, Q1 2026. Most commonly Cloudflare Bot Fight Mode or Super Bot Fight Mode.

Where to check your CDN layer

  • Cloudflare — Security → Bots. Disable 'Block AI Scrapers and Crawlers' if you want to allow GEO retrieval. Check Firewall Rules for any custom rule that blocks user-agent GPTBot or ClaudeBot.
  • Shopify's built-in bot protection — Admin → Settings → Security and fraud prevention. Review the list of blocked bot patterns.
  • Third-party WAFs (Imperva, Cloudflare Enterprise, Sucuri) — check the bot management rules for any AI-specific denies.
  • Your server logs — grep for GPTBot, ClaudeBot, PerplexityBot in the last 14 days. Zero hits means something is blocking them silently.
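A quick first probe, before digging through dashboards, is to request one of your own pages while presenting each bot's token and compare the status codes. The Python sketch below does that with the standard library only. The user-agent strings are simplified, hypothetical stand-ins, and the helper names (`http_status`, `probe`, `blocked`) are made up for this example. Many CDNs verify bots by IP address as well as by header, so a request from your own machine is only a rough signal: it can be blocked where the real bot is not, and the reverse.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List

# Simplified, hypothetical user-agent strings; only the bot token matters here.
AGENTS: Dict[str, str] = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
    "Browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",  # control request
}


def http_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Fetch url with the given User-Agent header and return the HTTP status."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 403, 429 and other error statuses arrive as exceptions; keep the code.
        return err.code


def probe(
    url: str,
    agents: Dict[str, str] = AGENTS,
    fetch: Callable[[str, str], int] = http_status,
) -> Dict[str, int]:
    """Return {agent name: status code} for one URL."""
    return {name: fetch(url, ua) for name, ua in agents.items()}


def blocked(results: Dict[str, int]) -> List[str]:
    """Names of agents that received a 403 or 429."""
    return sorted(name for name, status in results.items() if status in (403, 429))


# Usage (performs real network requests against a store you control):
# print(blocked(probe("https://your-store.example/products/some-product")))
```

The `fetch` parameter is injectable so the logic can be exercised without touching the network. If the control "Browser" request succeeds while a bot token gets a 403, the block is happening below robots.txt.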

Should you allow or block? The decision matrix

Retrieval bots — allow for GEO visibility. Training bots — your call. Most Shopify stores should allow everything; the opt-outs are edge cases.

The allow-or-block decision maps neatly to the functional bot categories. Retrieval bots serve citations to real users asking real questions — blocking them is blocking customers. Training bots feed models that will compete with you or distill your content into derivative works — the reasoning to block is content-protection; the cost is AI visibility. Most Shopify stores are in a market where AI visibility is worth the training exposure, but the calculus is genuinely different in categories like publishing, proprietary research, or original creative work.

Allow retrieval bots
ChatGPT-User, OAI-SearchBot, PerplexityBot, Perplexity-User, Googlebot, Bingbot. These serve your citations.
Allow ingestion bots (typical)
GPTBot, ClaudeBot, and the Google-Extended token. For most DTC ecommerce stores, training exposure is worth the visibility. Block only if you have high-value proprietary content.
Block Applebot-Extended (optional)
If you specifically do not want Apple AI training on your content. Allow otherwise.
Consider blocking CCBot
Common Crawl — widely used for model training. Block if training exposure is a concern; allow if you want the downstream visibility.
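The matrix above can be kept as a small policy table and rendered into robots.txt groups, which makes the decision reviewable in version control. The Python sketch below is one way to do that. The function name `render_groups` and the example policy are hypothetical, and the output is meant to be pasted below the default-groups loop in robots.txt.liquid.

```python
from typing import Dict, List


def render_groups(policy: Dict[str, str]) -> str:
    """Render {token: "allow" | "block"} as robots.txt user-agent groups."""
    lines: List[str] = []
    for token, decision in policy.items():
        if decision not in ("allow", "block"):
            raise ValueError(f"unknown decision for {token}: {decision!r}")
        rule = "Allow: /" if decision == "allow" else "Disallow: /"
        # One group per token, separated by a blank line.
        lines += [f"User-agent: {token}", rule, ""]
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical policy: keep search visibility, opt out of Common Crawl.
EXAMPLE_POLICY = {
    "OAI-SearchBot": "allow",
    "PerplexityBot": "allow",
    "CCBot": "block",
}

# Usage:
# print(render_groups(EXAMPLE_POLICY))
```

Keeping the policy as data means a change of mind about one bot is a one-line diff, not a hand edit to the template.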

How to verify AI bots actually reach your pages

Server-log grep is the only reliable signal. robots.txt allowance means nothing if the CDN is returning 403.

The only reliable way to verify AI bot access is to check your server logs. Shopify exposes access logs to Plus merchants via the Live View dashboard; on lower tiers you need either a Cloudflare logpush, a log-shipping app, or a sampling technique using the Shopify Admin API. Whatever the mechanism, the question you need to answer is the same: in the last 14 days, has each AI bot actually fetched pages from my store, and what response codes did they get?

  1. Pull 14 days of access logs. Filter by user-agent to isolate GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, Googlebot, and Bingbot. Google-Extended will not appear in logs: it is a robots.txt control token, not a user-agent that makes requests.
  2. Count requests per bot per day. A healthy store sees 5-50 requests per day from each major bot on a catalog of 200+ SKUs.
  3. Check response codes. 200s are fine; 403s and 429s mean something is blocking. Occasional 429s are okay; persistent 403s mean a firewall rule needs to change.
  4. Verify the paths fetched. Are bots reaching your product pages, or only your homepage? A bot that only fetches / is either rate-limited or crawl-budget-starved.
  5. Cross-check against Search Console (for Googlebot) and Bing Webmaster Tools (for Bingbot) to confirm that what the engines say they fetched matches what your logs show.
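The counting in the steps above can be sketched in a few lines of Python. This example assumes logs in the common "combined" access-log format; your export may use a different layout, in which case the regular expression needs adjusting. The sample lines are invented for illustration and the function name `summarize` is hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

BOT_TOKENS = ("GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
              "Googlebot", "Bingbot")

# Assumes combined log format:
# ip - - [day:time zone] "METHOD path proto" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'\[(?P<day>[^:]+):[^\]]*\] '
    r'"(?:[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def summarize(lines: Iterable[str]) -> Tuple[Counter, Dict[str, Set[str]]]:
    """Count hits per (bot, day, status) and collect the paths each bot fetched."""
    hits: Counter = Counter()
    paths: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # line is not in the assumed format
        agent = match.group("agent").lower()
        for token in BOT_TOKENS:
            if token.lower() in agent:
                hits[(token, match.group("day"), int(match.group("status")))] += 1
                paths[token].add(match.group("path"))
                break
    return hits, paths


# Invented sample lines, for illustration only.
SAMPLE_LINES = [
    '203.0.113.7 - - [12/Mar/2026:10:15:32 +0000] "GET /products/widget HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '203.0.113.7 - - [12/Mar/2026:10:16:02 +0000] "GET / HTTP/1.1" 403 312 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '198.51.100.4 - - [12/Mar/2026:10:17:45 +0000] "GET /cart HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
```

Grouping by bot, day, and status answers steps 2 and 3 in one pass, and the per-bot path sets answer step 4: a bot whose set contains only / is not reaching your catalog.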
The store that thinks it allows AI bots and the store that actually does are rarely the same store. Three layers of access control — robots.txt, Shopify's default bot rules, and whatever Cloudflare mode you forgot you enabled — silently diverge over time.
Samir Bhattacharya, Shopify GEO Engineer at Surfient

Frequently asked questions

Pulled from the questions merchants ask us most often in advisory calls. Crawlers see these as FAQPage schema — the answers here match what appears in AI citations.

  • Not in robots.txt — the default Shopify robots.txt allows all bots via a User-agent: * wildcard. But Shopify does run its own bot protection and many stores run Cloudflare on top, both of which can block AI bots at the CDN layer while robots.txt says the opposite. Check server logs, not just robots.txt.
