The current list of AI bots that matter for Shopify
Retrieval bots, ingestion bots, and interactive bots — each respects a different directive and each matters for a different GEO surface.
AI companies publish multiple bots, each with a specific purpose, and your robots.txt should handle them individually. The three functional categories are retrieval bots (fetch pages in response to a user query), ingestion bots (crawl at scale for training or index building), and interactive bots (fetch a specific page during a user's conversation). Mixing these up is the fastest way to accidentally block the bot that serves your citations.
- GPTBot
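- OpenAI's crawler for collecting model-training data. OpenAI documents OAI-SearchBot and ChatGPT-User, not GPTBot, as the agents behind ChatGPT Search and in-conversation browsing. Allow for the broadest GEO coverage; block if you opt out of training.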
- ChatGPT-User
- OpenAI interactive — fetches pages during a ChatGPT conversation when the model decides to browse. Always allow if you want ChatGPT citations.
- OAI-SearchBot
- OpenAI search-index crawler — builds the ChatGPT Search index. Allow for visibility.
- ClaudeBot
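- Anthropic's crawler for collecting model-training data. Anthropic documents separate agents, Claude-User and Claude-SearchBot, for user-initiated fetches and search indexing. Allow for the broadest Claude coverage; block if you opt out of training.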
- PerplexityBot
- Perplexity retrieval. Feeds Perplexity answer generation. Allow.
- Perplexity-User
- Perplexity interactive — fetches specific pages during a user's search. Allow.
- Google-Extended
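- Google's AI-specific robots.txt token, not a separate crawler. It controls whether pages fetched by Google's existing crawlers may be used for Gemini training and grounding, and it sends no user-agent string of its own. AI Overviews are part of Search and follow your Googlebot rules. Allow for Gemini visibility.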
- Googlebot
- Google's classic search crawler. Allow, as you always have.
- Bingbot
- Microsoft's search crawler. Its index feeds Bing and Copilot and sits upstream of ChatGPT Search. Allow.
- Applebot-Extended
- Apple's opt-out token for AI training. It does not crawl pages itself; it tells Apple whether content fetched by Applebot may be used to train its models. Block if you opt out of Apple AI training; otherwise allow.
How to edit robots.txt on Shopify
Shopify generates robots.txt from robots.txt.liquid. You override or extend the default by editing that file — available on all paid plans.
Shopify auto-generates robots.txt from a Liquid template called robots.txt.liquid. On most themes this file does not exist by default — Shopify uses the built-in template — so the first step is to create it in your theme's templates folder. Once it exists, Shopify uses your version instead of the built-in one, and you can add or override directives freely.
Step 1: Create the robots.txt.liquid file
In Shopify Admin → Online Store → Themes → Actions → Edit code, navigate to the Templates folder. If robots.txt.liquid does not exist, click 'Add a new template' and select robots.txt — this creates the file with Shopify's default content. From here you can add or remove directives.
Step 2: The recommended AI-allowlist block
{% for group in robots.default_groups %}
{{- group.user_agent }}
{% for rule in group.rules -%}
{{ rule }}
{% endfor -%}
{%- if group.sitemap != blank -%}
{{ group.sitemap }}
{%- endif %}
{% endfor %}
# AI retrieval bots — explicit allow
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Perplexity-User
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Bingbot
Allow: /
# Additional sitemaps
Sitemap: {{ shop.url }}/sitemap.xml
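Sitemap: {{ shop.url }}/ai-sitemap.xml
This block preserves Shopify's default directives (the initial loop that outputs the standard groups) and adds explicit Allow rules for every major AI bot plus the ai-sitemap.xml reference. The named groups are not simply redundant with User-agent: *. Under the Robots Exclusion Protocol a crawler obeys only the most specific group that matches it, so a bot with its own Allow: / group stops inheriting the Disallow rules in Shopify's default group (cart, checkout, and similar paths). If you want a named bot to keep those exclusions, repeat the relevant Disallow lines inside its group. The benefit of naming bots is that your intent becomes auditable and survives future changes to the Shopify defaults.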
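One way to sanity-check how a named group interacts with the default group is to parse a candidate robots.txt offline before deploying it. The sketch below uses Python's standard-library `urllib.robotparser`; the `ROBOTS_TXT` content and the helper name `can_fetch` are illustrative assumptions, not Shopify's actual output. Python's parser applies the first matching rule within a group, while some crawlers use longest-match, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rendered robots.txt: one default group plus one named AI-bot group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
Disallow: /checkout

User-agent: GPTBot
Allow: /
"""


def can_fetch(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # A bot with its own group is judged by that group only.
    for bot in ("GPTBot", "SomeOtherBot"):
        for path in ("/products/example", "/cart"):
            print(bot, path, can_fetch(ROBOTS_TXT, bot, path))
```

In this example GPTBot is permitted to fetch /cart because its own group contains no Disallow lines, while an unnamed bot falls back to the default group and is refused.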
The silent killer: CDN-level bot blocks below robots.txt
27% of Shopify stores have correct robots.txt but block AI bots at Cloudflare, Imperva, or the Shopify Bot Management layer. robots.txt is advisory; CDN blocks are enforced.
robots.txt is the polite part of bot control: an advisory document that well-behaved bots respect voluntarily. Below robots.txt, every store has one or more enforcement layers (Cloudflare Bot Management, Shopify's built-in bot protection, occasionally a third-party WAF) that return 403 or 429 regardless of what robots.txt says. In our audits, 27% of Shopify stores have a correct robots.txt that explicitly allows GPTBot while an enforcement layer, most often Cloudflare, returns 403 to every GPTBot request. The bot honors your robots.txt; the CDN ignores it.
27% of Shopify stores with correct robots.txt have a CDN-level block silently denying AI bots
Surfient infrastructure audit of 1,207 Shopify stores, Q1 2026. Most commonly Cloudflare Bot Fight Mode or Super Bot Fight Mode.
Where to check your CDN layer
- Cloudflare — Security → Bots. Disable 'Block AI Scrapers and Crawlers' if you want to allow GEO retrieval. Check Firewall Rules for any custom rule that blocks user-agent GPTBot or ClaudeBot.
- Shopify's built-in bot protection — Admin → Settings → Security and fraud prevention. Review the list of blocked bot patterns.
- Third-party WAFs (Imperva, Cloudflare Enterprise, Sucuri) — check the bot management rules for any AI-specific denies.
- Your server logs — grep for GPTBot, ClaudeBot, PerplexityBot in the last 14 days. Zero hits means something is blocking them silently.
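A quick external probe can complement the checks above: request one page with different user-agent strings and compare status codes. The sketch below is a rough first pass under stated assumptions. The user-agent strings in `PROBE_AGENTS` are placeholders that merely contain each bot's token, and the function names are hypothetical. Because many CDNs verify a crawler's source IP, a spoofed probe can pass where the real bot is blocked, or the reverse, so server logs remain the stronger signal.

```python
import urllib.error
import urllib.request

# Placeholder user-agent strings (assumptions): real crawlers send longer
# strings and originate from published IP ranges.
PROBE_AGENTS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0; probe)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0; probe)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0; probe)",
}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for this user-agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 403, 429 and other error codes arrive as HTTPError


def classify(status: int) -> str:
    """Map a status code to a coarse verdict."""
    if 200 <= status < 300:
        return "ok"
    if status == 429:
        return "rate-limited"
    if status in (401, 403):
        return "blocked"
    return "other"


def probe_all(url: str, agents: dict[str, str] = PROBE_AGENTS,
              fetch=fetch_status) -> dict[str, str]:
    """Probe url once per agent; fetch is injectable so it can be stubbed."""
    return {bot: classify(fetch(url, agent)) for bot, agent in agents.items()}
```

Calling `probe_all("https://your-store.example/")` with your own storefront URL returns a verdict per bot; a "blocked" result for a bot that robots.txt allows points at one of the enforcement layers listed above.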
Should you allow or block? The decision matrix
Retrieval bots — allow for GEO visibility. Training bots — your call. Most Shopify stores should allow everything; the opt-outs are edge cases.
The allow-or-block decision maps neatly to the functional bot categories. Retrieval bots serve citations to real users asking real questions — blocking them is blocking customers. Training bots feed models that will compete with you or distill your content into derivative works — the reasoning to block is content-protection; the cost is AI visibility. Most Shopify stores are in a market where AI visibility is worth the training exposure, but the calculus is genuinely different in categories like publishing, proprietary research, or original creative work.
- Allow retrieval bots
- GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, PerplexityBot, Perplexity-User, Google-Extended, Bingbot. These serve your citations.
- Allow ingestion bots (typical)
- For most DTC ecommerce stores, training exposure is worth the visibility. Block only if you have high-value proprietary content.
- Block Applebot-Extended (optional)
- If you specifically do not want Apple AI training on your content. Allow otherwise.
- Consider blocking CCBot
- Common Crawl — widely used for model training. Block if training exposure is a concern; allow if you want the downstream visibility.
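Once the decisions are made, it can help to keep them in one place and generate the robots.txt groups from that record. The following is a minimal sketch: the `POLICY` mapping holds illustrative choices rather than a recommendation, and `render_groups` is a hypothetical helper. Its output would be pasted below the default-groups loop in robots.txt.liquid.

```python
# Illustrative policy (an assumption, not a recommendation): bot token -> decision.
POLICY = {
    "ChatGPT-User": "allow",
    "OAI-SearchBot": "allow",
    "PerplexityBot": "allow",
    "Applebot-Extended": "block",
    "CCBot": "block",
}


def render_groups(policy: dict[str, str]) -> str:
    """Render one robots.txt group per bot from an allow/block policy."""
    blocks = []
    for bot, decision in policy.items():
        if decision not in ("allow", "block"):
            raise ValueError(f"unknown decision for {bot}: {decision!r}")
        rule = "Allow: /" if decision == "allow" else "Disallow: /"
        blocks.append(f"User-agent: {bot}\n{rule}")
    # Blank lines separate groups; the file ends with a single newline.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(render_groups(POLICY))
```

Keeping the policy as data makes the allow-or-block choice for each bot reviewable in version control, separate from the template that emits it.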
How to verify AI bots actually reach your pages
Server-log grep is the only reliable signal. robots.txt allowance means nothing if the CDN is returning 403.
The only reliable way to verify AI bot access is to check your server logs. Shopify exposes access logs to Plus merchants via the Live View dashboard; on lower tiers you need either a Cloudflare logpush, a log-shipping app, or a sampling technique using the Shopify Admin API. Whatever the mechanism, the question you need to answer is the same: in the last 14 days, has each AI bot actually fetched pages from my store, and what response codes did they get?
1. Pull 14 days of access logs. Filter by user-agent to isolate GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, and Bingbot. Google-Extended is a robots.txt token with no user-agent string of its own, so it never appears in logs; Google's fetches arrive under its existing crawler user-agents such as Googlebot.
2. Count requests per bot per day. A healthy store sees 5-50 requests per day from each major bot on a catalog of 200+ SKUs.
3. Check response codes. 200s are fine; 403s and 429s mean something is blocking. Occasional 429s are okay; persistent 403s mean a firewall rule needs to change.
4. Verify the paths fetched. Are bots reaching your product pages, or only your homepage? A bot that only fetches / is either rate-limited or crawl-budget-starved.
5. Cross-check against Search Console (for Googlebot) and Bing Webmaster Tools (for Bingbot) to confirm that what the engines say they fetched matches what your logs show.
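The counting steps above can be sketched as a short script. It assumes your logs are in the common "combined" access-log format (an assumption; CDN or app exports often differ, in which case the regular expression needs adjusting). The sample lines, `BOT_TOKENS`, and `summarize` are all illustrative.

```python
import re
from collections import Counter, defaultdict

# Substrings to look for in the user-agent field.
BOT_TOKENS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
              "PerplexityBot", "Perplexity-User", "Bingbot")

# Combined log format: [day:time zone] "METHOD path proto" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'\[(?P<day>[^:\]]+):[^\]]*\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical log lines for demonstration only.
SAMPLE_LINES = [
    '203.0.113.7 - - [10/Mar/2026:13:55:36 +0000] "GET /products/example HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.7 - - [11/Mar/2026:09:12:01 +0000] "GET / HTTP/1.1" 403 312 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.4 - - [11/Mar/2026:09:15:44 +0000] "GET / HTTP/1.1" 403 312 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.9 - - [11/Mar/2026:10:00:00 +0000] "GET /cart HTTP/1.1" 200 900 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]


def summarize(lines):
    """Return (requests per bot per day, status codes per bot, paths per bot)."""
    per_day = defaultdict(Counter)
    statuses = defaultdict(Counter)
    paths = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        agent = match["agent"].lower()
        for token in BOT_TOKENS:
            if token.lower() in agent:
                per_day[token][match["day"]] += 1
                statuses[token][match["status"]] += 1
                paths[token][match["path"]] += 1
                break
    return per_day, statuses, paths


if __name__ == "__main__":
    per_day, statuses, paths = summarize(SAMPLE_LINES)
    for bot in statuses:
        print(bot, dict(per_day[bot]), dict(statuses[bot]), dict(paths[bot]))
```

A bot that is missing from the output entirely, or whose status counter is dominated by 403, is the one to chase through the enforcement layers; a path counter containing only / suggests the bot never reaches product pages.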
“The store that thinks it allows AI bots and the store that actually does are rarely the same store. Three layers of access control — robots.txt, Shopify's default bot rules, and whatever Cloudflare mode you forgot you enabled — silently diverge over time.”