Can I access my Shopify server logs directly?

No, Shopify does not expose raw server logs to merchants. The platform-level logs are an internal Shopify resource and have been since launch. The practical alternatives are Cloudflare logs if you proxy through Cloudflare, Shopify's limited built-in bot analytics, or app-proxy-based logging for custom routes. Of the three, Cloudflare is by far the most complete if your store already runs through it.

Are AI crawler user-agent strings stable over time?

Mostly yes, but they do evolve. The core strings (GPTBot, ClaudeBot, PerplexityBot) have been stable since launch, but version numbers and secondary crawlers (OAI-SearchBot, Perplexity-User) have been added as AI products ship new features. Check the official documentation quarterly — the stable URLs are openai.com/gptbot, www.anthropic.com/claudebot, and perplexity.ai/perplexitybot.

Do AI crawlers spoof user-agents or use unmarked IPs?

The major ones do not. OpenAI, Anthropic, Perplexity, Google, and Microsoft all publish official IP ranges and stable user-agents for their crawlers. Spoofing would violate their published guidelines and damage relationships with publishers. A small minority of smaller AI companies are less transparent; for those, IP-range checking is the fallback. For the major five or six, user-agent matching is reliable.
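For the IP-range fallback, a minimal sketch of the check follows. The ranges below are placeholder documentation blocks, not real crawler ranges, and `PUBLISHED_RANGES` and `ip_matches_crawler` are hypothetical names; swap in the CIDR lists each vendor actually publishes.

```python
import ipaddress

# Hypothetical ranges for illustration only (RFC 5737 documentation blocks).
# Replace them with the CIDR lists each vendor publishes.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "PerplexityBot": ["198.51.100.0/24", "203.0.113.0/25"],
}


def ip_matches_crawler(ip: str, crawler: str) -> bool:
    """Return True if `ip` falls inside any listed range for `crawler`."""
    address = ipaddress.ip_address(ip)
    networks = (
        ipaddress.ip_network(cidr) for cidr in PUBLISHED_RANGES.get(crawler, [])
    )
    return any(address in network for network in networks)


# A request claiming to be GPTBot from an address inside the sample range.
print(ip_matches_crawler("192.0.2.17", "GPTBot"))
# A request claiming to be PerplexityBot from outside 203.0.113.0/25.
print(ip_matches_crawler("203.0.113.200", "PerplexityBot"))
```

A request whose user-agent names a crawler but whose IP fails this check is worth treating as unverified rather than as that crawler.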

How often should I review the log data?

Weekly for a quick pass (15-30 minutes), monthly for a deeper review, quarterly for a full audit with remediation plan. The weekly pass catches configuration regressions early; the deeper reviews surface patterns worth acting on. Annual-only review is too infrequent — AI crawler behaviour changes meaningfully quarter to quarter.

Do AI crawlers respect crawl-delay directives?

Inconsistently. GPTBot respects a crawl-delay in robots.txt; ClaudeBot and PerplexityBot document support but behaviour in practice is less consistent. For most Shopify stores, the crawl rate is low enough that crawl-delay is unnecessary — you are more likely to want to encourage crawling, not restrict it. Rate limiting AI crawlers on Shopify is rarely the right call.

Does my citation rate correlate with crawler visit frequency?

Loosely, yes. Heavy crawler traffic to a page often precedes a rising citation rate on that page. The correlation is not tight enough to replace citation-rate tracking, but log frequency is a useful early signal between citation-rate measurements. If log frequency drops noticeably and citation rate has not yet moved, you have weeks of warning to investigate.
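One way to make "drops noticeably" concrete is to compare the latest week against the trailing average. This is a sketch under assumptions: the 50% threshold and the function name `crawl_drop` are illustrative choices, not a recommended standard.

```python
from statistics import mean


def crawl_drop(weekly_hits, threshold=0.5):
    """Flag a drop when the latest week is below `threshold` x the trailing mean.

    `weekly_hits` is a list of weekly AI crawler hit counts, oldest first.
    The 0.5 default is an assumed threshold for illustration.
    """
    *history, latest = weekly_hits
    if not history:
        return False  # nothing to compare against yet
    baseline = mean(history)
    return baseline > 0 and latest < threshold * baseline


print(crawl_drop([120, 110, 130, 40]))   # latest week well below the baseline
print(crawl_drop([120, 110, 130, 100]))  # normal week-to-week variation
```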


Analyzing GPTBot, ClaudeBot, PerplexityBot hits

AI crawler traffic is quietly the most useful measurement surface in GEO — it tells you which pages retrievers actually visit, how often, and what they might be extracting. The catch is that most Shopify stores have never pulled a log file, and the ones that have look at the wrong thing.

Harry Parker with Hiren Bhuva

Head of AI Research, Surfient

10 min · Updated April 21, 2026

TL;DR

AI crawlers identify themselves via distinct user-agent strings (GPTBot, ClaudeBot, PerplexityBot, Bingbot for Copilot, Amazonbot, and a handful of smaller ones), and parsing your logs for those strings is the only direct way to see which AI engines are actually visiting. Google-Extended is the exception: it is a robots.txt control token, not a string that shows up in logs.
Shopify does not give merchants direct access to server logs, but there are three workable paths: Cloudflare logs if you proxy through Cloudflare, Shopify's own bot analytics (partial coverage), and app-proxy-based server-side logging for custom routes.
The five questions worth asking of the log data are: which crawlers visit, which pages they prefer, how often they re-crawl, whether they hit 404s or redirect chains, and whether the visit frequency matches your AI citation rate — answers to these diagnose most AI-visibility problems.


Why log analysis is the most honest measurement in GEO

Every other measurement is proxied. Log data is the direct signal — which crawler visited which page at what time.

Most GEO measurement is inferred. Citation rate tracking watches which of your pages get quoted across AI engines; share-of-voice tracking compares your citation rate to competitors; visibility monitors poll prompts and parse responses. All of these are valuable, and all are indirect — they measure outputs and infer inputs. Log analysis measures the input directly. If GPTBot visited your /products/moissanite-ring page at 03:14 UTC, the log shows that. No inference, no polling. This makes log analysis the cheapest honest answer to the question 'are AI crawlers actually seeing my content' — which is a question most merchants have never actually answered.

What logs tell you directly: Which crawlers visited, which URLs they fetched, what HTTP status you returned, what user-agent they announced, how often they re-visit.
What logs do not tell you: Whether the content was extracted, whether it was cited, whether it influenced a specific AI answer. Logs are the input side of retrieval, not the output side.
Complementary measurements: Citation rate tracking, share-of-voice panels, AI visibility monitors. Logs plus citation data give the full picture.

14.7% of Shopify stores audited show zero AI crawler hits in the trailing 30 days

Surfient server-log audit panel, 420 Shopify stores with Cloudflare or log-sharing access, January-March 2026. Zero-hit stores typically had robots.txt or Cloudflare rules blocking the crawlers.

Figure (step flow): The four-step arc this guide walks through; each numbered card maps to a section below.

The user-agent strings for every AI crawler worth tracking

A canonical reference. OpenAI has three (crawler, search, user), Anthropic has one, Perplexity has two (crawler + user), Google has one for AI, Microsoft has one, Amazon has one.

Identifying AI crawler traffic starts with knowing the user-agent strings they announce. Below is the canonical list as of April 2026 — check the crawlers' documentation periodically because they update. Pay attention to the distinction between indexing crawlers (which visit to train or maintain an index) and user-action bots (which visit because a user asked a question that referenced the page).

GPTBot: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.2; +https://openai.com/gptbot. OpenAI's general training and indexing crawler.
OAI-SearchBot: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot. OpenAI's search-specific crawler, used by ChatGPT Search.
ChatGPT-User: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot. Live-fetch agent: visits URLs on behalf of ChatGPT users during a conversation.
ClaudeBot: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ClaudeBot/1.0; [email protected]. Anthropic's primary crawler.
PerplexityBot: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot. Perplexity's general crawler.
Perplexity-User: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; Perplexity-User/1.0; +https://perplexity.ai/bot. Live-fetch agent for Perplexity user queries.
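Google-Extended: Google's control token for AI training use. It is a robots.txt product token rather than a separate crawler: fetching is done by Google's existing user agents, so Google-Extended does not appear as its own user-agent string in logs. Controlled via a robots.txt Google-Extended directive.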
Bingbot (Copilot): Microsoft Copilot uses Bingbot for crawling. User-agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm).
Amazonbot: Mozilla/5.0 (Linux; x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Version/96 Chrome/96 Safari/537.36 Amazonbot/0.1. Amazon's crawler feeding Rufus and related AI commerce features.

Crawler vs user-action distinction

GPTBot visits to index your site; ChatGPT-User visits because a specific user asked a question that led ChatGPT to fetch your URL. The distinction matters for two reasons. First, user-action traffic is proportional to your citation rate in real conversations: high ChatGPT-User traffic is a strong signal that your content is being used in answers. Second, user-action traffic should not be rate-limited by your bot rules even if you restrict indexing crawlers (though in practice you want to allow both). PerplexityBot and Perplexity-User have the same split; OpenAI has three distinct agents covering indexing, search, and user actions.
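The split can be applied to log data with a small lookup. A minimal sketch, using the tokens from the list above; the role labels ("indexing", "search", "user-action") and the function name `classify_user_agent` are this sketch's own shorthand, not vendor terminology.

```python
# (token, vendor, role). More specific tokens are listed before broader ones.
AI_AGENTS = [
    ("OAI-SearchBot", "OpenAI", "search"),
    ("ChatGPT-User", "OpenAI", "user-action"),
    ("GPTBot", "OpenAI", "indexing"),
    ("ClaudeBot", "Anthropic", "indexing"),
    ("Perplexity-User", "Perplexity", "user-action"),
    ("PerplexityBot", "Perplexity", "indexing"),
    ("Amazonbot", "Amazon", "indexing"),
    ("bingbot", "Microsoft", "indexing"),
]


def classify_user_agent(user_agent: str):
    """Return (token, vendor, role) for a known AI agent, else None."""
    lowered = user_agent.lower()
    for token, vendor, role in AI_AGENTS:
        if token.lower() in lowered:
            return token, vendor, role
    return None


ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; "
      "ChatGPT-User/1.0; +https://openai.com/bot")
print(classify_user_agent(ua))
```

Counting hits per role rather than per vendor makes it easy to track user-action traffic separately from indexing traffic.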

How to get at the logs when you run Shopify

Three paths: Cloudflare Analytics (if you use Cloudflare), Shopify's built-in bot analytics (limited), or server-side logging on custom app-proxy routes.

Shopify does not give merchants raw server log access: you cannot download the access.log for your store. This is a platform design decision, and nothing suggests it will change. Three workable alternatives exist, each with different tradeoffs.

1. Cloudflare logs. If your Shopify store is served through a custom domain with Cloudflare in front, Cloudflare Analytics provides bot traffic breakdowns including per-user-agent data. Cloudflare Enterprise plans also offer raw log export. This is the cleanest path for stores already on Cloudflare.
2. Shopify's built-in bot analytics. In Shopify admin, the Analytics reports show some bot traffic breakdowns; coverage is partial and does not expose raw logs, but it is better than nothing. Look for the sessions and traffic source reports.
3. Server-side detection through an app proxy. Most AI crawlers do not execute JavaScript, so client-side JavaScript detection misses them. The workaround is to use a small app proxy or a custom Shopify app that logs user-agent data server-side for specific routes, then aggregate in your own analytics store.

What to expect from Cloudflare's bot analytics

Per-user-agent request counts: GPTBot, ClaudeBot, PerplexityBot, etc. aggregated by day. Useful for tracking frequency and catching sudden changes.
Top-visited URLs by bot: Which of your pages each AI crawler prefers. The most valuable single view — it tells you what AI engines actually index.
Response code distribution: Whether you are returning 200s or 404s to the crawlers. Broken redirects and missing pages surface here.
Bot blocking events: Whether your firewall rules are inadvertently challenging or blocking AI crawlers. Common misconfiguration finding.
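If you have raw log export, the first three views above can be approximated from the JSON lines yourself. A minimal sketch, assuming newline-delimited JSON with the `ClientRequestUserAgent`, `ClientRequestURI`, `EdgeResponseStatus` and `EdgeStartTimestamp` fields and RFC 3339 timestamps; both the field selection and the timestamp format are configurable in an export, so check yours. The function name and sample records are illustrative.

```python
import json
from collections import Counter

AI_TOKENS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
             "PerplexityBot", "Perplexity-User", "Amazonbot")


def summarise_cloudflare_lines(lines):
    """Count AI crawler requests per (day, bot), (bot, url) and (bot, status)."""
    per_day, per_url, per_status = Counter(), Counter(), Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        agent = record.get("ClientRequestUserAgent", "").lower()
        bot = next((t for t in AI_TOKENS if t.lower() in agent), None)
        if bot is None:
            continue  # not an AI crawler we track
        # Assumes an RFC 3339 timestamp, so the first 10 characters are the date.
        day = str(record.get("EdgeStartTimestamp", ""))[:10]
        per_day[(day, bot)] += 1
        per_url[(bot, record.get("ClientRequestURI"))] += 1
        per_status[(bot, record.get("EdgeResponseStatus"))] += 1
    return per_day, per_url, per_status


# Illustrative records, not real traffic.
sample = [json.dumps(r) for r in [
    {"ClientRequestUserAgent": "compatible; GPTBot/1.2", "ClientRequestURI": "/products/moissanite-ring",
     "EdgeResponseStatus": 200, "EdgeStartTimestamp": "2026-03-01T03:14:00Z"},
    {"ClientRequestUserAgent": "compatible; GPTBot/1.2", "ClientRequestURI": "/products/old-sku",
     "EdgeResponseStatus": 404, "EdgeStartTimestamp": "2026-03-01T03:15:00Z"},
    {"ClientRequestUserAgent": "compatible; ClaudeBot/1.0", "ClientRequestURI": "/products/moissanite-ring",
     "EdgeResponseStatus": 200, "EdgeStartTimestamp": "2026-03-02T10:00:00Z"},
    {"ClientRequestUserAgent": "Mozilla/5.0 Chrome/120.0", "ClientRequestURI": "/",
     "EdgeResponseStatus": 200, "EdgeStartTimestamp": "2026-03-02T10:01:00Z"},
]]
per_day, per_url, per_status = summarise_cloudflare_lines(sample)
print(per_day.most_common())
```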

Five questions worth answering from AI crawler log data

Who visits, what they visit, how often, what errors they hit, and whether visit frequency matches citation rate. Answers to these diagnose most GEO problems.

Once you have log access, the productive work is answering specific questions. Most merchants default to looking at aggregate bot traffic without knowing what they are looking for, which produces noise. The five questions below are what we actually use to diagnose AI-visibility issues on client stores — each one maps to a specific, fixable cause.

1. Which AI crawlers visit my site at all? If one or more are absent, check robots.txt, Cloudflare firewall rules, and server-level rate limits. Absence is usually a config problem, not a retriever problem.
2. Which pages do the crawlers prefer? Top-visited pages are usually your best AI-indexed surface. Pages that get no crawler traffic are either unreachable from the sitemap, blocked somewhere, or genuinely considered uninteresting.
3. How often do the crawlers re-visit? High-churn catalogs need frequent re-crawling. If GPTBot visits your PDPs quarterly when you update daily, your updates are not propagating to ChatGPT. Consider submitting the sitemap through OpenAI's channels if they become available for your account.
4. What HTTP status codes do the crawlers get? 404s, 301 chains, and 503s all waste crawler budget. Fix the 404s (often caused by out-of-stock product redirects or stale blog URLs) and collapse 301 chains into single redirects.
5. Does visit frequency match citation rate? Heavy crawler visits plus low citation rate indicates content quality or schema issues (the crawlers read but do not cite). Light crawler visits plus high citation rate is unusual and usually signals strong external corroboration. Light crawler visits plus low citation rate is the classic GEO problem; work on crawler access first.
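Question 3 is the one that most benefits from a little arithmetic. A minimal sketch that computes the median gap, in days, between consecutive visits per crawler and URL; the input shape (bot, url, ISO timestamp) and the function name `recrawl_gaps` are assumptions for illustration, and the sample visits are made up.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def recrawl_gaps(visits):
    """Median days between consecutive visits, keyed by (bot, url).

    `visits` is an iterable of (bot, url, iso_timestamp) tuples.
    Pairs seen only once are omitted because they have no gap to measure.
    """
    seen = defaultdict(list)
    for bot, url, timestamp in visits:
        seen[(bot, url)].append(datetime.fromisoformat(timestamp))
    gaps = {}
    for key, times in seen.items():
        times.sort()
        deltas = [(later - earlier).total_seconds() / 86400
                  for earlier, later in zip(times, times[1:])]
        if deltas:
            gaps[key] = median(deltas)
    return gaps


# Illustrative visits, not real traffic.
visits = [
    ("GPTBot", "/products/moissanite-ring", "2026-01-01T03:14:00"),
    ("GPTBot", "/products/moissanite-ring", "2026-01-08T03:14:00"),
    ("GPTBot", "/products/moissanite-ring", "2026-01-22T03:14:00"),
    ("ClaudeBot", "/blogs/news/care-guide", "2026-01-05T12:00:00"),
]
print(recrawl_gaps(visits))
```

Comparing each gap with how often the page actually changes shows where updates are unlikely to be propagating.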

Parsing patterns — from raw logs to useful insight

Simple regex and grouping get you 90% of the insight. Complex log-analysis tools are overkill for small catalogs; useful for large ones.

The tooling for log analysis spans a wide spectrum. For a small Shopify store with a Cloudflare account, the free Cloudflare Analytics dashboard is sufficient. For a mid-sized store wanting custom analysis, a weekly export to a spreadsheet plus a few filter formulas is fine. For large catalogs or complex analysis, dedicated log-analysis tools (Screaming Frog Log File Analyser, Oncrawl, Botify) are worth the investment.

Minimum viable parsing

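# If you have a raw access.log in combined log format (non-Shopify or custom setup):
grep -E "GPTBot|OAI-SearchBot|ChatGPT-User|ClaudeBot|PerplexityBot|Perplexity-User|Amazonbot" access.log | \
  awk '{print $7}' | \
  sort | uniq -c | sort -rn | head -50

# Prints the top 50 URLs requested by AI crawlers, most frequent first.
# $7 is the request path in combined log format; adjust the field for other layouts.
# Google-Extended is left out: it is a robots.txt token and does not appear as a log user-agent.
# For Cloudflare logs in JSON: pipe through jq instead of awk.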

Questions you can answer with the above

Which 50 pages are my most AI-crawled? Starting set for content depth audits.
Are my top products in that list? If high-revenue SKUs are missing, it is usually a sitemap or internal linking problem.
Is my blog in that list? If yes, blog content is reaching AI retrievers; if no, reconsider whether your blog is in the sitemap.
What proportion of crawler visits hit 404s? High 404 rate is a clean-up priority.
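The URL questions follow from the one-liner, but the 404 question also needs the status field. A minimal sketch that extracts both, assuming combined log format; the regular expression, the function name `audit_access_log` and the sample lines are illustrative and would need adjusting for other log layouts.

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "METHOD path proto" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
AI_AGENT = re.compile(
    r"GPTBot|OAI-SearchBot|ChatGPT-User|ClaudeBot|PerplexityBot|Perplexity-User|Amazonbot"
)


def audit_access_log(lines, top_n=50):
    """Return (top URLs hit by AI crawlers, share of those hits that were 404s)."""
    urls, statuses = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or not AI_AGENT.search(match.group("agent")):
            continue  # unparseable line or not an AI crawler
        urls[match.group("path")] += 1
        statuses[match.group("status")] += 1
    total = sum(statuses.values())
    share_404 = statuses["404"] / total if total else 0.0
    return urls.most_common(top_n), share_404


# Illustrative log lines, not real traffic.
log_lines = [
    '203.0.113.5 - - [01/Mar/2026:03:14:00 +0000] "GET /products/a HTTP/1.1" 200 5120 "-" "Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot"',
    '203.0.113.5 - - [01/Mar/2026:03:15:00 +0000] "GET /products/old HTTP/1.1" 404 312 "-" "Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot"',
    '198.51.100.9 - - [01/Mar/2026:04:00:00 +0000] "GET /products/a HTTP/1.1" 200 5120 "-" "Mozilla/5.0; compatible; ClaudeBot/1.0"',
    '192.0.2.44 - - [01/Mar/2026:04:01:00 +0000] "GET / HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"',
]
top_urls, share_404 = audit_access_log(log_lines)
print(top_urls, round(share_404, 2))
```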

Common findings from AI crawler log audits

Five patterns that show up repeatedly. Each has a specific fix; most fixes are under an hour.

Across the stores we audit, the log findings cluster around a recognisable set of patterns. Knowing the common patterns in advance shortens the time between 'I pulled the logs' and 'I know what to fix'.

Zero AI crawler traffic: robots.txt blocks them, or Cloudflare bot-fight mode is too aggressive. Fix: edit robots.txt and allow the bots in Cloudflare firewall rules.
High 404 rate from AI crawlers: Stale product URLs, discontinued SKUs, moved blog posts. Fix: implement proper 301s; do not leave 404s.
Crawlers visit but ignore specific pages: Missing from sitemap, orphaned in internal linking, or blocked per-URL. Fix: audit sitemap completeness and internal link paths.
Slow-response pages abandoned: Shows up as lots of start-fetches, few completed reads. Fix: speed optimisation on those specific pages.
One crawler dominates, others absent: Usually means a specific bot is explicitly disallowed. Check robots.txt and Cloudflare rules per user-agent.
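The first and last findings above usually trace back to robots.txt, which can be checked mechanically. A minimal sketch using Python's standard `urllib.robotparser`; the sample robots.txt, the test URL and the function name `blocked_agents` are hypothetical. It covers robots.txt only and says nothing about Cloudflare rules, which need checking separately.

```python
from urllib.robotparser import RobotFileParser

# robots.txt tokens to test, including Google-Extended.
AI_ROBOTS_TOKENS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
                    "PerplexityBot", "Perplexity-User", "Google-Extended",
                    "Amazonbot", "bingbot"]


def blocked_agents(robots_txt: str, url: str = "https://example.com/products/sample"):
    """Return the AI agent tokens that robots.txt disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_ROBOTS_TOKENS if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks one crawler outright.
sample_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /cart
"""
print(blocked_agents(sample_robots))
```

An empty list for a representative product URL rules out robots.txt as the cause, which narrows the search to firewall and bot-management rules.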

“Log analysis is the cheapest insurance policy against silent AI invisibility. Most merchants check citation rate and wonder why it is low; the stores that win check the logs first and fix the plumbing before they go chasing content problems.”

— Harry Parker, Head of AI Research, Surfient

Frequently asked questions

Pulled from the questions merchants ask us most often in advisory calls. Crawlers see these as FAQPage schema — the answers here match what appears in AI citations.

