
Analyzing GPTBot, ClaudeBot, PerplexityBot hits

AI crawler traffic is quietly the most useful measurement surface in GEO — it tells you which pages retrievers actually visit, how often, and what they might be extracting. The catch is that most Shopify stores have never pulled a log file, and the ones that have look at the wrong thing.

Harry Parker with Hiren Bhuva

Head of AI Research, Surfient

10 min

Why log analysis is the most honest measurement in GEO

Every other measurement is proxied. Log data is the direct signal — which crawler visited which page at what time.

Most GEO measurement is inferred. Citation rate tracking watches which of your pages get quoted across AI engines; share-of-voice tracking compares your citation rate to competitors'; visibility monitors poll prompts and parse responses. All of these are valuable, and all are indirect: they measure outputs and infer inputs. Log analysis measures the input directly. If GPTBot visited your /products/moissanite-ring page at 03:14 UTC, the log shows that. No inference, no polling. That makes log analysis the cheapest honest answer to the question 'are AI crawlers actually seeing my content?', a question most merchants have never answered.

What logs tell you directly
Which crawlers visited, which URLs they fetched, what HTTP status you returned, what user-agent they announced, how often they re-visit.
What logs do not tell you
Whether the content was extracted, whether it was cited, whether it influenced a specific AI answer. Logs are the input side of retrieval, not the output side.
Complementary measurements
Citation rate tracking, share-of-voice panels, AI visibility monitors. Logs plus citation data give the full picture.

14.7%

of Shopify stores audited show zero AI crawler hits in the trailing 30 days

Surfient server-log audit panel, 420 Shopify stores with Cloudflare or log-sharing access, January-March 2026. Zero-hit stores typically had robots.txt or Cloudflare rules blocking the crawlers.

Figure · Step flow. The four-step arc this guide walks through; each numbered card maps to a section below: 01 why log analysis is the most honest measurement in GEO, 02 the user-agent strings for every AI crawler worth tracking, 03 how to get at the logs when you run Shopify, 04 five questions worth answering from AI crawler log data.

The user-agent strings for every AI crawler worth tracking

A canonical reference. OpenAI has three (indexing, search, and user actions), Anthropic has one, Perplexity has two (crawler + user), Google has one AI control token, Microsoft has one, and Amazon has one.

Identifying AI crawler traffic starts with knowing the user-agent strings they announce. Below is the canonical list as of April 2026; check each crawler's documentation periodically, because the strings and version numbers change. Pay attention to the distinction between indexing crawlers (which visit to train a model or maintain an index) and user-action bots (which visit because a user asked a question that referenced the page).

GPTBot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.2; +https://openai.com/gptbot. OpenAI's general training and indexing crawler.
OAI-SearchBot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot. OpenAI's search-specific crawler, used by ChatGPT Search.
ChatGPT-User
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot. Live-fetch agent: visits URLs on behalf of ChatGPT users during a conversation.
ClaudeBot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ClaudeBot/1.0; +claudebot@anthropic.com. Anthropic's primary crawler.
PerplexityBot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot. Perplexity's general crawler.
Perplexity-User
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; Perplexity-User/1.0; +https://perplexity.ai/bot. Live-fetch agent for Perplexity user queries.
Google-Extended
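Not a separate crawler. Google-Extended is a robots.txt product token that controls whether content fetched by Google's existing crawlers (such as Googlebot) may be used to train and ground Google's Gemini models. It has no HTTP user-agent string of its own, so it never appears in access logs; you control it with a Google-Extended group in robots.txt.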
Bingbot (Copilot)
Microsoft Copilot uses Bingbot for crawling. User-agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm).
Amazonbot
Mozilla/5.0 (Linux; x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Version/96 Chrome/96 Safari/537.36 Amazonbot/0.1. Amazon's crawler feeding Rufus and related AI commerce features.

Crawler vs user-action distinction

GPTBot visits to index your site; ChatGPT-User visits because a specific user asked a question that led ChatGPT to fetch your URL. The distinction matters for two reasons. First, user-action traffic roughly tracks how often your content is used in real conversations, so high ChatGPT-User traffic is a strong signal that your pages are feeding answers. Second, user-action traffic should not be rate-limited by your bot rules even if you restrict indexing crawlers (though in practice you want both kinds of agent allowed). PerplexityBot and Perplexity-User have the same split; OpenAI has three distinct agents covering indexing (GPTBot), search (OAI-SearchBot), and user actions (ChatGPT-User).
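To apply that split in your own reports, each announced user-agent can be mapped to a vendor and a role. Below is a minimal bash sketch, assuming the tokens listed above. The function name classify_ua and the role labels are our own shorthand, not part of any vendor's documentation. It matches on the announced string only, so it cannot tell a genuine crawler from a spoofed header.

```shell
#!/usr/bin/env bash
# Sketch: map an announced user-agent string to "vendor role".
# Tokens come from the list above; role labels are this guide's
# indexing / search / user-action split (hypothetical naming).
classify_ua() {
  local ua="$1"
  case "$ua" in
    *OAI-SearchBot*)   echo "openai search" ;;
    *ChatGPT-User*)    echo "openai user-action" ;;
    *GPTBot*)          echo "openai indexing" ;;
    *Perplexity-User*) echo "perplexity user-action" ;;
    *PerplexityBot*)   echo "perplexity indexing" ;;
    *ClaudeBot*)       echo "anthropic indexing" ;;
    *Amazonbot*)       echo "amazon indexing" ;;
    *bingbot*)         echo "microsoft indexing" ;;   # Bingbot also serves Copilot
    *)                 echo "other" ;;                # browsers, other bots
  esac
}
```

In a log loop, call classify_ua "$ua" once per line's user-agent field. A ChatGPT-User string comes back as openai user-action, and that label is the one to watch when you compare visits against citation rate.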

How to get at the logs when you run Shopify

Three paths: Cloudflare Analytics (if you use Cloudflare), Shopify's built-in bot analytics (limited), or server-side user-agent logging on custom app-proxy routes.

Shopify does not give merchants raw server log access: you cannot download the access.log for your store. This is a platform design decision, and there is no sign it will change. Three workable alternatives exist, each with different tradeoffs.

  1. Cloudflare logs. If your Shopify store is served through a custom domain with Cloudflare in front, Cloudflare Analytics provides bot traffic breakdowns, including per-user-agent data. Cloudflare Enterprise plans also offer raw log export. This is the cleanest path for stores already on Cloudflare.
  2. Shopify's built-in bot analytics. In Shopify admin, the Analytics reports show some bot traffic breakdowns. Coverage is partial and raw logs are not exposed, but it is better than nothing. Look for the sessions and traffic source reports.
  3. Server-side detection on app-proxy routes. Most AI crawlers do not execute JavaScript, so JavaScript-based (client-side) crawler detection misses them. The workaround is a small app proxy or custom Shopify app that logs user-agent data server-side for specific routes, then aggregates it in your own analytics store.

What to expect from Cloudflare's bot analytics

Per-user-agent request counts
GPTBot, ClaudeBot, PerplexityBot, etc. aggregated by day. Useful for tracking frequency and catching sudden changes.
Top-visited URLs by bot
Which of your pages each AI crawler prefers. The most valuable single view — it tells you what AI engines actually index.
Response code distribution
Whether you are returning 200s or 404s to the crawlers. Broken redirects and missing pages surface here.
Bot blocking events
Whether your firewall rules are inadvertently challenging or blocking AI crawlers. Common misconfiguration finding.
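If you have a raw export rather than the dashboard, the first of these views (per-user-agent request counts by day) can be approximated with a short awk pass. The sketch below assumes the Apache/Nginx combined log format, not Cloudflare's own JSON schema. daily_bot_counts is a hypothetical helper name, and the crawler tokens are the ones listed earlier in this guide.

```shell
#!/usr/bin/env bash
# Sketch: per-crawler request counts by day from a raw access log on stdin.
# Assumes the Apache/Nginx "combined" format. Splitting on double quotes puts
# the prefix "ip ident user [date tz]" in $1 and the user-agent in $6.
daily_bot_counts() {
  awk -F'"' '
    {
      # Keep only lines whose user-agent names one of the tracked crawlers.
      if (!match($6, /OAI-SearchBot|ChatGPT-User|GPTBot|Perplexity-User|PerplexityBot|ClaudeBot|Amazonbot/)) next
      bot = substr($6, RSTART, RLENGTH)
      # "[14/Mar/2026:03:14:07" is the 4th whitespace-separated token of $1.
      split($1, prefix, " ")
      day = substr(prefix[4], 2, 11)   # -> "14/Mar/2026"
      count[day "\t" bot]++
    }
    END { for (k in count) print k "\t" count[k] }
  ' | sort
}
```

Running daily_bot_counts < access.log prints the day, the crawler, and the request count, separated by tabs. Days sort as text (14/Mar/2026), not chronologically. That is fine for spotting a sudden drop within a month, but re-sort by date for longer windows.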

Five questions worth answering from AI crawler log data

Who visits, what they visit, how often, what errors they hit, and whether visit frequency matches citation rate. Answers to these diagnose most GEO problems.

Once you have log access, the productive work is answering specific questions. Most merchants default to looking at aggregate bot traffic without knowing what they are looking for, which produces noise. The five questions below are what we actually use to diagnose AI-visibility issues on client stores — each one maps to a specific, fixable cause.

  1. Which AI crawlers visit my site at all? If one or more are absent, check robots.txt, Cloudflare firewall rules, and server-level rate limits. Absence is usually a config problem, not a retriever problem.
  2. Which pages do the crawlers prefer? Top-visited pages are usually your best AI-indexed surface. Pages that get no crawler traffic are either unreachable from the sitemap, blocked somewhere, or genuinely considered uninteresting.
  3. How often do the crawlers re-visit? High-churn catalogs need frequent re-crawling. If GPTBot visits your PDPs quarterly when you update daily, your updates are not propagating to ChatGPT. Consider submitting the sitemap through OpenAI's channels if they become available for your account.
  4. What HTTP status codes do the crawlers get? 404s, 301 chains, and 503s all waste crawler budget. Fix the 404s (often caused by out-of-stock product redirects or stale blog URLs) and collapse 301 chains into single redirects.
  5. Does visit frequency match citation rate? Heavy crawler visits plus low citation rate indicates content quality or schema issues (the crawlers read but do not cite). Light crawler visits plus high citation rate is unusual and usually signals strong external corroboration. Light crawler visits plus low citation rate is the classic GEO problem: work on crawler access first.
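Question 1 is the quickest to automate. The bash sketch below takes a log file path and prints every tracked crawler token that never appears in it. The token list mirrors the agents in this guide, and absent_crawlers is a hypothetical name. Because it searches whole lines, a token that appears only in a URL or referer would still count as a visit.

```shell
#!/usr/bin/env bash
# Sketch for question 1: print each tracked AI crawler token that never appears
# in the given log file. The token list mirrors the agents named in this guide.
# grep -F searches whole lines, so a token seen only in a URL or referer still
# counts as present.
absent_crawlers() {
  local file="$1" token
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 2
  fi
  for token in GPTBot OAI-SearchBot ChatGPT-User ClaudeBot PerplexityBot Perplexity-User Amazonbot; do
    # -q: exit status only; -F: fixed string, not a regex.
    grep -qF -- "$token" "$file" || echo "$token"
  done
}
```

An empty result means every tracked crawler reached you at least once in the window the log covers. Any token that is printed is a candidate for the robots.txt, firewall, and rate-limit checks in question 1.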

Parsing patterns — from raw logs to useful insight

Simple regex and grouping get you most of the insight. Dedicated log-analysis tools are overkill for small catalogs but useful for large ones.

The tooling for log analysis spans a wide spectrum. For a small Shopify store with a Cloudflare account, the free Cloudflare Analytics dashboard is sufficient. For a mid-sized store wanting custom analysis, a weekly export to a spreadsheet plus a few filter formulas is fine. For large catalogs or complex analysis, dedicated log-analysis tools (Screaming Frog Log File Analyser, Oncrawl, Botify) are worth the investment.

Minimum viable parsing

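# If you have a raw access.log (non-Shopify or custom setup) in the common
# Apache/Nginx "combined" format, field 7 is the request path.
# Google-Extended is left out: it is a robots.txt token, not a user agent.
grep -E "GPTBot|OAI-SearchBot|ChatGPT-User|ClaudeBot|PerplexityBot|Perplexity-User|Amazonbot" access.log | \
  awk '{print $7}' | \
  sort | uniq -c | sort -rn | head -50

# Outputs the top 50 URLs visited by AI crawlers, sorted by frequency.
# For Cloudflare Logpush JSON (one object per line), jq can replace grep + awk,
# assuming the ClientRequestUserAgent and ClientRequestPath fields are exported:
# jq -r 'select(.ClientRequestUserAgent | test("GPTBot|ClaudeBot|PerplexityBot")) | .ClientRequestPath' http_requests.json | sort | uniq -c | sort -rn | head -50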

Questions you can answer with the above

  • Which 50 pages are my most AI-crawled? Starting set for content depth audits.
  • Are my top products in that list? If high-revenue SKUs are missing, it is usually a sitemap or internal linking problem.
  • Is my blog in that list? If yes, blog content is reaching AI retrievers; if no, check whether your blog is in the sitemap.
  • What proportion of crawler visits hit 404s? The pipeline above prints paths only; swap $7 for the status field ($9) to see status codes. A high 404 rate is a clean-up priority.
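The one-liner above prints paths only, so on its own it cannot answer the 404 question. The bash sketch below breaks the 404 share down per crawler. It makes the same combined-log-format assumption, and bot_404_share is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: per-crawler share of requests answered with a 404, read from a raw
# access log on stdin in the Apache/Nginx "combined" format. Splitting on
# double quotes puts " status bytes " in $3 and the user-agent in $6.
bot_404_share() {
  awk -F'"' '
    {
      if (!match($6, /OAI-SearchBot|ChatGPT-User|GPTBot|Perplexity-User|PerplexityBot|ClaudeBot|Amazonbot/)) next
      bot = substr($6, RSTART, RLENGTH)
      split($3, resp, " ")             # " 404 0 " -> resp[1] = "404"
      total[bot]++
      if (resp[1] == "404") notfound[bot]++
    }
    END {
      # Columns: crawler, 404s/total requests, percentage of 404s.
      for (b in total)
        printf "%s\t%d/%d\t%.1f%%\n", b, notfound[b], total[b], 100 * notfound[b] / total[b]
    }
  ' | sort
}
```

Running bot_404_share < access.log prints one line per crawler. Comparing crawlers is the useful part: a 404 share that is high for one bot and low for the others usually points to stale URLs that only that bot still holds.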

Common findings from AI crawler log audits

Five patterns that show up repeatedly. Each has a specific fix; most fixes are under an hour.

Across the stores we audit, the log findings cluster around a recognisable set of patterns. Knowing the common patterns in advance shortens the time between 'I pulled the logs' and 'I know what to fix'.

Zero AI crawler traffic
robots.txt blocks them, or Cloudflare's Bot Fight Mode is too aggressive. Fix: edit robots.txt and allow the bots in your Cloudflare firewall rules.
High 404 rate from AI crawlers
Stale product URLs, discontinued SKUs, moved blog posts. Fix: implement proper 301s; do not leave 404s.
Crawlers visit but ignore specific pages
Missing from sitemap, orphaned in internal linking, or blocked per-URL. Fix: audit sitemap completeness and internal link paths.
Slow-response pages abandoned
Shows up as many fetch attempts but few completed responses (timeouts or 5xx errors) on the same URLs. Fix: speed optimisation on those specific pages.
One crawler dominates, others absent
Usually means a specific bot is explicitly disallowed. Check robots.txt and Cloudflare rules per user-agent.
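A small parser helps with the robots.txt half of these checks. The bash sketch below reads robots.txt on stdin and prints every user-agent whose group contains a blanket Disallow: /. It follows the basic grouping rule in RFC 9309, where consecutive User-agent lines share the rules that follow them. It deliberately ignores Allow overrides and the fallback from a named group to *, so treat its output as a shortlist to verify by hand. The function name blanket_disallows is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print user-agents whose robots.txt group contains "Disallow: /".
# Simplified RFC 9309 grouping: consecutive User-agent lines form a group; the
# first User-agent line after any rule line starts a new group. Sitemap lines
# are group-independent. Allow overrides and "*" fallback are NOT resolved.
blanket_disallows() {
  awk '
    {
      sub(/\r$/, "")                          # tolerate CRLF line endings
      sub(/#.*/, "")                          # strip comments
      if ($0 !~ /:/) next                     # skip blank or malformed lines
      key = tolower($0); sub(/[ \t]*:.*/, "", key); sub(/^[ \t]+/, "", key)
      val = $0; sub(/^[^:]*:[ \t]*/, "", val); sub(/[ \t]+$/, "", val)
      if (key == "user-agent") {
        if (in_rules) { n = 0; in_rules = 0 } # previous group has ended
        agents[++n] = val
      } else if (key != "sitemap") {
        in_rules = 1                          # Allow, Disallow, Crawl-delay, ...
        if (key == "disallow" && val == "/")
          for (i = 1; i <= n; i++) print agents[i]
      }
    }
  ' | sort -u
}
```

For example, curl -s https://your-store.example/robots.txt | blanket_disallows, where the URL is a placeholder for your own domain. Cloudflare firewall rules still need checking separately, because they never show up in robots.txt.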
Log analysis is the cheapest insurance policy against silent AI invisibility. Most merchants check citation rate and wonder why it is low; the stores that win check the logs first and fix the plumbing before they go chasing content problems.
Harry Parker, Head of AI Research, Surfient

Frequently asked questions


Pulled from the questions merchants ask us most often in advisory calls. Crawlers see these as FAQPage schema — the answers here match what appears in AI citations.

  • No, Shopify does not expose raw server logs to merchants. The platform-level logs are an internal Shopify resource and have been since launch. The practical alternatives are Cloudflare logs if you proxy through Cloudflare, Shopify's limited built-in bot analytics, or app-proxy-based logging for custom routes. Of the three, Cloudflare is by far the most complete if your store already runs through it.
