“GEO audits” are 90% theatre

We reviewed 42 GEO audit PDFs in the last six months. 39 were traditional SEO audits wearing a new jacket — comprehensive-looking, screenshot-heavy, and completely disconnected from how AI retrieval actually works. Here’s the six-dimension scorecard to grade any audit in under 20 minutes before you pay for it.

Harry Parker
Co-founder, Onviqa Inc. · Surfient
TL;DR
  • 39 of 42 GEO audits we reviewed in the last six months were traditional SEO audits relabelled — 80 pages of meta-description rewrites and internal-link suggestions that don’t move AI citation share.
  • Grade any audit on six dimensions (engine specificity, baseline citation measurement, schema diagnosis depth, content extractability, cross-surface comparison, remediation sequencing) for 30 points. Under 15 is theatre; above 21 is worth acting on.
  • Signal audits end in three to five prioritised fixes with a baseline CSV and a 60-day re-measure clause — not a 120-item flat checklist. If the auditor won’t commit to re-measure, they’re selling decor.

Most “GEO audits” you see in 2026 are still traditional SEO audits wearing a new jacket. Eighty pages of crawl statistics, meta description rewrites, internal link suggestions, and word count recommendations — none of which move AI citation share. We’ve reviewed forty-two GEO audit PDFs from agencies, consultants, and in-house teams in the last six months. Thirty-nine of them were theatre. This post is the scorecard we use to tell theatre from signal — so you can grade any audit in under twenty minutes before you pay for it.

What makes an audit “theatre”

Theatre audits borrow the credibility of traditional SEO auditing (comprehensive-looking, screenshot-heavy, checklist-exhaustive) without doing the job GEO auditing actually requires: reasoning about retrieval. The tell is always the same. Open the findings section, and every recommendation could have come from a 2019-era Screaming Frog report: add more internal links, increase word count, add alt text, compress images, fix broken links, update the sitemap. These aren’t wrong per se, and a few still matter for conventional SERP rankings, but none of them changes whether Perplexity, ChatGPT, or AI Mode cites your site.

A signal audit starts from a different question: “If a buyer asks a retrieval-augmented LLM about this category, what does the engine actually do, and where is this merchant failing in that pipeline?” Answering requires a mental model of each engine’s retrieval pool, eligibility gates, and ranking signals — plus hands-on diagnosis of the specific schema, content, and corroboration patterns that determine inclusion.

Two-column comparison. Each finding carries a subtext: theatre findings explain why they do not move AI citation share; signal findings explain the retrieval mechanic they fix.

Theatre findings (left column, red):
  • Add more internal links
  • Update meta descriptions
  • Increase word count to 2,000+
  • Add alt text
  • Compress images
  • Submit sitemap
  • Fix H1 structure
  • Install HTTPS
  • Mobile-friendly test

Signal findings (right column, cyan):
  • MerchantReturnPolicy missing on 47 PDPs
  • FAQPage missing on top 20 categories
  • Reddit presence zero in 4 of 5 categories
  • Product offer price stale on 12 PDPs
  • Claude retrieval path untested
  • Perplexity shopping mode eligibility failing
  • ChatGPT citation frequency decaying
  • AI Mode turn 2 coverage missing
  • Cross-engine share monitoring absent

Theatre column (left) versus signal column (right). Left: nine findings that look thorough but don't move AI citation share. Right: nine findings that map directly to retrieval mechanics on Perplexity, ChatGPT, and Google AI Mode.

The six dimensions of a signal audit

We grade audits on six dimensions, five points each, for a total of thirty. Any audit scoring under fifteen is theatre; anything above twenty-one is worth paying for; the middle band is “ask the auditor three follow-up questions before buying.” The six dimensions sit at different layers of the retrieval pipeline, and a strong audit touches every one of them.

Six-row scorecard, 5 points per row:
  • Row 1, engine specificity: does the audit name Perplexity, ChatGPT, Claude, and AI Mode separately, or treat AI as one entity?
  • Row 2, baseline citation measurement: is there a cohort of 50+ prompts with measured citation share before tactics?
  • Row 3, schema diagnosis depth: does the auditor view-source and reason about Product, Offer, MerchantReturnPolicy, and FAQPage?
  • Row 4, content extractability: quotability, claim density, honest spec disclosure.
  • Row 5, cross-surface comparison: does the audit compare your site against 3 to 5 competitors on the same prompts?
  • Row 6, remediation sequencing: are fixes ordered by expected citation impact and effort?
Totals on the right show a theatre average of 6 out of 30 and a signal average of 27 out of 30.

The six-dimension audit scorecard: engine specificity, baseline citation measurement, schema diagnosis depth, content extractability, cross-surface comparison, and remediation sequencing. Theatre audits average 6/30; signal audits average 27/30.

Dimension 01 — Engine specificity

Theatre audits treat “AI” as a single entity. They say things like “optimise for AI search” or “AI-friendly content.” Signal audits name each engine separately because each has a different retrieval architecture. Perplexity runs a three-pass retrieval with explicit citation. ChatGPT uses Bing search-grounded retrieval with OpenAI-weighted re-ranking. Claude pulls from a narrower curated pool. Google AI Mode synthesises from organic SERP results plus AI Overviews grounding. An audit that doesn’t name engines specifically cannot diagnose specifically.

Dimension 02 — Baseline citation measurement

If the auditor didn’t measure your current citation share on a cohort of at least 50 category-relevant prompts before writing the audit, you’re looking at a checklist, not a diagnosis. The baseline is the control group. Without it, there’s no way to tell which recommendations moved the needle and which were noise. Signal audits include a CSV of the exact prompts, the citation presence per engine, and the citation rank position where relevant. Theatre audits say “your AI visibility is low” with no supporting data.
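
As a rough illustration of what a baseline CSV makes possible, here is a minimal Python sketch that computes per-engine citation share. The column names (prompt, engine, cited, rank) and the sample rows are assumptions made for this example, not a standard export format, and a real cohort would hold 50 or more prompts rather than two.

```python
import csv
import io
from collections import defaultdict

# Hypothetical baseline layout: one row per (prompt, engine) observation.
# "cited" is 1 if the site appeared as a citation, 0 otherwise.
# "rank" is the citation position where the engine exposes one, else blank.
SAMPLE_CSV = """prompt,engine,cited,rank
best trail running shoes,perplexity,1,2
best trail running shoes,chatgpt,0,
waterproof hiking boots,perplexity,1,4
waterproof hiking boots,chatgpt,1,1
"""


def citation_share(csv_text):
    """Return {engine: fraction of prompts on which the site was cited}."""
    cited = defaultdict(int)
    total = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        total[row["engine"]] += 1
        cited[row["engine"]] += int(row["cited"])
    return {engine: cited[engine] / total[engine] for engine in total}


baseline = citation_share(SAMPLE_CSV)
print(baseline)
```

Running the same function on a later measurement of the same prompts gives a like-for-like comparison against the baseline, which is what lets the baseline act as a control group.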

Dimension 03 — Schema diagnosis depth

The auditor should have opened view-source on your product pages, collection pages, and homepage. They should be able to tell you whether your Product node has a valid Offer, whether MerchantReturnPolicy is present, whether FAQPage schema is on the right templates, and whether conflicting apps are emitting duplicate or contradictory blocks. Theatre audits say “add schema.” Signal audits show you the exact blocks you’re missing with line numbers and remediation copy you can paste into Liquid or metafields.
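
A first pass at these checks can be scripted. The sketch below, using only the Python standard library, pulls the JSON-LD blocks out of a page's HTML and reports whether a Product node carries an `offers` property, whether MerchantReturnPolicy and FAQPage nodes are present, and whether more than one Product node appears. The sample HTML is invented, the check covers presence rather than validity, and treating multiple Product nodes as a duplicate warning is an assumption that fits a single product page, not a collection page.

```python
import json
from collections import Counter
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False


def walk(node):
    """Yield every dict nested anywhere inside a parsed JSON-LD value."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


def types_of(node):
    """Return the node's @type as a list (schema.org allows a string or a list)."""
    value = node.get("@type", [])
    return value if isinstance(value, list) else [value]


def diagnose(html):
    parser = JsonLdExtractor()
    parser.feed(html)
    nodes = []
    for block in parser.blocks:
        try:
            nodes.extend(walk(json.loads(block)))
        except json.JSONDecodeError:
            continue  # malformed block; a real audit would report it
    counts = Counter(name for node in nodes for name in types_of(node))
    return {
        "product_has_offer": any(
            "Product" in types_of(n) and "offers" in n for n in nodes
        ),
        "has_return_policy": counts["MerchantReturnPolicy"] > 0,
        "has_faq_page": counts["FAQPage"] > 0,
        "duplicate_product_nodes": counts["Product"] > 1,
    }


# Invented page: two apps each emit a Product block, only one with an Offer.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Trail Shoe",
 "offers": {"@type": "Offer", "price": "129.00", "priceCurrency": "USD"}}
</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Trail Shoe"}
</script>
</head></html>
"""

report = diagnose(SAMPLE_HTML)
print(report)
```

A script like this only flags candidates for inspection; deciding whether two blocks actually contradict each other still takes a human reading the source.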

Dimension 04 — Content extractability

Retrieval systems quote content. The question the auditor should answer is: which sentences on your site are quotable under a 400-token budget? Which claims are extractable as standalone facts? Which comparisons are honest enough to survive the LLM’s unsupported-claim detection? Theatre audits recommend “increase word count” and “improve readability.” Signal audits grade your top 20 landing pages on claim density, quotability, and honest spec disclosure, with rewrite examples for the lowest performers.
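
One way to make claim density concrete is a crude heuristic pass over page copy. The Python sketch below is an assumption-laden illustration, not how any engine scores content: it treats a sentence as a standalone claim if it contains a number and does not open with a dangling pronoun, and it estimates tokens at roughly four characters each. Only the 400-token budget comes from the text above; the sample copy is invented.

```python
import re

TOKEN_BUDGET = 400  # budget named in the text above

# Assumption: sentences opening with these words depend on earlier context.
DANGLING_OPENERS = ("it ", "this ", "they ", "these ", "those ")


def estimate_tokens(text):
    """Very rough proxy, assuming about four characters per token."""
    return len(text) // 4


def split_sentences(text):
    """Split on sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_standalone_claim(sentence):
    """Heuristic: has a concrete number and does not lean on prior context."""
    has_number = re.search(r"\d", sentence) is not None
    opens_dangling = sentence.lower().startswith(DANGLING_OPENERS)
    return has_number and not opens_dangling


def grade_page(text):
    sentences = split_sentences(text)
    claims = [s for s in sentences if is_standalone_claim(s)]
    return {
        "sentences": len(sentences),
        "claims": claims,
        "claim_density": len(claims) / len(sentences) if sentences else 0.0,
        "fits_budget": estimate_tokens(" ".join(claims)) <= TOKEN_BUDGET,
    }


SAMPLE_COPY = (
    "The Ridge 2 weighs 280 g per shoe. It lasts 500 km. "
    "Our customers love the feel. The outsole uses 4 mm lugs."
)

result = grade_page(SAMPLE_COPY)
print(result["claim_density"], result["claims"])
```

The second sentence shows the point of the pronoun check: "It lasts 500 km" carries a spec but is useless as a quote on its own, so it does not count toward density.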

Dimension 05 — Cross-surface comparison

Your audit is only as useful as the competitive context it establishes. Signal audits compare your site against three to five direct category competitors on the same prompt cohort, on the same engines, in the same week. Theatre audits list your flaws in isolation — which is useless, because the retrieval pool is adversarial. You don’t need to be perfect; you need to be the best citation candidate in your category for the prompts that matter.
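
The comparison itself is simple once the observations exist. The following Python sketch ranks a merchant and its competitors by citation share over the same set of prompt and engine runs. The domains, prompts, and results are invented for illustration, and the data shape (one tuple per run, listing cited domains) is an assumption.

```python
from collections import Counter

# Hypothetical observations: for each (prompt, engine) run, the domains cited.
observations = [
    ("best trail running shoes", "perplexity", ["rival-a.example", "yourstore.example"]),
    ("best trail running shoes", "chatgpt", ["rival-a.example"]),
    ("waterproof hiking boots", "perplexity", ["rival-b.example", "rival-a.example"]),
    ("waterproof hiking boots", "chatgpt", ["yourstore.example"]),
]


def leaderboard(observations, domains):
    """Rank domains by the fraction of runs in which each one was cited."""
    runs = len(observations)
    hits = Counter()
    for _prompt, _engine, cited in observations:
        for domain in set(cited) & set(domains):
            hits[domain] += 1
    shares = [(domain, hits[domain] / runs) for domain in domains]
    # Highest share first; ties broken alphabetically for stable output.
    return sorted(shares, key=lambda pair: (-pair[1], pair[0]))


ranking = leaderboard(
    observations,
    ["yourstore.example", "rival-a.example", "rival-b.example"],
)
print(ranking)
```

Holding the prompt cohort, the engines, and the measurement week constant across all domains is what makes the ranking meaningful; the code assumes that discipline was applied when the observations were collected.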

Dimension 06 — Remediation sequencing

A stack of 120 findings with no priority order is not an audit — it’s a bibliography. Signal audits order fixes by expected citation-share impact divided by implementation effort, and they group related fixes into sprints. Theatre audits give you a flat checklist and leave you to guess.
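
The ordering rule described here, expected impact divided by effort, can be sketched in a few lines of Python. The fix names echo the kinds of findings discussed in this post, but the impact and effort numbers are invented for illustration, and the sprint grouping below is purely by priority order rather than by how related the fixes are.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    name: str
    impact: float  # auditor's estimate of citation-share lift, in points
    effort: float  # implementation effort in person-days, must be > 0

    @property
    def priority(self):
        return self.impact / self.effort


def plan_sprints(fixes, sprint_size=3):
    """Sort fixes by impact per unit of effort, then chunk them into sprints."""
    ordered = sorted(fixes, key=lambda fix: fix.priority, reverse=True)
    return [ordered[i:i + sprint_size] for i in range(0, len(ordered), sprint_size)]


# Illustrative numbers only.
fixes = [
    Fix("Add MerchantReturnPolicy to PDPs", impact=4.0, effort=2.0),
    Fix("Add FAQPage to top categories", impact=3.0, effort=3.0),
    Fix("Refresh stale Offer prices", impact=2.0, effort=0.5),
    Fix("Compress images", impact=0.1, effort=1.0),
]

sprints = plan_sprints(fixes)
for number, sprint in enumerate(sprints, start=1):
    print(number, [fix.name for fix in sprint])
```

The arithmetic is trivial; the value of the exercise is that it forces the auditor to commit to an impact estimate for every item, which a flat checklist never does.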

The signal audit’s end state: three fixes, not thirty

The most counterintuitive property of a signal audit is that it usually ends with three to five high-leverage fixes, not thirty. Theatre audits pad to look exhaustive because the asymmetric-information dynamic rewards the appearance of thoroughness. Signal audits trust their model of retrieval enough to say, “these three fixes will move your citation share measurably in the next 60 days; the rest is noise.” When a consultant hands you a 120-item fix list, they’re telling you they don’t have a model of which items move the needle.

The audits we’ve seen produce the biggest actual lift in client citation share have this profile in common: a tightly scoped brief (often 14-22 pages, not 80), three to five prioritised fixes with implementation code, a baseline measurement CSV, and a follow-up measurement plan at 30/60/90 days to verify impact. Anything dramatically longer than that is usually hiding weak prioritisation behind volume.

How to score an audit you already have

If you’ve already paid for an audit and want to know whether to act on it: run the six-dimension scorecard yourself. Open the audit, go to each of the six dimensions above, and score 0-5 based on what the document actually contains. Total <15 → set aside and commission a signal audit; acting on the current one will burn team cycles with low return. Total 15-21 → actionable for the specific dimensions that scored 4+, ignore the rest. Total >21 → work through it in priority order and re-measure in 60 days.

  • Demand engine-specific diagnosis. An audit that lumps “AI search” into one bucket cannot tell you where Perplexity differs from ChatGPT, and therefore cannot tell you where to invest.
  • Require a baseline CSV, not just a score. The raw prompt-by-prompt citation data is the control group. Without it, you can’t measure whether the recommended fixes worked.
  • Reject flat checklists. If the audit lists 40+ fixes with no priority order, it’s transferring the hard work to you.
  • Ask for three-fix sprint plans, not 120-line bibliographies. Signal audits end in sprints, not libraries.
  • Lock in a re-measure clause. Any auditor confident in their recommendations will agree to a 60-day follow-up measurement. Auditors who won’t agree to re-measure are selling theatre.
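
For teams that want to record their grading, the scoring bands described in this section can be written down as a small Python function. The dimension keys and verdict strings are names chosen for this sketch; the thresholds (under 15, 15 to 21, above 21) and the "scored 4+" rule come from the text above.

```python
DIMENSIONS = (
    "engine_specificity",
    "baseline_citation_measurement",
    "schema_diagnosis_depth",
    "content_extractability",
    "cross_surface_comparison",
    "remediation_sequencing",
)


def grade_audit(scores):
    """Return (total, verdict, dimensions worth acting on) for a scored audit."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    if any(not 0 <= scores[d] <= 5 for d in DIMENSIONS):
        raise ValueError("each dimension is scored from 0 to 5")

    total = sum(scores[d] for d in DIMENSIONS)
    if total < 15:
        return total, "theatre: set aside and commission a signal audit", []
    if total <= 21:
        strong = [d for d in DIMENSIONS if scores[d] >= 4]
        return total, "partial: act only on dimensions that scored 4+", strong
    return total, "signal: work in priority order, re-measure in 60 days", list(DIMENSIONS)


# Example scores for a hypothetical audit.
total, verdict, act_on = grade_audit({
    "engine_specificity": 5,
    "baseline_citation_measurement": 4,
    "schema_diagnosis_depth": 2,
    "content_extractability": 2,
    "cross_surface_comparison": 3,
    "remediation_sequencing": 3,
})
print(total, verdict, act_on)
```

In this example the audit lands in the middle band, so only its engine-specific and baseline sections would be worth acting on.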

Closing — audit the audit

The cost of a theatre audit isn’t the invoice — it’s the three to six months your team spends executing against the wrong recommendations while your actual citation share quietly decays. An audit without retrieval reasoning is decor. Grade it before you pay for it; re-grade it before you act on it.

Tags: audits, geo, agencies, procurement, quality

Sources & further reading

  1. Surfient internal audit review dataset: 42 GEO audits graded on the six-dimension scorecard. Surfient Research, 2026-02-21.
Harry Parker
Co-founder, Onviqa Inc. · Surfient

Harry has led SEO and e-commerce engineering for over 12 years and has been shipping Shopify software since Onviqa was founded in 2014. He writes about where commerce is headed when shoppers stop typing queries and start asking assistants.
