Most “GEO audits” you see in 2026 are still traditional SEO audits wearing a new jacket. Eighty pages of crawl statistics, meta description rewrites, internal link suggestions, and word count recommendations — none of which move AI citation share. We’ve reviewed forty-two GEO audit PDFs from agencies, consultants, and in-house teams in the last six months. Thirty-nine of them were theatre. This post is the scorecard we use to tell theatre from signal — so you can grade any audit in under twenty minutes before you pay for it.
What makes an audit “theatre”
Theatre audits borrow the credibility of traditional SEO auditing (comprehensive-looking, screenshot-heavy, checklist-exhaustive) without doing the actual job GEO auditing requires: reasoning about retrieval. The tell is always the same. Open the findings section, and every recommendation could have been generated by a 2019-era Screaming Frog report: add more internal links, increase word count, add alt text, compress images, fix broken links, update the sitemap. These aren’t wrong per se, and a few still matter for conventional SERP rankings, but none of them change whether Perplexity, ChatGPT, or Google AI Mode cites your site.
A signal audit starts from a different question: “If a buyer asks a retrieval-augmented LLM about this category, what does the engine actually do, and where is this merchant failing in that pipeline?” Answering requires a mental model of each engine’s retrieval pool, eligibility gates, and ranking signals — plus hands-on diagnosis of the specific schema, content, and corroboration patterns that determine inclusion.

The six dimensions of a signal audit
We grade audits on six dimensions, five points each, for a total of thirty. Any audit scoring under fifteen is theatre; anything above twenty-one is worth paying for; the middle band is “ask the auditor three follow-up questions before buying.” The six dimensions sit at different layers of the retrieval pipeline, and a strong audit touches every one of them.

Dimension 01 — Engine specificity
Theatre audits treat “AI” as a single entity. They say things like “optimise for AI search” or “AI-friendly content.” Signal audits name each engine separately because each has a different retrieval architecture. Perplexity runs a three-pass retrieval with explicit citation. ChatGPT uses Bing search-grounded retrieval with OpenAI-weighted re-ranking. Claude pulls from a narrower curated pool. Google AI Mode synthesises from organic SERP results plus AI Overviews grounding. An audit that doesn’t name engines specifically cannot diagnose specifically.
Dimension 02 — Baseline citation measurement
If the auditor didn’t measure your current citation share on a cohort of at least 50 category-relevant prompts before writing the audit, you’re looking at a checklist, not a diagnosis. The baseline is the control group. Without it, there’s no way to tell which recommendations moved the needle and which were noise. Signal audits include a CSV of the exact prompts, the citation presence per engine, and the citation rank position where relevant. Theatre audits say “your AI visibility is low” with no supporting data.
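To make the baseline concrete, here is a minimal sketch of turning such a CSV into a per-engine citation share. The column names (`prompt`, `engine`, `cited`, `rank`) and the sample rows are our own illustrative assumptions, not a standard format, and a real cohort would hold at least 50 prompts rather than two.

```python
import csv
import io
from collections import defaultdict

# Hypothetical baseline layout: one row per (prompt, engine) pair.
# "cited" is 1 if the site appeared as a citation, 0 otherwise.
SAMPLE_CSV = """prompt,engine,cited,rank
best trail running shoes,perplexity,1,2
best trail running shoes,chatgpt,0,
waterproof hiking boots,perplexity,1,1
waterproof hiking boots,chatgpt,1,3
"""

def citation_share(csv_text):
    """Return {engine: fraction of prompts on which the site was cited}."""
    cited = defaultdict(int)
    total = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        total[row["engine"]] += 1
        cited[row["engine"]] += int(row["cited"])
    return {engine: cited[engine] / total[engine] for engine in total}

shares = citation_share(SAMPLE_CSV)
for engine, share in shares.items():
    print(f"{engine}: {share:.0%}")
```

Running the same function on the 30, 60, and 90 day re-measurements gives numbers that are directly comparable to the baseline, provided the prompt cohort stays fixed.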
Dimension 03 — Schema diagnosis depth
The auditor should have opened view-source on your product pages, collection pages, and homepage. They should be able to tell you whether your Product node has a valid Offer, whether MerchantReturnPolicy is present, whether FAQPage schema is on the right templates, and whether conflicting apps are emitting duplicate or contradictory blocks. Theatre audits say “add schema.” Signal audits show you the exact blocks you’re missing with line numbers and remediation copy you can paste into Liquid or metafields.
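A rough sketch of what that view-source check can look like when automated, using only the Python standard library. The finding messages and the sample page are hypothetical, and the check assumes the return policy is attached to the Offer via the schema.org `hasMerchantReturnPolicy` property; a real diagnosis would also cover FAQPage placement and report line numbers.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect parsed contents of <script type="application/ld+json"> tags."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                self.blocks.append(None)  # a malformed block is itself a finding

def as_list(value):
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def product_nodes(blocks):
    """Yield every node typed Product, looking inside @graph where present."""
    for block in blocks:
        for node in as_list(block):
            if not isinstance(node, dict):
                continue
            for item in as_list(node.get("@graph")) or [node]:
                if isinstance(item, dict) and "Product" in as_list(item.get("@type")):
                    yield item

def diagnose(html):
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    findings = []
    if None in parser.blocks:
        findings.append("malformed JSON-LD block")
    products = list(product_nodes(parser.blocks))
    if not products:
        findings.append("no Product node")
    if len(products) > 1:
        findings.append("multiple Product nodes (possible duplicate app output)")
    for product in products:
        offers = [o for o in as_list(product.get("offers")) if isinstance(o, dict)]
        if not offers:
            findings.append("Product has no Offer")
        elif not any("hasMerchantReturnPolicy" in o for o in offers):
            findings.append("Offer has no MerchantReturnPolicy")
    return findings

# Hypothetical product page with an Offer but no return policy.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example boot",
 "offers": {"@type": "Offer", "price": "120.00", "priceCurrency": "USD"}}
</script></head><body></body></html>"""

findings = diagnose(SAMPLE_HTML)
print(findings)
```

This only inspects JSON-LD present in the served HTML; markup injected later by client-side scripts would need a rendered snapshot instead.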
Dimension 04 — Content extractability
Retrieval systems quote content. The question the auditor should answer is: which sentences on your site are quotable under a 400-token budget? Which claims are extractable as standalone facts? Which comparisons are honest enough to survive the LLM’s unsupported-claim detection? Theatre audits recommend “increase word count” and “improve readability.” Signal audits grade your top 20 landing pages on claim density, quotability, and honest spec disclosure, with rewrite examples for the lowest performers.
Dimension 05 — Cross-surface comparison
Your audit is only as useful as the competitive context it establishes. Signal audits compare your site against three to five direct category competitors on the same prompt cohort, on the same engines, in the same week. Theatre audits list your flaws in isolation — which is useless, because the retrieval pool is adversarial. You don’t need to be perfect; you need to be the best citation candidate in your category for the prompts that matter.
Dimension 06 — Remediation sequencing
A stack of 120 findings with no priority order is not an audit — it’s a bibliography. Signal audits order fixes by expected citation-share impact divided by implementation effort, and they group related fixes into sprints. Theatre audits give you a flat checklist and leave you to guess.
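The ordering rule is simple enough to sketch in a few lines. The `Fix` fields, their units, and the three sample findings below are made-up illustrations; in practice the impact estimate is the auditor's judgment call and the hard part of the job.

```python
from dataclasses import dataclass

@dataclass
class Fix:
    name: str
    expected_impact: float  # estimated citation-share gain in percentage points (assumed unit)
    effort_days: float      # estimated implementation effort in working days (assumed unit)

def prioritise(fixes):
    """Order fixes by expected impact per unit of effort, highest first."""
    return sorted(fixes, key=lambda f: f.expected_impact / f.effort_days, reverse=True)

# Hypothetical findings with invented estimates.
fixes = [
    Fix("Add return policy markup to Offer", 4.0, 1.0),   # ratio 4.0
    Fix("Rewrite top 20 landing pages", 9.0, 15.0),       # ratio 0.6
    Fix("Remove duplicate schema from apps", 3.0, 2.0),   # ratio 1.5
]

ordered = [f.name for f in prioritise(fixes)]
for position, name in enumerate(ordered, start=1):
    print(position, name)
```

Note that the fix with the largest raw impact lands last here because its effort is so high, which is exactly the information a flat checklist hides.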
The signal audit’s end state: three fixes, not thirty
The most counterintuitive property of a signal audit is that it usually ends with three to five high-leverage fixes, not thirty. Theatre audits pad to look exhaustive because the asymmetric-information dynamic rewards the appearance of thoroughness. Signal audits trust their model of retrieval enough to say, “these three fixes will move your citation share measurably in the next 60 days; the rest is noise.” When a consultant hands you a 120-item fix list, they’re telling you they don’t have a model of which items move the needle.
The audits we’ve seen produce the biggest actual lift in client citation share have this profile in common: a tightly scoped brief (often 14-22 pages, not 80), three to five prioritised fixes with implementation code, a baseline measurement CSV, and a follow-up measurement plan at 30/60/90 days to verify impact. Anything dramatically longer than that is usually hiding weak prioritisation behind volume.
How to score an audit you already have
If you’ve already paid for an audit and want to know whether to act on it: run the six-dimension scorecard yourself. Open the audit, go to each of the six dimensions above, and score 0-5 based on what the document actually contains. Total <15 → set aside and commission a signal audit; acting on the current one will burn team cycles with low return. Total 15-21 → actionable for the specific dimensions that scored 4+, ignore the rest. Total >21 → work through it in priority order and re-measure in 60 days.
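The scoring rule above can be written down as a small function, which is handy if several people on the team grade the same audit independently. The dimension keys and verdict strings are our own shorthand, and the sample scores are invented.

```python
DIMENSIONS = (
    "engine_specificity",
    "baseline_measurement",
    "schema_diagnosis",
    "content_extractability",
    "cross_surface_comparison",
    "remediation_sequencing",
)

def grade(scores):
    """Return (total, verdict, dimensions worth acting on) for one audit."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    if any(not 0 <= scores[d] <= 5 for d in DIMENSIONS):
        raise ValueError("each dimension is scored from 0 to 5")

    total = sum(scores[d] for d in DIMENSIONS)
    if total < 15:
        verdict = "theatre: set aside and commission a signal audit"
        act_on = []
    elif total <= 21:
        verdict = "partial: act only on dimensions scoring 4 or more"
        act_on = [d for d in DIMENSIONS if scores[d] >= 4]
    else:
        verdict = "signal: work through in priority order, re-measure in 60 days"
        act_on = list(DIMENSIONS)
    return total, verdict, act_on

# Invented scores for a hypothetical audit.
example = {
    "engine_specificity": 4,
    "baseline_measurement": 2,
    "schema_diagnosis": 5,
    "content_extractability": 3,
    "cross_surface_comparison": 1,
    "remediation_sequencing": 3,
}

total, verdict, act_on = grade(example)
print(total, verdict, act_on)
```

In this example the audit totals 18, so only its engine-specific and schema findings would be worth executing.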
- Demand engine-specific diagnosis. An audit that lumps “AI search” into one bucket cannot tell you where Perplexity differs from ChatGPT, and therefore cannot tell you where to invest.
- Require a baseline CSV, not just a score. The raw prompt-by-prompt citation data is the control group. Without it, you can’t measure whether the recommended fixes worked.
- Reject flat checklists. If the audit lists 40+ fixes with no priority order, it’s transferring the hard work to you.
- Ask for three-fix sprint plans, not 120-line bibliographies. Signal audits end in sprints, not libraries.
- Lock in a re-measure clause. Any auditor confident in their recommendations will agree to a 60-day follow-up measurement. Auditors who won’t agree to re-measure are selling theatre.
Closing — audit the audit
The cost of a theatre audit isn’t the invoice — it’s the three to six months your team spends executing against the wrong recommendations while your actual citation share quietly decays. An audit without retrieval reasoning is decor. Grade it before you pay for it; re-grade it before you act on it.