Why a gap audit beats a generic strategy conversation
Generic AI strategy stalls because it lacks specifics. A gap audit produces a specific competitor, a specific prompt, a specific missing signal, and a specific fix.
Most conversations about 'improving AI visibility' stall because they lack the specificity required to act. The marketer knows they should 'show up more in ChatGPT' but has no concrete picture of which prompts they are missing, which competitor they are losing to, or which specific signal separates the cited pages from their own. Without that picture, every improvement move is a guess. A gap audit fixes the specificity problem directly — you leave the exercise with a list of exact competitor URLs, exact prompts they win, and exact signals they emit that you do not.
The other benefit is that gap audits compound. The 30-prompt panel you build for the first audit becomes the baseline for the quarterly follow-up — and because you have specific metrics (citation counts, source types, sentiment), the follow-up is a proper before-and-after measurement rather than a vibes-based check-in.
5-10
prioritised interventions produced by a typical two-afternoon gap audit
Surfient case-study average, 28 competitor gap audits completed for Shopify merchants, 2024-2026. Intervention count varies by category depth and competitor density.
Phase 1 — build the 30-query prompt panel
Mix brand-inclusive, brand-exclusive, and category-comparison prompts. Pull from real buyer research, not from SEO keyword tools.
The prompt panel is the core instrument of the audit. A weak panel (random keywords, SEO tool exports, generic questions) produces weak results. A strong panel reflects how real customers actually talk to AI assistants when considering your category — conversational, specific, and mixed across the buyer journey. We build 30 prompts across three types, evenly split.
- Brand-inclusive (10 prompts)
- Include your brand name. 'Is [brand] good for X?', 'What do customers say about [brand]?', '[brand] vs [competitor]'. Tests your defence — are you winning prompts that already name you?
- Brand-exclusive (10 prompts)
- Do not name any brand. 'Best X for Y under $Z', 'What is the most comfortable running shoe for flat feet?', 'Recommend a sustainable leather bag brand'. Tests your offence — can you show up when the shopper has not heard of you?
- Comparison (10 prompts)
- Name two or three brands including yours. '[Your brand] vs [competitor A] vs [competitor B] for X'. Tests head-to-head visibility and where AIs side with rivals over you.
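One way to keep the panel honest is to hold it as structured data rather than a loose list, so the even three-way split can be checked and each prompt carries a stable ID for later audits. The following is a minimal sketch, not a prescribed tool: the `Prompt` class, the `build_panel` and `unbalanced_kinds` helpers, the placeholder names, and the brands "Acme", "Bolt", and "Crest" are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

KINDS = ("brand_inclusive", "brand_exclusive", "comparison")


@dataclass(frozen=True)
class Prompt:
    prompt_id: str  # stable ID, reused in every later audit
    kind: str       # one of KINDS
    text: str


def build_panel(templates: dict[str, list[str]], brand: str, rivals: list[str]) -> list[Prompt]:
    """Fill the placeholders in each template and assign a stable ID."""
    panel = []
    for kind, kind_templates in templates.items():
        for i, template in enumerate(kind_templates, start=1):
            # Unused placeholders are simply ignored by str.format.
            text = template.format(brand=brand, rival_a=rivals[0], rival_b=rivals[1])
            panel.append(Prompt(f"{kind}-{i:02d}", kind, text))
    return panel


def unbalanced_kinds(panel: list[Prompt], per_kind: int = 10) -> dict[str, int]:
    """Return the kinds whose prompt count differs from the target split."""
    counts = Counter(p.kind for p in panel)
    return {kind: counts[kind] for kind in KINDS if counts[kind] != per_kind}


# Hypothetical example: one template per kind instead of the full ten.
example_templates = {
    "brand_inclusive": ["Is {brand} good for daily commuting?"],
    "brand_exclusive": ["What is the most comfortable running shoe for flat feet?"],
    "comparison": ["{brand} vs {rival_a} vs {rival_b} for daily commuting"],
}
example_panel = build_panel(example_templates, "Acme", ["Bolt", "Crest"])
```

Calling `unbalanced_kinds(example_panel)` with the default target of 10 flags all three kinds as short, which is the reminder to keep writing prompts until each type reaches ten.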
Sources for real buyer prompts
- Customer support chats and emails — the questions customers actually asked before buying. This is the single richest source.
- Sales team call notes — what objections and comparison questions came up most often.
- Reddit threads in your category — 'Help me decide between X and Y' style posts are live buyer prompts.
- AnswerThePublic, AlsoAsked, and similar tools — useful but leave for last; they tend toward SEO-shaped phrasings rather than conversational ones.
- Your own Google Search Console and Shopify search data — what people typed on your own site is what they would ask an AI.
Phase 2 — run the panel across engines and score every citation
Run each of the 30 prompts across 5-6 engines. Log citations, source types, sentiment, and competitor mentions in a structured spreadsheet.
The second phase is execution — putting every prompt into every engine and recording structured output. It is tedious and takes the bulk of the audit's time, but the discipline of recording structured data is what lets you produce comparisons in phase three. Budget 3-4 hours of focused work; do not try to multitask this.
1. Open a spreadsheet with columns: Prompt ID, Prompt Text, Engine, Your Citations (0/1), Your Rank (1-5), Competitor Citations (list), Source Types (web / reddit / forum / creator), Sentiment (positive / neutral / negative).
2. Run each prompt in ChatGPT. Log citations and source types for your brand and every competitor mentioned.
3. Repeat in Perplexity, Google AI Overviews, Gemini, Copilot, and Claude. Budget 30-45 minutes per engine.
4. For each citation, note the specific URL cited. Those URLs become the basis of phase three diffing.
5. Don't limit the panel to your named competitors; capture every brand that gets cited across the 30 prompts. The brands you didn't know about are often the most interesting findings.
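If you prefer a CSV file to a hand-built spreadsheet, the column layout from step 1 can be sketched as a small record type with a validity check. This is an illustration under assumptions: the field names, the `CitationRow` class, the `validate` and `write_rows` helpers, and the sample row are all hypothetical, and the "; " separator for multiple URLs is just one workable convention.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional

ALLOWED_SENTIMENT = {"positive", "neutral", "negative"}


@dataclass
class CitationRow:
    prompt_id: str
    prompt_text: str
    engine: str
    your_citations: int         # 0 or 1
    your_rank: Optional[int]    # 1-5, or None when you were not cited
    competitor_citations: str   # cited competitor URLs, "; "-separated
    source_types: str           # web / reddit / forum / creator
    sentiment: str              # positive / neutral / negative


def validate(row: CitationRow) -> list[str]:
    """Return a list of problems; an empty list means the row is loggable."""
    problems = []
    if row.your_citations not in (0, 1):
        problems.append("your_citations must be 0 or 1")
    if row.your_rank is not None and not 1 <= row.your_rank <= 5:
        problems.append("your_rank must be between 1 and 5")
    if row.sentiment not in ALLOWED_SENTIMENT:
        problems.append("sentiment must be positive, neutral, or negative")
    return problems


def write_rows(rows: list[CitationRow], out) -> None:
    """Write a header plus one CSV line per logged prompt/engine pair."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(CitationRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))


# Made-up sample row; StringIO stands in for a real file on disk.
sample = CitationRow("P01", "Is Acme good for commuting?", "chatgpt",
                     1, 2, "https://example.com/guide", "web", "positive")
buffer = io.StringIO()
write_rows([sample], buffer)
```

Validating each row as you log it is cheaper than cleaning the sheet afterwards, because phase three depends on the URLs and source types being consistent across all six engines.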
3-4 hours
typical time to run a 30-query panel across six AI engines
Surfient methodology timing, averaged across 28 competitor gap audits. Includes prompt execution, citation logging, and structured-data capture but not the subsequent diff work.
Phase 3 — diff every cited competitor URL against your own
For each prompt you lost, compare the cited competitor page against your equivalent on schema, content, and off-site signals.
Phase three is where the 'gaps' become specific and actionable. For every prompt where a competitor beat you, open the cited competitor URL alongside your closest equivalent page and diff the two on structural signals. The goal is to find the recurring patterns — the signals that show up across competitor-winning pages and are missing from yours — because those patterns form the intervention roadmap.
- Schema diff
- Use Rich Results Test on both URLs. Log what schema types they emit (Product, FAQPage, BreadcrumbList, HowTo, Review, Article) and any fields present on theirs that are absent on yours.
- Content depth diff
- Word count, H2 structure, FAQ presence, spec depth, customer review count, photo count. The competitor's page is almost always deeper and more structured.
- Content specificity diff
- Do they answer specific sub-questions (wrist sizes, fit notes, compatibility)? Do they have numeric claims (water resistance to 5 ATM)? Your pages may be thinner on specifics.
- Off-site signal diff
- Where is the competitor mentioned that you are not? Reddit, forums, category publications, creator reviews. Use Surfient or manual site: searches to map.
- Technical hygiene diff
- Canonical coherence, llms.txt presence, ai-sitemap.xml freshness, hreflang correctness, server response time. Rare to be the reason but worth checking.
- Feed / shopping signal diff
- Do they appear in ChatGPT Shopping cards? In Google AI Shopping? Their merchant feed is likely cleaner or more complete.
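The schema diff in particular lends itself to a small script once you have copied each page's JSON-LD blocks out of the Rich Results Test. The sketch below assumes the structured data is already parsed into Python dictionaries; the `schema_gaps` helper and the sample data are hypothetical, and the script only compares top-level property names, not nested values.

```python
def _index(blocks: list[dict]) -> dict[str, set[str]]:
    """Map each schema @type to the set of property names seen for it."""
    index: dict[str, set[str]] = {}
    for block in blocks:
        types = block.get("@type", [])
        if isinstance(types, str):
            types = [types]
        # Skip JSON-LD keywords such as @context and @type.
        props = {key for key in block if not key.startswith("@")}
        for schema_type in types:
            index.setdefault(schema_type, set()).update(props)
    return index


def schema_gaps(theirs: list[dict], ours: list[dict]) -> dict[str, set[str]]:
    """For each type the competitor emits, list the properties we lack.

    A type that is missing from our page entirely is reported with all of
    the competitor's properties for that type.
    """
    their_index, our_index = _index(theirs), _index(ours)
    gaps = {}
    for schema_type, their_props in their_index.items():
        missing = their_props - our_index.get(schema_type, set())
        if schema_type not in our_index or missing:
            gaps[schema_type] = missing
    return gaps


# Made-up example data for one competitor page and one of our pages.
competitor_page = [
    {"@type": "Product", "name": "Trail Shoe", "offers": {},
     "aggregateRating": {}, "review": []},
    {"@type": "FAQPage", "mainEntity": []},
]
our_page = [{"@type": "Product", "name": "Road Shoe", "offers": {}}]
example_gaps = schema_gaps(competitor_page, our_page)
```

Running this across every lost prompt and counting how often each missing type or property appears is one way to surface the recurring patterns that phase four turns into themes.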
Phase 4 — convert the gaps into a prioritised roadmap
Cluster recurring gaps into intervention themes. Prioritise by expected citation lift vs effort. Ship the top 3 in the first sprint.
The final phase translates the raw gap data into a roadmap. The recurring patterns across competitor-winning pages cluster into a handful of themes, typically 5-10. Each theme is an intervention you can resource against. Prioritise by expected citation lift (how many of the 30 prompts does this intervention potentially affect?) versus effort (how many hours to ship?) and rank accordingly.
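The lift-versus-effort ranking can be sketched as a simple ratio: prompts potentially affected divided by hours to ship. Dividing the two is an assumption made for illustration, not the only sensible way to combine them, and the `Intervention` class, the `rank` helper, and every number below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Intervention:
    name: str
    prompts_affected: int  # how many of the 30 prompts it could influence
    effort_hours: float    # estimated hours to ship

    def __post_init__(self) -> None:
        if self.effort_hours <= 0:
            raise ValueError("effort_hours must be positive")

    @property
    def lift_per_hour(self) -> float:
        """Prompts potentially affected per hour of effort."""
        return self.prompts_affected / self.effort_hours


def rank(interventions: list[Intervention]) -> list[Intervention]:
    """Order interventions from best to worst lift per hour."""
    return sorted(interventions, key=lambda item: item.lift_per_hour, reverse=True)


# Made-up estimates for three candidate themes.
candidates = [
    Intervention("Reddit participation programme", prompts_affected=15, effort_hours=120),
    Intervention("Richer Product schema", prompts_affected=12, effort_hours=8),
    Intervention("Per-product FAQs", prompts_affected=9, effort_hours=20),
]
ranked = rank(candidates)
```

A ratio like this will always favour quick fixes, so treat the ranking as a starting order for the first sprint rather than a verdict on the larger programmes.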
The common intervention themes we see
- Product schema completeness: competitor has aggregateRating, review, additionalProperty; you have only name/price/brand. Intervention: ship richer Product schema site-wide.
- FAQ schema presence: competitor has FAQPage on every PDP; you have none. Intervention: author per-product FAQs and emit FAQPage schema.
- Content depth on PDPs: competitor's PDP is 2,000 words with a fit guide, care instructions, and a spec table; yours is 400 words of marketing copy. Intervention: deepen PDPs with structured facts.
- Off-site community presence: competitor is visible on 3-4 category subreddits and category forums; you are absent. Intervention: launch a 6-month Reddit participation programme.
- Buying-guide coverage: competitor has 10 long-form buying guides for the category; you have none. Intervention: ship a buying-guide hub.
- Review count and freshness: competitor has 200 reviews per top product; you have 20. Intervention: invest in review acquisition.
- High lift, low effort
- Ship immediately. Schema gaps, feed gaps, canonical fixes. Target: first sprint.
- High lift, high effort
- Resource into a multi-month programme. Content depth, review acquisition, off-site community. Target: quarterly themes.
- Low lift, low effort
- Backlog. Cosmetic fixes, minor schema refinements, small signal tweaks. Target: bundle with future sprints.
- Unknown lift, low effort
- Test and measure. Hypotheses worth a week but not a quarter. Target: quarterly experimentation slot.
Running the audit on a quarterly cadence
First audit is exploratory. Subsequent quarterly audits reuse the prompt panel and become a clean before/after measurement.
The first gap audit is exploratory — you are building the panel, learning the engines, and discovering which competitors show up. By the second quarterly run, the panel is stable and the audit compresses into a much faster exercise. You reuse the 30 prompts, re-run across the engines, and compare the citation counts to last quarter's baseline. That comparison tells you which of your shipped interventions moved the needle and which did not.
1. Q1 audit: exploratory. Build the panel and run phases 1-4 in full. Ship the top 3 interventions. 2 afternoons.
2. Q2 audit: first measurement. Reuse panel, re-run, compare against Q1 baseline. Identify which interventions worked, which did not. Ship next 3. 1 afternoon.
3. Q3 audit: pattern-recognition. Gaps are narrower and more specific. Competitor set may have shifted. Adjust panel slightly. 1 afternoon.
4. Q4 audit: annual review. Revisit competitor set, revise panel, produce year-over-year trend. Inform next year's GEO budget. 1 afternoon.
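The before-and-after comparison reduces to counting, per prompt, how many engines cited you in each quarter and subtracting. The sketch below assumes each logged row has been reduced to a `(prompt_id, engine, cited)` tuple with `cited` as 0 or 1; the helper names and sample data are hypothetical.

```python
def citation_counts(rows: list[tuple[str, str, int]]) -> dict[str, int]:
    """Count how many engines cited you for each prompt."""
    counts: dict[str, int] = {}
    for prompt_id, _engine, cited in rows:
        counts[prompt_id] = counts.get(prompt_id, 0) + cited
    return counts


def compare_quarters(baseline, current) -> dict[str, int]:
    """Per-prompt change in citing engines (positive means improvement).

    Prompts present in only one quarter are treated as zero citations
    in the other, which keeps a revised panel comparable.
    """
    before, after = citation_counts(baseline), citation_counts(current)
    return {
        prompt_id: after.get(prompt_id, 0) - before.get(prompt_id, 0)
        for prompt_id in sorted(set(before) | set(after))
    }


# Made-up data: two prompts, logged in two consecutive quarters.
q1_rows = [("P01", "chatgpt", 0), ("P01", "perplexity", 0), ("P02", "chatgpt", 1)]
q2_rows = [("P01", "chatgpt", 1), ("P01", "perplexity", 1), ("P02", "chatgpt", 0)]
deltas = compare_quarters(q1_rows, q2_rows)
```

Prompts with a positive delta point to interventions that likely worked, and negative deltas deserve a look at which competitor URL displaced you. A single quarter's change can also reflect engine-side shifts, so read small movements cautiously.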
“The merchants who win on AI visibility are not the ones with the cleverest single move. They are the ones whose gap audits quarter over quarter show consistent narrowing of the specific gaps they identified — because that narrowing is compounding, and compounding wins on long time horizons.”