How many queries do I need in my prompt panel?

20-30 is the sweet spot. Under 20 and the week-to-week variance swamps the signal; over 30 and the weekly run becomes unsustainable. Spread across brand, category, comparison, and problem-statement intent classes — roughly 5-8 queries each.

Do I need to run the panel on all five engines if my customers mostly use ChatGPT?

Yes. Your customers might prefer ChatGPT today, but the second most-used engine varies by demographic and vertical — Perplexity over-indexes on tech-savvy early adopters; Gemini over-indexes on Google Workspace users; Claude is growing among professional researchers. Measuring only your primary engine misses shifts in buyer behaviour that will hit you six months later.

Is it enough to run the panel once a month?

Not for most Shopify stores: weekly is the minimum viable cadence. AI engines refresh their indices every 7-14 days, and monthly runs miss the within-month changes that matter for iterative optimization. If the 90 minutes weekly is too much, prioritize weekly on ChatGPT and monthly on the other four; one engine weekly beats zero engines monthly.

Can I automate the prompt panel instead of running it manually?

Partially. Some engines (OpenAI API, Perplexity API) let you run programmatic queries, but API responses differ from browser responses because they bypass the consumer retrieval stack. For accurate measurement you either run the panel manually in consumer sessions, or use a tool like Surfient's visibility monitor that runs headless browser sessions matching the real consumer experience.

What should I do if my citation rate suddenly drops?

Three quick checks. First, verify AI bots are still reaching your pages (check server logs for GPTBot, ClaudeBot, PerplexityBot in the last 7 days). Second, check that your feed reprocessed cleanly (GMC/MMC diagnostics). Third, run one query manually and look at who is winning the citation; a competitor might have shipped a passage that beats yours. Each check has a different fix path.
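The first check can be scripted. The sketch below counts hits from the three named bots in the last 7 days, assuming your server writes access logs with a bracketed timestamp in the common `[21/Apr/2026:10:15:32 +0000]` style and includes the user agent on each line. The sample log lines, IP addresses, and the `bot_hits` helper are made up for illustration; adapt the parsing to whatever format your host or CDN actually produces.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")
TIMESTAMP = re.compile(r"\[([^\]]+)\]")  # first bracketed field on the line


def bot_hits(lines, now, days=7):
    """Count log lines per bot whose timestamp falls within the last `days` days."""
    cutoff = now - timedelta(days=days)
    hits = Counter({bot: 0 for bot in BOTS})
    for line in lines:
        match = TIMESTAMP.search(line)
        if not match:
            continue
        try:
            seen = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # skip lines whose timestamp is in another format
        if seen < cutoff:
            continue
        for bot in BOTS:
            if bot in line:
                hits[bot] += 1
    return hits


# Hypothetical log lines, for illustration only.
sample_log = [
    '203.0.113.5 - - [20/Apr/2026:08:01:12 +0000] "GET /products/a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '203.0.113.9 - - [10/Apr/2026:09:30:00 +0000] "GET /products/b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '203.0.113.7 - - [21/Apr/2026:10:15:32 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

now = datetime(2026, 4, 21, 12, 0, tzinfo=timezone.utc)
hits = bot_hits(sample_log, now)
for bot in BOTS:
    status = "ok" if hits[bot] else "NO HITS in window"
    print(f"{bot}: {hits[bot]} ({status})")
```

A bot with zero hits in the window is a prompt to look at robots.txt, firewall, and CDN rules. It is not proof of a block on its own.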

How do I compare my Share of AI Voice to my competitors'?

Run the same panel on each competitor brand name and the same category queries. Track their citation rate, position, and verbatim rate. The absolute numbers matter less than the relative trend — if your share is rising while a competitor's is falling on the same query set, you are winning the passage battle on those queries.


How to monitor brand mentions across AI engines

Share of AI Voice is measurable — but only with a consistent weekly panel, a fresh session per engine, and a measurement discipline that does not confuse anecdote with data.

Evan Mallick with Hiren Bhuva

Generative Commerce Analyst

10 min · Updated April 21, 2026



Why AI citation measurement is harder than rank tracking

AI answers are non-deterministic, session-sensitive, memory-biased, and engine-inconsistent. A measurement discipline has to account for all four.

Google rank tracking is a well-understood discipline because Google returns a deterministic SERP for a given query: run it again and you see the same page in the same position, modulo personalization. AI citation measurement breaks every one of those assumptions. Answers are non-deterministic (the same prompt can produce different citations in the same minute), session-sensitive (your past chat history biases retrieval toward stores you have already researched), memory-biased (ChatGPT Memory and Claude Projects retain context across sessions unless you explicitly disable them), and engine-inconsistent (Perplexity cites 5-8 sources per answer; Claude cites 2-3; the raw counts do not compare).

3.4× over-estimation of self-visibility when merchants test with Memory on versus off. Source: Surfient measurement study, 92 Shopify merchants across ChatGPT and Claude, March 2026.

The net is that AI citation measurement requires a discipline — a fixed panel, a clean session policy, consistent cadence, and engine-normalized metrics. The good news is that the discipline is straightforward once you adopt it. The bad news is that a merchant who runs ad-hoc prompts and trusts the results is building a quarterly plan on noise.

Figure (step flow): The four-step arc this guide walks through; each numbered card maps to a section below.

How to build the prompt panel

20-30 queries spread across four intent classes — brand, category, comparison, problem-statement. Revisit quarterly.

The prompt panel is the single most important design decision in AI citation measurement. Too few queries and the variance across runs swamps the signal; too many and the weekly run becomes unsustainable and you stop doing it. 20-30 queries is the sweet spot. Spread them across four intent classes so the panel is not biased toward one type of query, and revisit the panel quarterly so it stays current with your product mix and the shopper language that is actually used.

Brand queries (5-8): 'moissanite watches by Kloira', 'Kloira reviews', 'is Kloira legit'. Direct intent — tests whether AI engines recognize your brand as a resolvable entity.
Category queries (5-8): 'best men's moissanite watch under $500', 'mid-range moissanite chronograph brands'. Comparison intent — tests whether AI engines include you in the consideration set for your category.
Comparison queries (5-8): 'Kloira vs Nomos', 'moissanite vs diamond for a watch', 'best Kloira alternative'. Trade-off intent — tests both category placement and competitive framing.
Problem-statement queries (5-6): 'I need a moissanite watch for my dad's 60th birthday', 'hypoallergenic moissanite watch options', 'waterproof moissanite dress watch'. Natural-language intent — tests long-tail sub-query matching.
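The size limits above are easy to check mechanically whenever the panel is edited. The following is a minimal sketch, assuming the panel is kept as a list of `(intent_class, query)` pairs; the `validate_panel` helper and the short class labels are illustrative names, not part of any tool mentioned in this guide.

```python
from collections import Counter

# Bounds taken from the guide: 20-30 queries overall, 5-8 per intent class
# (5-6 for problem-statement queries).
PANEL_SIZE = (20, 30)
CLASS_BOUNDS = {
    "brand": (5, 8),
    "category": (5, 8),
    "comparison": (5, 8),
    "problem": (5, 6),
}


def validate_panel(panel):
    """Return a list of problems found in a list of (intent_class, query) pairs."""
    problems = []
    counts = Counter(intent for intent, _ in panel)
    for intent in counts:
        if intent not in CLASS_BOUNDS:
            problems.append(f"unknown intent class: {intent}")
    for intent, (low, high) in CLASS_BOUNDS.items():
        n = counts.get(intent, 0)
        if not low <= n <= high:
            problems.append(f"{intent}: {n} queries, expected {low}-{high}")
    total = len(panel)
    if not PANEL_SIZE[0] <= total <= PANEL_SIZE[1]:
        problems.append(
            f"panel has {total} queries, expected {PANEL_SIZE[0]}-{PANEL_SIZE[1]}"
        )
    return problems


# A deliberately incomplete panel built from the guide's example queries.
example_panel = [
    ("brand", "Kloira reviews"),
    ("brand", "is Kloira legit"),
    ("category", "best men's moissanite watch under $500"),
    ("comparison", "Kloira vs Nomos"),
    ("problem", "waterproof moissanite dress watch"),
]

for problem in validate_panel(example_panel):
    print(problem)
```

Running the same check at each quarterly revisit helps keep the panel from drifting toward one intent class as queries are rotated.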

The session policy that produces clean data

Fresh session, memory disabled, incognito or VPN, logged-out of commerce accounts, one-shot per run. The absence of bias is the whole value.

The measurement session is where most merchants lose the signal. A logged-in session with Memory enabled personalizes retrieval toward stores you have researched, stores you have asked about, and stores whose pages you have visited — all of which biases toward overstating your own visibility and understating your competitors'. The clean session policy below is what we use across all of our measurement work.

1. Use an incognito or private browser window, or a clean VPN-routed session from a region representative of your target customers.
2. Disable Memory in ChatGPT (Settings → Personalization → Memory → Off). Disable Projects in Claude. Clear conversation history before each engine run.
3. Log out of any commerce account (Amazon, Shopify account, Google Shopping preferences). Retailer sessions can bias AI responses.
4. Run each query exactly once per engine per week. Multiple runs the same day give you variance data, not signal; use the weekly cadence to smooth.
5. Record the answer text verbatim in your spreadsheet, along with the cited sources, the position of your brand if mentioned, and whether the answer quotes your content verbatim or paraphrases it.
6. Do not continue the conversation. Each query is a one-shot. Multi-turn conversations inject context that propagates bias into subsequent queries.
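If the spreadsheet is maintained as a CSV file, a fixed record shape keeps every weekly run comparable. The sketch below is one possible layout, not a prescribed schema: the `RunRecord` field names, the ";"-separated source list, and the sample row (including its answer text and domains) are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class RunRecord:
    week: str                      # ISO week label, e.g. "2026-W17"
    engine: str
    intent_class: str
    query: str
    answer_text: str               # pasted verbatim from the session
    cited_sources: str             # source domains in listed order, ";"-separated
    brand_position: Optional[int]  # 1-based position in the source list, None if absent
    verbatim: bool                 # True if the answer quotes your content directly


def write_records(records, handle):
    """Write records as CSV rows, preceded by a header line."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(RunRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


# Hypothetical row; an in-memory buffer stands in for the spreadsheet file.
buffer = io.StringIO()
write_records(
    [
        RunRecord(
            week="2026-W17",
            engine="perplexity",
            intent_class="comparison",
            query="Kloira vs Nomos",
            answer_text="(answer text pasted here)",
            cited_sources="example.com;example.org",
            brand_position=2,
            verbatim=False,
        )
    ],
    buffer,
)
print(buffer.getvalue())
```

Storing one row per query per engine per week matches the one-shot rule in step 4 and makes the later metric calculations a simple filter over rows.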

The three metrics that actually matter

Citation rate (coverage), citation position (prominence), citation verbatim rate (quotability). Any one alone is misleading; the three together tell the story.

Citation rate alone is the metric most merchants start with — and it is the one most likely to lead them astray. A 70% citation rate on brand queries is normal; a 70% citation rate on category queries is exceptional. A citation in position 4 (buried in the source list under three competitors) is not the same outcome as position 1 (foregrounded in the answer). Verbatim citations signal that your content is at the extraction threshold; paraphrased citations signal that you are in the pool but being outclassed on passage quality. The three metrics together produce a picture no single metric can.

Citation rate: Fraction of sessions in which your brand appears anywhere in the answer. Normalize by engine — Perplexity's 5-8 citations per answer gives every brand a higher base rate than Claude's 2-3.
Citation position: Where you appear in the source list. Position 1 carries the most visible weight; position 4+ is often invisible to the reader. Track separately for brand vs. category queries.
Citation verbatim rate: Fraction of citations where your content is quoted directly vs. paraphrased. High verbatim rate means your passages are at the extraction threshold — the goal state.
Share of AI Voice (composite): A weighted combination — citation rate × position weight × verbatim weight. Use for trend tracking; do not compare absolute numbers across vendors because methodologies differ.
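The definitions above can be sketched in a few lines. The guide does not specify the position or verbatim weights, so the ones used here are assumptions chosen only to make the example concrete: position weight is the mean of 1/position over cited runs, and verbatim weight scales from 0.5 (never quoted) to 1.0 (always quoted). The sample runs are hypothetical, and the functions expect a non-empty list of runs.

```python
def citation_rate(runs):
    """Fraction of runs in which the brand appears at all."""
    return sum(r["position"] is not None for r in runs) / len(runs)


def position_weight(runs):
    """ASSUMPTION: mean of 1/position over cited runs (position 1 -> 1.0)."""
    cited = [r["position"] for r in runs if r["position"] is not None]
    return sum(1 / p for p in cited) / len(cited) if cited else 0.0


def verbatim_rate(runs):
    """Fraction of cited runs where the content was quoted directly."""
    cited = [r for r in runs if r["position"] is not None]
    return sum(r["verbatim"] for r in cited) / len(cited) if cited else 0.0


def share_of_ai_voice(runs):
    """Composite: citation rate x position weight x verbatim weight."""
    # ASSUMPTION: verbatim weight runs from 0.5 (all paraphrased) to 1.0 (all verbatim).
    verbatim_weight = 0.5 + 0.5 * verbatim_rate(runs)
    return citation_rate(runs) * position_weight(runs) * verbatim_weight


# Hypothetical runs for one engine and one intent class.
example_runs = [
    {"position": 1, "verbatim": True},
    {"position": 2, "verbatim": False},
    {"position": None, "verbatim": False},  # brand not cited
    {"position": 4, "verbatim": False},
]

print(f"citation rate: {citation_rate(example_runs):.2f}")
print(f"verbatim rate: {verbatim_rate(example_runs):.2f}")
print(f"composite:     {share_of_ai_voice(example_runs):.3f}")
```

Whatever weights you settle on, keep them fixed from week to week; the composite is only useful as a trend if its definition does not move.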

44.2% of AI answer citations come from the first 30% of a page's text. Source: Surfient citation-position study, 2,400 AI answers across five engines, Q1 2026. The lede is where the engine quotes from first.

The five measurement mistakes merchants make most often

Logged-in sessions, single-run queries, mixed intent classes, raw-count cross-engine comparison, no baseline before optimization.

The five mistakes below are the ones we see merchants repeat even after reading the panel-construction guide. They are easy to avoid once you have seen them, and each one invalidates an otherwise-correct measurement loop.

1. Running the panel from a logged-in, memory-on session. The data looks flattering but describes a personalized answer no real buyer sees. Fresh sessions, every time.
2. Drawing conclusions from a single run of a query. AI answers are non-deterministic: a single run on Monday might miss you; the same query on Tuesday might cite you prominently. Use the weekly cadence to smooth, and trust multi-week trends over single-point snapshots.
3. Mixing brand and category queries in the same score. Brand queries naturally have higher citation rates; averaging them with category queries inflates the composite. Report the two classes separately.
4. Comparing raw citation counts across engines. Perplexity cites 5-8 sources per answer; Claude cites 2-3. A 60% rate on Perplexity is not directly comparable to a 60% rate on Claude. Normalize before you compare.
5. Starting optimization without a baseline. If you do not measure your citation rate before you ship a fix, you cannot tell whether the fix worked. Baseline first, optimize second.
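One way to avoid mistakes 3 and 4 is to never pool runs across engines or intent classes when reporting. The sketch below groups citation rates by `(engine, intent_class)`; the `rates_by_group` helper and the sample runs are illustrative assumptions, and it deliberately stops short of cross-engine normalization, which depends on a method this guide does not spell out.

```python
from collections import defaultdict


def rates_by_group(runs):
    """Citation rate per (engine, intent_class), never pooled across groups."""
    totals = defaultdict(int)
    cited = defaultdict(int)
    for run in runs:
        key = (run["engine"], run["intent_class"])
        totals[key] += 1
        if run["cited"]:
            cited[key] += 1
    return {key: cited[key] / totals[key] for key in totals}


# Hypothetical runs from one week.
runs = [
    {"engine": "perplexity", "intent_class": "brand", "cited": True},
    {"engine": "perplexity", "intent_class": "brand", "cited": True},
    {"engine": "perplexity", "intent_class": "category", "cited": True},
    {"engine": "perplexity", "intent_class": "category", "cited": False},
    {"engine": "claude", "intent_class": "brand", "cited": True},
    {"engine": "claude", "intent_class": "category", "cited": False},
]

rates = rates_by_group(runs)
for (engine, intent), rate in sorted(rates.items()):
    print(f"{engine:<12}{intent:<10}{rate:.0%}")
```

Capturing this table before any optimization ships also gives you the baseline that mistake 5 warns about.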

“Most AI citation 'measurement' merchants show me is actually anecdote dressed up with numbers — one logged-in session, one-shot queries, no baseline, and a conclusion that sounds like data. A disciplined weekly panel is dull and boring. It is also the only thing that works.”

— Evan Mallick, Generative Commerce Analyst at Surfient

What to do with the data — a weekly operating rhythm

Weekly review, monthly sub-query analysis, quarterly panel revisit. The cadence turns data into action.

A measurement panel that produces numbers but does not change behaviour is a wasted ritual. The operating rhythm below is what we use in customer reviews — a weekly quick look, a monthly deep dive, and a quarterly panel audit. Each surface answers a different question.

Weekly (15 minutes): Run the panel. Log the three metrics. Flag any query where citation rate dropped 20%+ week over week. Scan the top drop candidates for immediate causes (competitor push, feed regression, CDN-level block).
Monthly (90 minutes): Sub-query analysis. For any category query where you lost position, look at the specific passage the winner is quoted for. Rewrite your matching passage or add a missing FAQ entry. Review GPTBot/ClaudeBot access logs.
Quarterly (3 hours): Panel audit. Rotate 4-6 queries to reflect new products, seasonal shifts, or changed buyer language. Review the competitor set — are the brands you are comparing against still your real competitors on AI? Adjust the panel.
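The weekly flagging step can be automated once per-query rates are logged. The sketch below reads "dropped 20%+" as a relative drop against the previous week, which is an assumption; if you mean percentage points, compare `previous - current` to the threshold directly. The `flag_drops` helper and the sample rates are hypothetical.

```python
def flag_drops(last_week, this_week, threshold=0.20):
    """Return (query, previous, current) for queries whose citation rate fell
    by at least `threshold` relative to last week, largest drop first."""
    flagged = []
    for query, previous in last_week.items():
        current = this_week.get(query)
        if current is None or previous == 0:
            continue  # no comparison possible
        drop = (previous - current) / previous
        # The small epsilon keeps exact-threshold drops from being lost to float rounding.
        if drop + 1e-9 >= threshold:
            flagged.append((query, previous, current))
    return sorted(flagged, key=lambda item: (item[1] - item[2]) / item[1], reverse=True)


# Hypothetical per-query citation rates (fraction of engines citing the brand).
last_week = {
    "Kloira reviews": 1.0,
    "Kloira vs Nomos": 0.8,
    "best men's moissanite watch under $500": 0.6,
}
this_week = {
    "Kloira reviews": 1.0,
    "Kloira vs Nomos": 0.4,
    "best men's moissanite watch under $500": 0.2,
}

for query, previous, current in flag_drops(last_week, this_week):
    print(f"{query}: {previous:.0%} -> {current:.0%}")
```

Because single runs are noisy, a flagged query is a candidate for a manual look, not a confirmed regression.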

Frequently asked questions

Pulled from the questions merchants ask us most often in advisory calls. Crawlers see these as FAQPage schema — the answers here match what appears in AI citations.

