Why most AI visibility dashboards show too many metrics
Tooling vendors compete on feature count. Most of the extra metrics are derivative or decorative. The five core metrics are what actually drive decisions.
AI visibility tooling is in the phase every new analytics category goes through: vendors compete by adding metrics, because more numbers look like more insight. Otterly publishes 40+ metrics; TrackAIMentions offers 25+; LLMClicks and Frizerly are similar. The dashboards look impressive, but most of the metrics are derivatives of two or three underlying numbers (citation rate per day, citation rate per engine, citation rate per prompt category). Teams that try to track everything end up reacting to noise; teams that consistently track the five core metrics end up making better decisions.
- Otterly
- Strong cross-engine coverage. Heavy on share-of-voice-style metrics. Can feel overwhelming for first-time users — start with a narrow prompt panel.
- TrackAIMentions
- Prompt-panel-driven, with daily re-runs. Clean UI. Light on prompt-design guidance — the quality of the panel is on you.
- LLMClicks
- Referral-traffic-focused. Bridges server-side attribution with citation tracking. Useful complement to Otterly or TrackAIMentions.
- Frizerly
- Shopify-native. Tighter integration with Shopify admin. Narrower engine coverage than the others; worth considering for Shopify brands specifically.
- Surfient
- GEO-full-stack. Tracks the five core metrics as the default dashboard; deeper metrics are optional tabs rather than default noise.
5 out of 30+ metrics on typical AI visibility dashboards actually drive merchant decisions
Surfient client review, 48 merchants using AI visibility tools across 6 vendor platforms, usage telemetry March 2026. Decisions = actions taken in response to metric changes.
The five AI visibility metrics that actually matter
Citation rate, share of voice, engine coverage, prompt coverage, and AI referral traffic. Each defined precisely.
Below are the five metrics, each defined precisely enough to implement consistently. Vendor dashboards use similar names for different things, which is part of the noise — the definitions below are the ones we use on our own reporting and recommend to clients.
1. Citation rate
Definition: the percentage of prompts in a tracked panel where your brand or a specific page is cited in the AI engine's answer. Measurement: run the prompt panel against the engine, parse each answer for citations or named-brand references, count the prompts where you appear, and divide by the panel size. Track monthly, or bi-weekly for fast-moving categories; do not over-sample.
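To make the arithmetic concrete, here is a minimal Python sketch, assuming each panel run has already been parsed into a mapping from prompt text to the set of brands named in that engine's answer. The `PanelRun` type, the `citation_rate` helper and the brand names are hypothetical; parsing raw engine output into that shape is the hard part and is not shown.

```python
from typing import Dict, Set

# Hypothetical shape: prompt text -> set of brands cited in one engine's answer.
PanelRun = Dict[str, Set[str]]


def citation_rate(run: PanelRun, brand: str) -> float:
    """Percentage (0-100) of prompts whose answer cites `brand` at least once.

    A prompt counts once, no matter how many times the brand appears in it.
    """
    if not run:
        return 0.0
    cited = sum(1 for brands in run.values() if brand in brands)
    return 100.0 * cited / len(run)


if __name__ == "__main__":
    run: PanelRun = {
        "best moissanite ring under $1,000": {"BrandA", "BrandB"},
        "is BrandA worth it": {"BrandA"},
        "how to clean a moissanite ring": {"BrandC"},
        "moissanite vs diamond": set(),
    }
    print(citation_rate(run, "BrandA"))  # 50.0
```

Counting prompts rather than mentions keeps the number a true percentage of the panel, so it stays comparable across months even if one answer names the brand several times.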
2. Share of voice
Definition: your citations divided by total citations across the competitive set for the same prompt panel. If your brand is cited 12 times and 10 other brands are cited a combined 48 times across the same panel, your share of voice is 20%. This contextualises citation rate — 30% citation rate might be strong or weak depending on competitor performance.
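A sketch of the share-of-voice calculation under the same assumption (parsed answers as prompt-to-brands mappings, one mapping per engine run). `tally_citations` and `share_of_voice` are hypothetical helpers, and a "citation" here means one brand named in one answer; vendors may count differently. The example reproduces the 12-versus-48 case above.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def tally_citations(runs: Iterable[Dict[str, Set[str]]]) -> Counter:
    """Count one citation per brand per answer, summed over all runs (e.g. engines)."""
    counts: Counter = Counter()
    for run in runs:
        for brands in run.values():
            counts.update(brands)
    return counts


def share_of_voice(citations: Dict[str, int], brand: str) -> float:
    """Brand citations as a percentage of all citations in the competitive set."""
    total = sum(citations.values())
    if total == 0:
        return 0.0
    return 100.0 * citations.get(brand, 0) / total


if __name__ == "__main__":
    # The worked example above: 12 citations for you, 48 spread across 10 competitors.
    citations = {"YourBrand": 12, **{f"Competitor{i}": 5 for i in range(8)},
                 "Competitor8": 4, "Competitor9": 4}
    print(share_of_voice(citations, "YourBrand"))  # 20.0
```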
3. Engine coverage
Definition: the number of major AI engines that cite you at all, recorded as a boolean per engine. The count runs from 0 to 6 across ChatGPT, Claude, Perplexity, Gemini, Microsoft Copilot, Google AI Overviews. Engine coverage is a structural metric: it moves slowly and changes only when you ship meaningful new content or fix plumbing problems.
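Because engine coverage is a boolean per engine, it reduces to an `any()` over each engine's answers. The sketch below assumes results are keyed first by engine name and then by prompt; the `ENGINES` tuple mirrors the six engines named above, the `engine_coverage` helper is hypothetical, and an engine you have not run yet simply shows as uncovered.

```python
from typing import Dict, Set

# The six engines named above; results for any other engine are ignored.
ENGINES = ("ChatGPT", "Claude", "Perplexity", "Gemini",
           "Microsoft Copilot", "Google AI Overviews")

# Hypothetical shape: engine -> prompt -> set of brands cited in the answer.
Results = Dict[str, Dict[str, Set[str]]]


def engine_coverage(results: Results, brand: str) -> Dict[str, bool]:
    """True for each engine where `brand` is cited in at least one answer."""
    return {
        engine: any(brand in brands for brands in results.get(engine, {}).values())
        for engine in ENGINES
    }


if __name__ == "__main__":
    results: Results = {
        "ChatGPT": {"best moissanite ring": {"YourBrand"}},
        "Perplexity": {"best moissanite ring": set(),
                       "is YourBrand worth it": {"YourBrand"}},
        "Claude": {"best moissanite ring": {"OtherBrand"}},
    }
    coverage = engine_coverage(results, "YourBrand")
    print(sum(coverage.values()), "of", len(ENGINES))  # 2 of 6
```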
4. Prompt coverage
Definition: the number of distinct prompts in your tracked panel where you are cited at least once across any engine. If you track 30 prompts and are cited on 19 of them, prompt coverage is 19/30 or 63%. Useful because it reveals whether you dominate a few prompts or have broad presence.
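Prompt coverage pools engines together: a prompt counts once if any engine cites you on it. A minimal sketch with the same hypothetical engine-to-prompt-to-brands shape; `prompt_coverage` is an illustrative helper, not a vendor API, and the example reproduces the 19-of-30 case above.

```python
from typing import Dict, List, Set

# Hypothetical shape: engine -> prompt -> set of brands cited in the answer.
Results = Dict[str, Dict[str, Set[str]]]


def prompt_coverage(panel: List[str], results: Results, brand: str) -> float:
    """Percentage of panel prompts where `brand` is cited by at least one engine."""
    tracked = set(panel)
    if not tracked:
        return 0.0
    covered = {
        prompt
        for answers in results.values()
        for prompt, brands in answers.items()
        if prompt in tracked and brand in brands  # ignore prompts outside the panel
    }
    return 100.0 * len(covered) / len(tracked)


if __name__ == "__main__":
    panel = [f"prompt {i}" for i in range(30)]
    results: Results = {
        # ChatGPT cites the brand on prompts 0-9, Perplexity on prompts 5-18.
        "ChatGPT": {f"prompt {i}": {"YourBrand"} for i in range(10)},
        "Perplexity": {f"prompt {i}": {"YourBrand"} for i in range(5, 19)},
    }
    print(round(prompt_coverage(panel, results, "YourBrand")))  # 63
```

The overlap on prompts 5 to 9 is counted once, which is what separates prompt coverage from a simple sum of per-engine citation rates.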
5. AI referral traffic
Definition: sessions on your Shopify store with a referrer matching a known AI engine (chatgpt.com, chat.openai.com, perplexity.ai, gemini.google.com, etc.) or with traffic attribution tagged to AI sources. The bottom-of-funnel signal that ties citations to actual site visits and ultimately revenue. Track in Shopify Analytics or Plausible with custom dimension setup.
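A sketch of the referrer matching, assuming you can export the raw referrer URL for each session. The host-to-engine mapping is an assumption and deliberately incomplete; check it against the referrers that actually appear in your own analytics before relying on it. Google AI Overviews clicks are not separated here, since this list has no distinct host for them. `classify_referrer` and `count_ai_sessions` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, Iterable, Optional
from urllib.parse import urlparse

# Assumed, non-exhaustive mapping of referrer hosts to engines.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Microsoft Copilot",
    "claude.ai": "Claude",
}


def classify_referrer(referrer: str) -> Optional[str]:
    """Return the AI engine for a full referrer URL (with scheme), else None."""
    host = (urlparse(referrer).hostname or "").lower()
    for known, engine in AI_REFERRER_HOSTS.items():
        # Match the host itself or any subdomain of it (e.g. www.perplexity.ai).
        if host == known or host.endswith("." + known):
            return engine
    return None


def count_ai_sessions(referrers: Iterable[str]) -> Dict[str, int]:
    """Count sessions per AI engine from an iterable of referrer URLs."""
    counts: Counter = Counter()
    for referrer in referrers:
        engine = classify_referrer(referrer)
        if engine is not None:
            counts[engine] += 1
    return dict(counts)


if __name__ == "__main__":
    sessions = [
        "https://chatgpt.com/",
        "https://www.perplexity.ai/search?q=moissanite",
        "https://www.google.com/",
        "",
    ]
    print(count_ai_sessions(sessions))  # {'ChatGPT': 1, 'Perplexity': 1}
```

Matching on the parsed hostname rather than a substring avoids false positives such as a host that merely contains "chatgpt" in its name.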
How to build the prompt panel that backs all five metrics
30 prompts per category, mix of short-head and long-tail, mix of branded and generic. Rotated quarterly to keep it fresh.
Every metric above is only as good as the prompt panel that feeds it. Building the right panel is the upfront measurement work that most brands get wrong — panels that are too short produce noisy numbers, panels that are too long are expensive to run, and panels skewed too heavily branded or too heavily generic miss half the picture.
- Panel size
- 30 prompts per category. 10 short-head (single category), 10 mid-tail (category + constraint), 10 long-tail (specific question). Balances signal with cost.
- Mix of intents
- About 60% informational (how to, what is, is X worth it), 40% transactional (best, recommended, should I buy). Adjust the split per category.
- Branded vs generic
- 20% branded ('is [your brand] worth it'), 80% generic ('best moissanite ring under $1,000'). Branded catches reputation; generic catches discoverability.
- Quarterly rotation
- Replace 20-25% of the panel each quarter to keep it reflecting current shopper questions. Retain the core 15-20 prompts for trend tracking.
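The panel rules above can be encoded as a small checker plus a rotation step. This is a minimal sketch: the `PanelPrompt` fields, the tolerance bands around the 20% branded and 60% informational targets, and the random choice of which non-core prompts to retire are all assumptions for illustration, not a prescribed tool.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class PanelPrompt:
    text: str
    tail: str           # "short", "mid" or "long"
    intent: str         # "informational" or "transactional"
    branded: bool
    core: bool = False  # core prompts are kept every quarter for trend tracking


def check_mix(panel: Sequence[PanelPrompt]) -> List[str]:
    """Return warnings wherever the panel drifts from the target mix."""
    if not panel:
        return ["panel is empty"]
    warnings = []
    for tail in ("short", "mid", "long"):
        n = sum(p.tail == tail for p in panel)
        if n != 10:
            warnings.append(f"{tail}-tail: {n} prompts, target 10")
    branded = sum(p.branded for p in panel) / len(panel)
    if not 0.15 <= branded <= 0.25:  # assumed tolerance around 20%
        warnings.append(f"branded share {branded:.0%}, target about 20%")
    informational = sum(p.intent == "informational" for p in panel) / len(panel)
    if not 0.5 <= informational <= 0.7:  # assumed tolerance around 60%
        warnings.append(f"informational share {informational:.0%}, target about 60%")
    return warnings


def rotate(panel: Sequence[PanelPrompt], replacements: Sequence[PanelPrompt],
           fraction: float = 0.2, seed: Optional[int] = None) -> List[PanelPrompt]:
    """Retire roughly `fraction` of the panel from non-core prompts; append replacements."""
    rng = random.Random(seed)
    rotatable = [i for i, p in enumerate(panel) if not p.core]
    n = min(round(len(panel) * fraction), len(rotatable), len(replacements))
    retired = set(rng.sample(rotatable, n))
    kept = [p for i, p in enumerate(panel) if i not in retired]
    return kept + list(replacements[:n])
```

Running `check_mix` again after `rotate` catches the case where replacements shift the tail or branded split; the sketch does not match replacements to the buckets they retire.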
Which engines to track and why
Six worth tracking: ChatGPT, Claude, Perplexity, Gemini, Microsoft Copilot, Google AI Overviews. Tracking one or two produces biased pictures.
The AI engine landscape has stabilised into six surfaces that matter for commerce. Tracking fewer produces bias; tracking more (Grok, You.com, smaller engines) adds cost without much incremental insight unless your category has a known affinity for one of them. The six below are the canonical set.
- ChatGPT
- Largest single AI surface. Covers both the web app and API-embedded uses. Measure across both when possible.
- Claude
- Anthropic's engine. Strong in analysis-heavy contexts. Often overlooked in ecommerce tracking.
- Perplexity
- Answer-first engine. Heavy cited-source presence in every response. Most predictable citation format to measure.
- Gemini
- Google's direct conversational surface. Distinct from AI Overviews; tracks differently.
- Microsoft Copilot
- Browser and Windows-integrated. Reaches different segments of shoppers than the above four.
- Google AI Overviews
- The SERP-embedded AI. Measured differently — citations appear as inline panel items in search results. Worth tracking separately from Gemini.
Cadence and noise management
Weekly cadence produces noise. Monthly cadence is the sweet spot. Compare against the previous three-month rolling average, not week-over-week.
The single most common mistake in AI visibility reporting is over-sampling. A prompt panel run daily produces enough variance (engines re-roll responses, citation decisions are probabilistic) that the week-over-week noise drowns out real signal. Monthly cadence is the sweet spot for most brands; bi-weekly is acceptable for rapidly-changing categories; weekly is usually noise-driven theatre.
- Monthly is the baseline cadence. Enough signal to detect real movement, low enough cost to sustain across 30+ prompts.
- Compare against the trailing three-month rolling average, not against the previous month. Smooths out single-month noise.
- Investigate changes only above a 5 percentage point threshold. Below that, the change is within expected variance and does not warrant action.
- Run a deeper audit quarterly to catch structural shifts (new competitor emerged, engine rewrote its citation format, your content decayed).
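The comparison rule above can be sketched as follows. It assumes a list of monthly values for one metric (for example citation rate in percent), oldest first, and reads "trailing three-month average" as the mean of the three months before the latest one; `flag_change` and its 5-point default are illustrative, not a vendor feature.

```python
from statistics import mean
from typing import List, Optional


def flag_change(history: List[float], threshold_pp: float = 5.0) -> Optional[str]:
    """Flag the latest month if it moved more than `threshold_pp` points
    away from the average of the three months before it."""
    if len(history) < 4:
        return None  # not enough months yet to form a baseline
    latest = history[-1]
    baseline = mean(history[-4:-1])
    delta = latest - baseline
    if abs(delta) <= threshold_pp:
        return None  # within expected variance; no action
    direction = "up" if delta > 0 else "down"
    return f"{direction} {abs(delta):.1f} pp vs trailing 3-month average {baseline:.1f}%"


if __name__ == "__main__":
    print(flag_change([30, 32, 28, 38]))  # up 8.0 pp vs trailing 3-month average 30.0%
    print(flag_change([30, 32, 28, 33]))  # None
```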
“The brands that get AI visibility right do not measure more — they measure the right things more consistently. The five metrics, the thirty-prompt panel, the monthly cadence. That structure beats fifteen metrics tracked daily on a noisy panel every time.”
What to do this month if you have no measurement at all
Build the 30-prompt panel, pick two of the five metrics, run them monthly across three engines. Expand from there once the baseline exists.
If your AI visibility measurement today is 'nothing' or 'an occasional look at a vendor dashboard', the right first move is a focused starter programme. Full coverage across six engines, thirty prompts, five metrics, and monthly cadence is the target steady state — but a minimum viable programme with narrower scope gets you to useful signal in two weeks.
1. Build a 30-prompt panel for your primary category. Follow the mix described earlier: 10 short-head, 10 mid-tail, 10 long-tail, with about 20% branded.
2. Pick the two simplest metrics first: citation rate and AI referral traffic. One sits on the input side, one on the output side.
3. Run the panel manually or through a vendor against three engines: ChatGPT, Perplexity, and Google AI Overviews. Expand coverage to Claude, Gemini, and Copilot once the process is stable.
4. Report monthly on one page: the two metrics, per engine, with a trailing-3-month comparison. No dashboards, no derivative metrics.
5. Expand the metric set and engine coverage after three monthly cycles. By then the baseline exists and the team has built the habit.