GEO (generative engine optimization) without weekly measurement is vibes. The difference between “we think citations are up” and “citation share went from 22 to 27 of 40 this week because we shipped the standing-desks FAQ on Tuesday” is what separates a merchant who compounds from one who guesses. This is the exact weekly stack we run with Shopify merchants: what to measure, how to automate it, and what the Monday-morning Slack digest looks like.
The three layers that make measurement work
A measurement stack that holds up under a budget conversation needs three layers and only three: one synthetic layer (can we force a citation by asking the model directly?), one real-signal layer (what are bots actually fetching, and what are shoppers actually doing?), and one reporting layer (four numbers and a narrative). Skip one and the stack collapses: synthetic alone is theatre, real alone is reactive, and reporting without both is vibes-as-a-service.

Layer 1 · The prompt panel
Forty shopper queries, split across four personas, run against four engines every Monday at 09:00 local time. Record whether you're cited, the exact cited URL, citation position, and who's cited instead of you when you aren't. The run takes about four hours end to end (mostly waiting for engines to respond); a growth lead reviews the results in roughly 45 minutes.
The 4 personas, 10 prompts each
- Discovery (10) — 'best X for Y' and 'which X is best for Z'. The widest funnel queries where losing hurts most.
- Compare (10) — 'X vs Y for Z' and 'is brand-X better than brand-Y'. Where brand entity recognition gets tested.
- Buy (10) — 'where to buy X under $N' and 'cheapest X with feature Y'. Highest revenue-intent queries.
- Rescue (10) — 'X broke, how do I fix it' and 'does X work with Y'. Post-purchase queries that drive repeat revenue.
Lock the 40 prompts for at least 90 days. You're measuring a moving target already (engine behaviour, competitor changes, index freshness) — if the prompt set shifts weekly you can't trust any trend line. Rotate prompts only at quarter boundaries.
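A minimal sketch of how the locked panel could be represented and checked. The names here (`PanelResult`, `validate_panel`, `panel_fingerprint`) are illustrative assumptions, not part of any existing tool. The fingerprint is one way to notice that the prompt set drifted inside a quarter.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

PERSONAS = ("discovery", "compare", "buy", "rescue")
PROMPTS_PER_PERSONA = 10


@dataclass(frozen=True)
class PanelResult:
    """One row per prompt per engine per weekly run (hypothetical schema)."""
    prompt_id: str                  # e.g. "discovery-03"
    persona: str
    engine: str
    cited: bool
    cited_url: Optional[str]        # exact URL cited, if any
    position: Optional[int]         # citation position, if any
    cited_instead: tuple = ()       # domains cited when we were not


def validate_panel(prompts: dict) -> None:
    """Raise ValueError unless the panel has 10 prompts for each of the 4 personas."""
    if set(prompts) != set(PERSONAS):
        raise ValueError(f"expected personas {PERSONAS}, got {sorted(prompts)}")
    for persona, items in prompts.items():
        if len(items) != PROMPTS_PER_PERSONA:
            raise ValueError(f"{persona}: expected 10 prompts, got {len(items)}")


def panel_fingerprint(prompts: dict) -> str:
    """Stable hash of the prompt set; store it at the start of the quarter."""
    canonical = json.dumps(prompts, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Comparing this week's fingerprint with the one stored at the quarter boundary makes an accidental prompt edit visible before it contaminates the trend line.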
Layer 2 · Real signal ingest
Synthetic panels tell you “do the models know you.” Real signal tells you “did it matter.” Three streams feed Layer 2 — server logs, citation scrape, and first-party revenue. All three run continuously; the Monday job just rolls them up for the week.
Server logs — the truth source
Grep your access logs for these user-agents and count clean 200 responses: GPTBot, ChatGPT-User, PerplexityBot, ClaudeBot, Amazonbot, CCBot. Applebot-Extended and Google-Extended are robots.txt control tokens rather than crawler user-agents, so they will not appear in access logs; manage those in robots.txt instead. Flag any 4xx/5xx, because assistants that hit errors quickly stop coming back. Store the aggregate in a small BigQuery table or even a Sheet; the point is a longitudinal record, not real-time monitoring.
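The log pass can be sketched in a few lines of Python. This assumes the common "combined" access-log format (request, status, size, referer, user-agent, each quoted where usual); the regex and the function name `summarize_bot_hits` are illustrative and would need adjusting for other log layouts.

```python
import re
from collections import Counter

# Tokens from the list above. The two "-Extended" entries are robots.txt
# tokens and are not expected to match any log line; they are harmless here.
AI_BOT_TOKENS = (
    "GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot",
    "Amazonbot", "Applebot-Extended", "CCBot", "Google-Extended",
)

# Assumed combined log format:
# ... "GET /path HTTP/1.1" 200 1234 "referer" "user-agent"
LINE_RE = re.compile(
    r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def summarize_bot_hits(lines):
    """Return (ok, errors): per-bot counts of 200s and of 4xx/5xx responses."""
    ok, errors = Counter(), Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # line does not fit the assumed format
        ua = match.group("ua").lower()
        bot = next((t for t in AI_BOT_TOKENS if t.lower() in ua), None)
        if bot is None:
            continue  # not one of the tracked bots
        status = int(match.group("status"))
        if status == 200:
            ok[bot] += 1
        elif status >= 400:
            errors[bot] += 1
    return ok, errors
```

Feeding it an open file handle (`summarize_bot_hits(open(path))`) works because it only iterates over lines; the two counters are what would be appended to the weekly table.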
Citation scrape — who's cited when you are
The panel should capture not only whether you were cited but also the full list of URLs cited in the answer. This is where the competitive-intelligence layer lives. If Perplexity starts citing a competitor's blog post on every discovery query in your category, that's the signal to write a better one — not two weeks later, this Monday.
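As a rough illustration of the competitive roll-up, the sketch below counts how many panel answers cite each domain other than your own. The input shape (one list of cited URLs per answer) and the function name `competitor_domains` are assumptions for the example.

```python
from collections import Counter
from urllib.parse import urlparse


def competitor_domains(answers, own_domain):
    """Count, per domain, the number of answers that cite it.

    answers: iterable of lists of cited URLs, one list per panel answer.
    own_domain: your storefront domain without "www.", excluded from the tally.
    """
    counts = Counter()
    for urls in answers:
        # A set, so a domain cited twice in one answer counts once.
        domains = {urlparse(u).netloc.lower().removeprefix("www.") for u in urls}
        domains.discard(own_domain)
        domains.discard("")  # skip anything that did not parse as a URL
        counts.update(domains)
    return counts.most_common()
```

Running this per persona rather than across the whole panel would show, for instance, whether one competitor dominates discovery queries specifically.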
First-party revenue — the only number finance cares about
Tag outbound links wherever you can with utm_medium=ai and a source per engine (utm_source=chatgpt, etc.). Most assistants strip referrers, so this won't catch every session, but it catches enough to sanity-check the direct-traffic proxy. Then tag the Shopify order itself (source_ai tag) when the landing path came from an AI source. Revenue attribution at the order level is the scorecard's keystone.
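A small sketch of both halves of that tagging, using only the standard library. The helper names `tag_url` and `order_tags_for` are hypothetical, and the second function only decides which tag an order should carry; actually writing the tag to the order is left to whatever Shopify integration you already use.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode, urlparse, urlunparse


def tag_url(url, engine):
    """Return url with utm_medium=ai and utm_source=<engine> added or overwritten."""
    parts = urlparse(url)
    # Note: dict() keeps only the last value if a query key repeats.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update({"utm_medium": "ai", "utm_source": engine})
    return urlunparse(parts._replace(query=urlencode(query)))


def order_tags_for(landing_url):
    """Return ["source_ai"] when the landing URL carries utm_medium=ai, else []."""
    query = parse_qs(urlparse(landing_url).query)
    return ["source_ai"] if query.get("utm_medium") == ["ai"] else []
```

For example, `tag_url("https://shop.example/products/desk?variant=1", "chatgpt")` keeps the existing `variant` parameter and appends the two UTM parameters after it.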
Layer 3 · The four-KPI scorecard
Four numbers. Always the same four. Monday 09:30 in Slack. Eight-week trailing chart on a single page the whole growth team sees. The four:
- Citation share — number of the 40 prompts where your domain was cited by at least one engine. Primary KPI.
- Cited pages — count of distinct URLs cited across the panel in the week. A diversity metric — 13 cited pages is healthier than 27 citations all going to one URL.
- AI-CTR — server-observed clicks to cited pages within 30 minutes of a panel citation, divided by total citations. A retrieval-quality proxy.
- AI revenue — Shopify orders tagged source_ai for the week. The finance-team number.
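The four definitions above can be computed from the week's roll-up roughly as follows. This is a sketch under assumptions: the row and order field names (`prompt_id`, `cited`, `cited_url`, `total`, `tags`) are invented for the example, the click count is taken as already computed from server logs, and AI revenue is read here as the summed order total rather than an order count.

```python
from dataclasses import dataclass


@dataclass
class Scorecard:
    citation_share: int   # prompts (out of 40) cited by at least one engine
    cited_pages: int      # distinct URLs cited across the panel
    ai_ctr: float         # clicks within the window / total citations
    ai_revenue: float     # summed totals of orders tagged source_ai


def build_scorecard(results, clicks_after_citation, orders):
    """results: one dict per prompt per engine; orders: dicts with total and tags."""
    cited = [r for r in results if r["cited"]]
    citation_share = len({r["prompt_id"] for r in cited})
    cited_pages = len({r["cited_url"] for r in cited})
    # Guard against a week with zero citations.
    ai_ctr = clicks_after_citation / len(cited) if cited else 0.0
    ai_revenue = sum(o["total"] for o in orders if "source_ai" in o["tags"])
    return Scorecard(citation_share, cited_pages, ai_ctr, ai_revenue)
```

Note that citation share counts prompts, not rows: a prompt cited by three engines still contributes one to the primary KPI, while each of those three citations counts toward the AI-CTR denominator.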

The Slack digest that actually lands
Keep the format surgical: three lines, one channel, same time every week. The digest is a tool for producing decisions, not a storytelling venue. Anything longer than three lines gets skimmed.
- Wins — the specific prompt(s) we newly cited for, and why (which ship caused it).
- Losses — the specific prompt(s) we regressed on, and where the breakage likely is.
- Next action — exactly one deliverable, one owner, one date. No more.
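One way to keep the digest at exactly three lines is to generate it from structured input rather than free text. The sketch below only builds the message and a JSON payload with a `text` field, which is the shape Slack incoming webhooks accept; the function names and the delivery step are assumptions and are left out.

```python
import json


def build_digest(wins, losses, action, owner, due):
    """Return the three-line digest: wins, losses, one next action."""
    lines = [
        "Wins: " + ("; ".join(wins) if wins else "none"),
        "Losses: " + ("; ".join(losses) if losses else "none"),
        # Exactly one deliverable, one owner, one date.
        f"Next action: {action} (owner: {owner}, due: {due})",
    ]
    return "\n".join(lines)


def slack_payload(digest_text):
    """Serialize the digest as a JSON body with a single "text" field."""
    return json.dumps({"text": digest_text})
```

Because the function takes a single `action`, `owner`, and `due`, a second next action has nowhere to go, which enforces the "no more" rule in code.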