“We need to hire a GEO person” is the sentence we hear most often from Shopify brands who have decided AI indexing is no longer optional. What comes next is usually a six-month detour: a generalist SEO hire who knows schema but has never reasoned about retrieval, or a strong content marketer who writes well but can’t debug a Product JSON-LD block. Here’s the five-row weighted rubric we use internally at Surfient and with partner agencies — calibrated against actual on-the-job performance over 18 months of GEO hiring.
Why GEO hires need a rubric, not vibes
The GEO-shaped role is new enough that there’s no cached hiring intuition in the industry. Interviewers who have hired 200 SEO specialists over a decade are still learning which of those signals transfer and which don’t. The ones that transfer: measurement discipline, content intuition, debugging persistence. The ones that don’t: backlink strategy, keyword volume chasing, crawl-budget optimisation theatrics. Without a rubric, interview teams default to “they seemed smart” and end up hiring a senior SEO who spends their first quarter auditing h1 tags while your AI Overviews presence silently decays.
A weighted rubric forces you to define what “good at GEO” actually means before you see the candidates, then holds you to it during debrief. The version below is our current best answer. It’s not the only valid shape — different brand shapes might re-weight the rows — but the discipline of defining the weights before you interview is the point.

Row 1 — Schema fluency (weight 25)
The heaviest weight, because schema is where the day-to-day work lives. A strong GEO hire can open a live Shopify product page, view-source, read the application/ld+json blocks, and immediately tell you whether the Product node has a valid Offer, whether there are competing nodes from conflicting apps, and whether MerchantReturnPolicy is present. This is a muscle, not a memorisation test — either they’ve debugged it 50 times or they haven’t.
Test for it with a live exercise. Share your screen, pull up a real Shopify product page (not yours; any publicly accessible store will do), and ask them to find three schema defects. They don’t need to find exactly the three you picked; three legitimate defects of any shape is a pass. What you’re watching for: do they reach for view-source, do they know what a valid Offer needs, do they spot double-nesting, do they distinguish a missing field from an invalid field. Candidates who can’t do this in 45 minutes will grind on your first real project for weeks.
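For interviewers who want a reference answer to compare against, the view-source pass described above can be roughly approximated in a few lines of Python. This is a minimal sketch, not a validator: the function names are hypothetical, the list of fields a usable Offer needs is our assumption (adjust it to whichever structured-data guidance you follow), and it ignores cases such as AggregateOffer or policies referenced only by @id.

```python
import json
from html.parser import HTMLParser

# Assumed minimum for a usable Offer; adjust to whichever guidance you follow.
REQUIRED_OFFER_FIELDS = ("price", "priceCurrency", "availability")


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of each <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False


def iter_nodes(value):
    """Yield every JSON object nested anywhere in a parsed block (covers @graph)."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from iter_nodes(child)
    elif isinstance(value, list):
        for item in value:
            yield from iter_nodes(item)


def has_type(node, name):
    declared = node.get("@type")
    return name in (declared if isinstance(declared, list) else [declared])


def audit_product_schema(html):
    """Return a list of human-readable schema defects found in the page HTML."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    nodes, defects = [], []
    for raw in extractor.blocks:
        try:
            nodes.extend(iter_nodes(json.loads(raw)))
        except json.JSONDecodeError:
            defects.append("unparseable JSON-LD block")

    products = [n for n in nodes if has_type(n, "Product")]
    if not products:
        defects.append("no Product node")
    elif len(products) > 1:
        defects.append(f"{len(products)} competing Product nodes")

    for product in products:
        offers = product.get("offers") or []
        offers = offers if isinstance(offers, list) else [offers]
        if not offers:
            defects.append("Product has no Offer")
        for offer in offers:
            if not isinstance(offer, dict):
                defects.append("Offer is not an object")
                continue
            for field in REQUIRED_OFFER_FIELDS:
                if field not in offer:
                    defects.append(f"Offer missing {field}")  # missing field
            if "price" in offer:
                try:
                    float(offer["price"])
                except (TypeError, ValueError):
                    defects.append("Offer has invalid price")  # present but invalid

    if not any(has_type(n, "MerchantReturnPolicy") for n in nodes):
        defects.append("no MerchantReturnPolicy node")
    return defects
```

Feed it the saved HTML of the page you plan to use in the exercise and treat the output as a starting list. A strong candidate should find these defects by eye and explain the fix, which the script cannot do.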
Row 2 — Retrieval mental model (weight 20)
GEO is retrieval strategy wearing content’s clothes. The hires who plateau at “junior specialist” are the ones who never build a mental model of why Perplexity picks four sources out of 1,200, why ChatGPT weights OpenAI-indexed pages differently from Bing-crawled ones, or why Google’s AI Overviews will pull Reddit over a brand site on subjective queries. You don’t need ML papers to hire well here; you need candidates who can reason out loud about tradeoffs and update when shown data.
Test for it with a whiteboarding conversation. Pose: “User asks Perplexity ‘best wool rug 8x10 under $800’ — draw how Perplexity picks the four sources it cites.” Good candidates will sketch: query parsing, retrieval pool, eligibility gates, ranker signals, product card shortlist, citation synthesis. Weak candidates will say “it uses AI.” The vocabulary is a tell, but so is willingness to say “I don’t know this part but here’s how I’d find out.”
Row 3 — Measurement discipline (weight 20)
GEO generates enormous quantities of noisy data. Citation appearances bounce day-to-day on retraining schedules you can’t see. Revenue attribution crosses surfaces, engines, and user sessions. Candidates without measurement discipline fall into one of two traps: they check citation tools daily and panic-swerve strategy, or they report hockey-stick numbers from cherry-picked queries. You want the ones who set a monthly cadence, use query cohorts, and will tell you “this isn’t working, we need to stop.”
Test for it with a chart critique. Show a real-looking citation share graph (any of our research posts have usable ones) and ask: “what’s wrong with reading this chart the way it’s currently presented?” Pass signals: asks about cohort size, asks about sampling cadence, questions the denominator, wants to know what else changed that week. Fail signals: takes the chart at face value and starts planning tactics from it.
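To see why cohort size and the denominator matter so much in that critique, it helps to put an uncertainty band around a citation share. The sketch below uses a Wilson score interval, a standard statistical technique that we are bringing in for illustration. It is an assumption that you count, per cohort, how many queries cited the brand out of how many were sampled, and the function name is hypothetical.

```python
import math


def citation_share_interval(cited, total, z=1.96):
    """Wilson score interval for the share of cohort queries citing the brand.

    cited: number of sampled queries where the brand was cited
    total: number of queries in the cohort (the denominator)
    z:     normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if total <= 0:
        raise ValueError("cohort must contain at least one query")
    if not 0 <= cited <= total:
        raise ValueError("cited must be between 0 and total")
    share = cited / total
    denom = 1 + z**2 / total
    centre = (share + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(
        share * (1 - share) / total + z**2 / (4 * total**2)
    ) / denom
    return centre - half_width, centre + half_width


# Same 30% headline share, very different certainty (illustrative numbers only).
small_cohort = citation_share_interval(cited=6, total=20)
large_cohort = citation_share_interval(cited=60, total=200)
print(small_cohort, large_cohort)
```

The 20-query cohort produces a much wider band than the 200-query cohort for the same headline share. That is the concern behind a candidate's question about cohort size.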
Row 4 — Content intuition (weight 20)
GEO content isn’t traditional SEO content. The sentences need to be quotable by a generator under a 400-token budget. The structure needs to be scannable because the retrieval pass is reading 10-20 documents in parallel. Honest comparisons to competitors outperform puffery because AI systems detect and down-weight unsupported brand claims. Candidates whose content portfolio is flowy brand narratives will struggle; candidates whose portfolio includes technical how-tos, honest product comparisons, or well-researched thought leadership will thrive.
Test for it with an async rewrite. Send them a weak PDP paragraph (use a real one from a brand in your category, anonymised) and ask: “rewrite this for both Perplexity quotability and ChatGPT citation, 48-hour window.” Pass: they produce extractable claims, honest specs, a clear quotable sentence, and ideally flag the schema implications. Fail: they swap adjectives and add a CTA.
Row 5 — Shopify surface area (weight 15)
The lowest weight because it’s the most coachable of the five. A strong candidate with zero Shopify experience but deep GEO instincts will pick up Liquid, metafields, app injection patterns, and theme vs Hydrogen distinctions in a month. A Shopify expert who can’t do the other four rows will grind forever. But we don’t weight it to zero, because onboarding speed matters, and candidates who have shipped on Shopify before add velocity in quarter one.
Test for it with a 15-minute audit exercise. Ask them to walk you through one of your own product pages, pointing out Shopify-specific structural choices they see (is this Dawn? which review app is injecting schema? is this using metafields or hardcoded attributes?). Pass: spots at least three Shopify-specific patterns. Fail: treats it as a generic e-commerce page.
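With all five rows defined, the arithmetic that combines them is simple. This minimal sketch assumes each row is scored on a 0-100 scale; the dictionary keys and the function name are hypothetical labels for the five rows above.

```python
# Row weights from the rubric above; they sum to 100.
WEIGHTS = {
    "schema_fluency": 25,
    "retrieval_mental_model": 20,
    "measurement_discipline": 20,
    "content_intuition": 20,
    "shopify_surface_area": 15,
}


def weighted_score(row_scores, weights=WEIGHTS):
    """Combine 0-100 row scores into a single 0-100 weighted total."""
    if set(row_scores) != set(weights):
        raise ValueError("row scores must cover exactly the rubric rows")
    total_weight = sum(weights.values())
    return sum(row_scores[row] * weights[row] for row in weights) / total_weight


# Illustrative candidate, not real data.
candidate = {
    "schema_fluency": 80,
    "retrieval_mental_model": 70,
    "measurement_discipline": 85,
    "content_intuition": 60,
    "shopify_surface_area": 50,
}
print(weighted_score(candidate))  # 70.5
```

Passing a different `weights` dictionary lets you re-weight rows for your brand. Dividing by the total weight keeps the result on a 0-100 scale even if your custom weights don't sum to 100.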
The five-stage interview loop
The rubric only works if the interview loop actually exercises each row. A common failure: teams build a beautiful rubric and then run a loop of three conversational rounds that probe only culture fit. Here’s the loop we run, mapped directly to rubric rows.

Calibration — run the rubric on yourself first
Before you interview a single external candidate, run the rubric on your internal team. Ask each rater to score the current best GEO practitioner you work with (could be a teammate, a freelancer, or an agency partner) across all five rows. Discuss variances. If your ratings across the panel cluster within ~10 points, you’re calibrated. If they spread by 30+ points, your rubric definitions aren’t concrete enough yet — sharpen them before you turn the rubric on candidates. Unclear definitions always favour the candidate the loudest interviewer already likes.
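The clustering check can be made mechanical. The sketch below is one reading of the rule, with three assumptions of ours: "points" means the gap between the highest and lowest rater on the weighted total, anything between the two thresholds is labelled borderline, and the rater names are hypothetical.

```python
def calibration_status(panel_scores, tight=10, wide=30):
    """Classify panel agreement from each rater's weighted total.

    panel_scores: mapping of rater name -> weighted total for the same person
    tight:        spread at or below this counts as calibrated
    wide:         spread at or above this means definitions need sharpening
    """
    if len(panel_scores) < 2:
        raise ValueError("need at least two raters to measure spread")
    spread = max(panel_scores.values()) - min(panel_scores.values())
    if spread <= tight:
        return spread, "calibrated"
    if spread >= wide:
        return spread, "sharpen definitions"
    return spread, "borderline"  # assumed label; the rule above leaves this gap open


# Illustrative ratings of the same practitioner by three raters.
print(calibration_status({"rater_a": 78, "rater_b": 72, "rater_c": 81}))
```

Running the same check row by row, rather than only on the total, shows which rubric definition is causing the disagreement.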
- Fix your weights before you see your first candidate. The weights above (25/20/20/20/15) work for most Shopify brands mid-funnel. If you’re pre-product and hiring for GEO leadership, bump Retrieval mental model to 25 and Shopify surface area to 10.
- Pre-write the pass gates, not just the rubric rows. “Passes schema fluency” is not a pass gate. “Finds 2 of 3 seeded defects in 45 minutes and explains the fix” is a pass gate.
- Debrief scoring before discussing vibes. In the hiring debrief, each interviewer submits their rubric scores before anyone says “I liked them” or “I didn’t.” Vibes contaminate scores if shared first.
- Re-score after 90 days of on-the-job performance. Every hired candidate gets rescored at 90 days using the same rubric. This calibrates your interview scoring over time — if candidates you rated 75 at hiring consistently score 60 at 90 days, your interview process is over-rating.
- Never re-open a “no-hire” from the same panel. Yes, even if a founder really liked them. Weighted scores below 60 don’t become hires on appeal — that’s how you end up with an expensive mis-hire that everyone saw coming.
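The 90-day re-score in the list above reduces to a small calculation: the average gap between what the panel scored at interview and what the same rubric says after 90 days. A minimal sketch, assuming you keep one (interview score, 90-day score) pair per hire; the function name and sample pairs are hypothetical.

```python
from statistics import mean


def interview_bias(score_pairs):
    """Mean of (interview score - 90-day score) across hires.

    Positive: interviews over-rate. Negative: interviews under-rate.
    """
    if not score_pairs:
        raise ValueError("need at least one hire to measure bias")
    return mean(interview - day90 for interview, day90 in score_pairs)


# Illustrative pairs only: (weighted score at hiring, weighted score at 90 days).
hires = [(75, 60), (78, 62), (72, 59)]
bias = interview_bias(hires)
print(f"interviews over-rate by {bias:.1f} points" if bias > 0
      else f"interviews under-rate by {-bias:.1f} points")
```

With only a few hires the average is noisy, so read it as a prompt to revisit your pass gates and not as a precise correction factor.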
What role shape should you hire for?
One last question the rubric implicitly settles: should you hire an individual contributor GEO specialist, a head-of-GEO manager, or a team? For most Shopify brands under $20M revenue, the answer is one IC-level GEO specialist who owns the outcome end-to-end, sitting between content and engineering. That person should score 75+ weighted, with schema fluency and measurement discipline as non-negotiable 80+ rows. Above $20M, split into a GEO content lead and a GEO engineering lead, each of whom can score 80+ on their home row even if weaker on the other. Below that revenue threshold, don’t split: the coordination tax eats your returns.
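The IC bar described above combines an overall threshold with two row-level floors, and it is easy to check one and forget the other in a debrief. A minimal sketch of the gate, assuming 0-100 row scores and a weighted total computed separately; the row keys and function name are hypothetical.

```python
# Rows the post treats as non-negotiable for an IC-level GEO specialist.
NON_NEGOTIABLE_ROWS = ("schema_fluency", "measurement_discipline")


def meets_ic_bar(weighted_total, row_scores, overall_bar=75, row_bar=80):
    """Return (passes, reasons) for the IC specialist bar.

    weighted_total: the candidate's 0-100 weighted rubric score
    row_scores:     mapping of row name -> 0-100 score
    """
    reasons = []
    if weighted_total < overall_bar:
        reasons.append(f"weighted total below {overall_bar}")
    for row in NON_NEGOTIABLE_ROWS:
        # A missing row score is treated as a failure, not silently skipped.
        if row_scores.get(row, 0) < row_bar:
            reasons.append(f"{row} below {row_bar}")
    return not reasons, reasons


# Illustrative candidate: strong overall, but short on one non-negotiable row.
print(meets_ic_bar(78, {"schema_fluency": 85, "measurement_discipline": 70}))
```

Returning the reasons alongside the verdict keeps the debrief anchored to the gate that failed.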