Should I hire a generalist SEO person and upskill them, or hire a GEO specialist directly?

Depends on the generalist. An SEO who has independently tried AI citation tracking in the past year, has opinions on schema freshness, and can articulate how Perplexity differs from Google — yes, upskill. An SEO whose frame is still keyword volume + backlinks — no, hire fresh. The rubric scoring tells you which: run it on your internal candidates. If they score under 60 weighted, upskilling takes 9-12 months minimum and they’ll likely churn once they realise GEO is a different muscle. If they score 65-75, upskilling in 3-4 months with focused exposure is the right bet.

Can we use this rubric for contractor or agency evaluations, not just full-time hires?

Yes, with one adjustment: re-weight Row 5 (Shopify surface area) up to 25 for contract evaluations, and drop Row 2 (Retrieval mental model) to 15. Those two changes alone bring the total to 105, so rescale the five weights proportionally (or trim another 5 points from one row) to stay on the 0-100 scale. Contractors don’t need deep strategic thinking if they’re executing against your defined brief; they need to actually ship competent Shopify-specific work. For agency-of-record evaluations, keep the original weights and add a Row 6, “Account management discipline”, at weight 15, scaling the other five down proportionally so the total stays at 100. Agencies fail more on cadence and communication than on technical depth.
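The re-weighting arithmetic is easy to get wrong by hand, so here is a minimal Python sketch of it. The row keys and the `normalise` and `add_row` helpers are illustrative names, not part of any tool; the weights are the ones quoted in this post, and proportional rescaling is one reasonable reading of “re-weighting the others proportionally”.

```python
# Original rubric weights from this post (they sum to 100).
BASE_WEIGHTS = {
    "schema_fluency": 25,
    "retrieval_mental_model": 20,
    "measurement_discipline": 20,
    "content_intuition": 20,
    "shopify_surface_area": 15,
}


def normalise(weights, total=100.0):
    """Scale weights so they sum to `total` while preserving their ratios."""
    current = sum(weights.values())
    return {row: weight * total / current for row, weight in weights.items()}


def add_row(weights, name, weight, total=100.0):
    """Add a row at a fixed weight and shrink the existing rows proportionally."""
    scaled = normalise(weights, total - weight)
    scaled[name] = float(weight)
    return scaled


# Contractor variant: Row 5 up to 25, Row 2 down to 15, then rescaled to 100.
contractor = normalise(
    {**BASE_WEIGHTS, "shopify_surface_area": 25, "retrieval_mental_model": 15}
)

# Agency-of-record variant: original weights plus a sixth row at weight 15.
agency = add_row(BASE_WEIGHTS, "account_management_discipline", 15)

for label, weights in (("contractor", contractor), ("agency", agency)):
    print(label, {row: round(weight, 2) for row, weight in weights.items()})
```

In the agency variant the original five rows shrink to 85% of their base weight, so Schema fluency lands at 21.25 and the six rows still sum to 100.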

The rubric weights Schema fluency at 25 — isn’t that too technical for a role that’s half content?

We weighted it highest deliberately because schema is the cheapest-to-test and highest-leverage competency. A content-strong, schema-weak hire will wait for engineering to unblock them on every real fix. A schema-strong, content-medium hire will ship incremental content while simultaneously improving the structural foundation. If you need a pure content marketer, you’re hiring for a different role: call it GEO Content Lead, drop Schema fluency to weight 10, and raise Content intuition to weight 30 (that leaves the weights summing to 95, so rescale them or assign the spare 5 points to another row). But don’t call that role “GEO Specialist”; the general-purpose GEO role needs both.

How do I test Retrieval mental model without having a tech lead on staff yet?

Borrow one. Ask an agency partner, a fractional CTO, or a GEO-fluent founder in your network to run the 30-minute whiteboard round once. Record it if your candidate consents. The signal is whether the candidate can reason out loud about tradeoffs (ranking signals, retrieval pools, eligibility gates) and whether they say “I don’t know” appropriately rather than bluffing. You don’t need the interviewer to know the right answer — you need them to detect bluffing. A generalist technical reviewer works fine here.

What’s the typical salary range for a GEO specialist hitting 75+ weighted on this rubric in 2026?

In the US, 75+ weighted GEO specialists are commanding $105-145K base for IC roles at Shopify brands with $5-20M revenue, plus variable comp tied to citation share and attributed pipeline. At $20M+ brands, GEO leads move into $150-195K base territory. Agencies pay less base ($85-115K) but more variable. Remote-friendly roles trend about 10% below SF/NYC levels. These numbers moved up 20-25% from 2025 as the role became recognised — expect another 10-15% inflation in 2026 as demand keeps outpacing supply. Budget accordingly and don’t try to hire at SEO-specialist compensation levels.

The GEO hiring rubric: 5 competencies

“We need to hire a GEO person” is the sentence we hear most often from Shopify brands who have decided AI indexing is no longer optional. What comes next is usually a six-month detour: a generalist SEO hire who knows schema but has never reasoned about retrieval, or a strong content marketer who writes well but can’t debug a Product JSON-LD block. Here’s the five-row weighted rubric we use internally at Surfient and with partner agencies — calibrated against actual on-the-job performance over 18 months of GEO hiring.

Why GEO hires need a rubric, not vibes

The GEO-shaped role is new enough that there’s no cached hiring intuition in the industry. Interviewers who have hired 200 SEO specialists over a decade are still learning which of those signals transfer and which don’t. The ones that transfer: measurement discipline, content intuition, debugging persistence. The ones that don’t: backlink strategy, keyword volume chasing, crawl-budget optimisation theatrics. Without a rubric, interview teams default to “they seemed smart” and end up hiring a senior SEO who spends their first quarter auditing h1 tags while your AI Overviews presence silently decays.

A weighted rubric forces you to define what “good at GEO” actually means before you see the candidates, then holds you to it during debrief. The version below is our current best answer. It’s not the only valid shape — different brand shapes might re-weight the rows — but the discipline of defining the weights before you interview is the point.

Figure: the five-competency rubric, scored on a weighted 0-100 scale.

Card 01: Schema fluency, weight 25. Test: live results debug.
Card 02: Retrieval mental model, weight 20. Test: whiteboard explain.
Card 03: Measurement discipline, weight 20. Test: chart critique.
Card 04: Content intuition, weight 20. Test: rewrite exercise.
Card 05: Shopify surface area, weight 15. Test: 15-minute audit.
Score bands: 80+ strong hire; 70-79 hire with coaching; 60-69 second panel; under 60 no hire. A bottom callout in the diagram explains why pre-committing weights matters.
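To make the scoring concrete, here is a minimal Python sketch of how a weighted total and its band could be computed. It assumes each row is scored 0-100 by the panel (the post implies this but never states it outright); the function names and the sample candidate are made up for illustration.

```python
# Weights and score bands as listed in the rubric figure.
WEIGHTS = {
    "schema_fluency": 25,
    "retrieval_mental_model": 20,
    "measurement_discipline": 20,
    "content_intuition": 20,
    "shopify_surface_area": 15,
}
BANDS = [(80, "strong hire"), (70, "hire with coaching"), (60, "second panel")]


def weighted_score(row_scores, weights=WEIGHTS):
    """Weighted average of per-row scores (assumed 0-100 each)."""
    missing = set(weights) - set(row_scores)
    if missing:
        raise ValueError(f"missing row scores: {sorted(missing)}")
    return sum(row_scores[row] * w for row, w in weights.items()) / sum(weights.values())


def band(score):
    """Map a weighted total onto the hire/no-hire bands."""
    for floor, label in BANDS:
        if score >= floor:
            return label
    return "no hire"


# Hypothetical candidate: strong on schema, weak on Shopify specifics.
candidate = {
    "schema_fluency": 85,
    "retrieval_mental_model": 70,
    "measurement_discipline": 75,
    "content_intuition": 80,
    "shopify_surface_area": 50,
}
total = weighted_score(candidate)
print(total, band(total))  # 73.75 hire with coaching
```

Because Shopify surface area carries the lowest weight, a score of 50 on that row costs this candidate less than a similar gap on Schema fluency would.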

Row 1 — Schema fluency (weight 25)

The heaviest weight, because schema is where the day-to-day work lives. A strong GEO hire can open a live Shopify product page, view-source, read the application/ld+json blocks, and immediately tell you whether the Product node has a valid Offer, whether there are competing nodes from conflicting apps, and whether MerchantReturnPolicy is present. This is a muscle, not a memorisation test — either they’ve debugged it 50 times or they haven’t.

Test for it with a live exercise. Share your screen, pull up a real Shopify product page (not yours; something public-domain), and ask them to find three schema defects. They don’t need to find exactly the three you picked — three legitimate defects of any shape is a pass. What you’re watching for: do they reach for view-source, do they know what a valid Offer needs, do they spot double-nesting, do they distinguish a missing field from an invalid field. Candidates who can’t do this in 45 minutes will grind on your first real project for weeks.
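If you want to seed or pre-check defects before the live exercise, the same inspection can be roughed out in code. The sketch below uses only the Python standard library. The list of fields treated as required on an Offer is an assumption made for illustration, not a statement of any engine’s rules, and the sample HTML is invented. It also only looks for a return policy on the Offer itself, so a policy declared elsewhere on the page would be reported as missing.

```python
import json
from html.parser import HTMLParser

# Assumed for this sketch: fields an Offer should carry.
REQUIRED_OFFER_FIELDS = ("price", "priceCurrency", "availability")


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.blocks.append("".join(self._buffer))
            self._inside = False


def iter_nodes(value):
    """Yield every dict in parsed JSON-LD, walking @graph arrays and nested nodes."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from iter_nodes(child)
    elif isinstance(value, list):
        for item in value:
            yield from iter_nodes(item)


def has_type(node, name):
    node_type = node.get("@type")
    return node_type == name or (isinstance(node_type, list) and name in node_type)


def is_number(value):
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def find_defects(html):
    """Return human-readable defect strings for the Product markup in `html`."""
    parser = JsonLdExtractor()
    parser.feed(html)
    defects, products = [], []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            defects.append("ld+json block is not valid JSON")
            continue
        products.extend(n for n in iter_nodes(data) if has_type(n, "Product"))
    if not products:
        defects.append("no Product node")
    elif len(products) > 1:
        defects.append(f"{len(products)} competing Product nodes")
    for product in products:
        raw_offers = product.get("offers")
        if raw_offers is None:
            raw_offers = []
        elif not isinstance(raw_offers, list):
            raw_offers = [raw_offers]
        offers = [o for o in raw_offers if isinstance(o, dict)]
        if not offers:
            defects.append("Product has no Offer")
        for offer in offers:
            for field in REQUIRED_OFFER_FIELDS:
                if field not in offer:
                    defects.append(f"Offer missing {field}")  # missing field
            if "price" in offer and not is_number(offer["price"]):
                defects.append("Offer price is present but not numeric")  # invalid field
            if "hasMerchantReturnPolicy" not in offer:
                defects.append("Offer missing hasMerchantReturnPolicy")
    return defects


# Invented sample: two Product nodes, as if a theme and an app both injected one.
SAMPLE_HTML = '''
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Wool rug",
 "offers": {"@type": "Offer", "price": "749.00", "priceCurrency": "USD"}}
</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Wool rug"}
</script>
'''

for defect in find_defects(SAMPLE_HTML):
    print(defect)
```

A script like this is a way to prepare the exercise, not a substitute for it. The interview signal is still whether the candidate reaches for view-source and reasons about what they see.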

Row 2 — Retrieval mental model (weight 20)

GEO is retrieval strategy wearing content’s clothes. The hires who plateau at “junior specialist” are the ones who never build a mental model of why Perplexity picks four sources out of 1,200, why ChatGPT weights OpenAI-indexed pages differently from Bing-crawled ones, or why Google’s AI Overviews will pull Reddit over a brand site on subjective queries. You don’t need ML papers to hire well here; you need candidates who can reason out loud about tradeoffs and update when shown data.

Test for it with a whiteboarding conversation. Pose: “User asks Perplexity ‘best wool rug 8x10 under $800’ — draw how Perplexity picks the four sources it cites.” Good candidates will sketch: query parsing, retrieval pool, eligibility gates, ranker signals, product card shortlist, citation synthesis. Weak candidates will say “it uses AI.” The vocabulary is a tell, but so is willingness to say “I don’t know this part but here’s how I’d find out.”

Row 3 — Measurement discipline (weight 20)

GEO generates enormous quantities of noisy data. Citation appearances bounce day-to-day on retrainer schedules you can’t see. Revenue attribution crosses surfaces, engines, and user sessions. Candidates without measurement discipline fall into one of two traps: they check citation tools daily and panic-swerve strategy, or they report hockey-stick numbers from cherry-picked queries. You want the ones who set a monthly cadence, use query cohorts, and will tell you “this isn’t working, we need to stop.”

Test for it with a chart critique. Show a real-looking citation share graph (any of our research posts have usable ones) and ask: “what’s wrong with reading this chart the way it’s currently presented?” Pass signals: asks about cohort size, asks about sampling cadence, questions the denominator, wants to know what else changed that week. Fail signals: takes the chart at face value and starts planning tactics from it.
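The pass signals above (cohort size, sampling cadence, the denominator) all come down to reporting a share together with the sample it was computed from. Here is a minimal Python sketch of that habit. The observation tuples, the cohort names, and the `MIN_SAMPLES` threshold are all assumptions for illustration, and the sample rows are invented rather than real measurements.

```python
from collections import defaultdict

MIN_SAMPLES = 30  # assumed threshold below which a share is flagged as too thin


def citation_share(observations):
    """Aggregate (month, cohort, cited) observations into share-with-denominator rows."""
    counts = defaultdict(lambda: [0, 0])  # (month, cohort) -> [cited, total]
    for month, cohort, cited in observations:
        bucket = counts[(month, cohort)]
        bucket[0] += int(cited)
        bucket[1] += 1
    return {
        key: {"share": hits / total, "n": total, "too_small": total < MIN_SAMPLES}
        for key, (hits, total) in sorted(counts.items())
    }


# Invented sample: one well-sampled cohort and one cherry-pickable cohort.
observations = (
    [("2026-01", "rugs", True)] * 10
    + [("2026-01", "rugs", False)] * 30
    + [("2026-01", "lamps", True)] * 3
)

for key, row in citation_share(observations).items():
    print(key, row)
```

The “lamps” cohort shows a 100% share on three observations, which is exactly the kind of hockey-stick number a disciplined candidate should question before planning tactics from it.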

Row 4 — Content intuition (weight 20)

GEO content isn’t traditional SEO content. The sentences need to be quotable by a generator under a 400-token budget. The structure needs to be scannable because the retrieval pass is reading 10-20 documents in parallel. Honest comparisons to competitors outperform puffery because AI systems detect and down-weight unsupported brand claims. Candidates whose content portfolio is flowy brand narratives will struggle; candidates whose portfolio includes technical how-tos, honest product comparisons, or well-researched thought leadership will thrive.

Test for it with an async rewrite. Send them a weak PDP paragraph (use a real one from a brand in your category, anonymised) and ask: “rewrite this for both Perplexity quotability and ChatGPT citation, 48-hour window.” Pass: they produce extractable claims, honest specs, a clear quotable sentence, and ideally flag the schema implications. Fail: they swap adjectives and add a CTA.

Row 5 — Shopify surface area (weight 15)

The lowest weight because it’s the most coachable of the five. A strong candidate with zero Shopify experience but deep GEO instincts will pick up Liquid, metafields, app injection patterns, and theme vs Hydrogen distinctions in a month. A Shopify expert who can’t do the other four rows will grind forever. But we don’t weight it to zero, because onboarding speed matters, and candidates who have shipped on Shopify before add velocity in quarter one.

Test for it with a 15-minute audit exercise. Ask them to walk you through one of your own product pages, pointing out Shopify-specific structural choices they see (is this Dawn? which review app is injecting schema? is this using metafields or hardcoded attributes?). Pass: spots at least three Shopify-specific patterns. Fail: treats it as a generic e-commerce page.

The five-stage interview loop

The rubric only works if the interview loop actually exercises each row. A common failure: teams build a beautiful rubric and then run a loop of three conversational rounds that probe only culture fit. Here’s the loop we run, mapped directly to rubric rows.

Figure: the five-stage interview loop.

Stage 01: recruiter screen, 15 min. Pass gate: names two or more AI engines.
Stage 02: live audit, 45 min with hiring manager. Covers schema fluency and Shopify surface area. Pass gate: finds 2 of 3 schema defects.
Stage 03: retrieval whiteboard, 30 min with tech lead. Covers retrieval mental model. Pass gate: uses correct vocabulary.
Stage 04: async content rewrite, 48-hour window. Covers content intuition. Pass gate: extractable claim plus schema awareness.
Stage 05: three reference calls. Covers measurement discipline. Pass gate: hearing a concrete kill-story.
Total candidate time: 2.5 hours live + async window. Panel time per candidate: ~2h20m. The bottom bands of the diagram also list anti-patterns the team has stopped running.

Calibration — run the rubric on yourself first

Before you interview a single external candidate, run the rubric on your internal team. Ask each rater to score the current best GEO practitioner you work with (could be a teammate, a freelancer, or an agency partner) across all five rows. Discuss variances. If your ratings across the panel cluster within ~10 points, you’re calibrated. If they spread by 30+ points, your rubric definitions aren’t concrete enough yet — sharpen them before you turn the rubric on candidates. Unclear definitions always favour the candidate the loudest interviewer already likes.
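The calibration check can be run as a quick spread calculation. The sketch below assumes the ~10 and 30+ point thresholds apply per row, which the post does not specify (they could equally apply to weighted totals). The “discuss variances” label for the middle zone is also an assumption, and the rater names and scores are invented.

```python
def row_spreads(ratings):
    """ratings: {rater: {row: score}}. Return {row: max minus min across raters}."""
    rows = next(iter(ratings.values())).keys()
    return {
        row: max(r[row] for r in ratings.values()) - min(r[row] for r in ratings.values())
        for row in rows
    }


def calibration_status(spread, tight=10, wide=30):
    """Classify a spread using the thresholds quoted in the post."""
    if spread <= tight:
        return "calibrated"
    if spread >= wide:
        return "sharpen definitions"
    return "discuss variances"  # assumed label for the unspecified middle zone


# Invented panel scores for the same practitioner (two rows shown for brevity).
ratings = {
    "rater_a": {"schema_fluency": 80, "retrieval_mental_model": 70},
    "rater_b": {"schema_fluency": 75, "retrieval_mental_model": 40},
    "rater_c": {"schema_fluency": 85, "retrieval_mental_model": 65},
}

for row, spread in row_spreads(ratings).items():
    print(row, spread, calibration_status(spread))
```

Looking at spread per row rather than per total has one practical benefit: it points at which row definition is vague, so you know where to sharpen.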

Fix your weights before you see your first candidate. The weights above (25/20/20/20/15) work for most Shopify brands mid-funnel. If you’re pre-product and hiring for GEO leadership, raise Retrieval mental model to 25 and lower Shopify surface area to 10.
Pre-write the pass gates, not just the rubric rows. “Passes schema fluency” is not a pass gate. “Finds 2 of 3 seeded defects in 45 minutes and explains the fix” is a pass gate.
Debrief scoring before discussing vibes. In the hiring debrief, each interviewer submits their rubric scores before anyone says “I liked them” or “I didn’t.” Vibes contaminate scores if shared first.
Re-score after 90 days of on-the-job performance. Every hired candidate gets rescored at 90 days using the same rubric. This calibrates your interview scoring over time — if candidates you rated 75 at hiring consistently score 60 at 90 days, your interview process is over-rating.
Never re-open a “no-hire” from the same panel. Yes, even if a founder really liked them. Weighted scores below 60 don’t become hires on appeal — that’s how you end up with an expensive mis-hire that everyone saw coming.
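The 90-day re-score rule amounts to tracking the gap between interview scores and on-the-job scores. A minimal Python sketch follows; the `interview_drift` name and the sample pairs are illustrative, and the post gives no threshold for how much drift is too much, so none is encoded here.

```python
from statistics import mean


def interview_drift(pairs):
    """pairs: (interview_score, day_90_score) per hire.

    A positive result means the interview loop is over-rating candidates;
    a negative result means it is under-rating them.
    """
    return mean(interview - day_90 for interview, day_90 in pairs)


# Invented example mirroring the post: rated 75 at hiring, 60 at 90 days.
hires = [(75, 60), (75, 60), (75, 60)]
print(interview_drift(hires))  # 15
```

With only a handful of hires the average will be noisy, so treat it as a prompt for a debrief rather than a verdict on the process.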

What role shape should you hire for?

One last question the rubric implicitly settles: should you hire an individual contributor GEO specialist, a head-of-GEO manager, or a team? For most Shopify brands under $20M revenue, the answer is one IC-level GEO specialist who owns the outcome end-to-end, sitting between content and engineering. That person should score 75+ weighted, with schema fluency and measurement discipline as non-negotiable 80+ rows. Above $20M, split into a GEO content lead + a GEO engineering lead, each of whom can score 80+ on their home row even if weaker on the other. Below that revenue threshold, don’t split — the coordination tax eats your returns.
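The IC bar described above combines a weighted floor with per-row floors, so a candidate can land in the “strong hire” band and still miss it. Here is a minimal Python sketch, assuming rows are scored 0-100; the `ic_bar_failures` helper and the sample candidate are made up for illustration.

```python
WEIGHTS = {
    "schema_fluency": 25,
    "retrieval_mental_model": 20,
    "measurement_discipline": 20,
    "content_intuition": 20,
    "shopify_surface_area": 15,
}
WEIGHTED_FLOOR = 75  # "should score 75+ weighted"
ROW_FLOORS = {"schema_fluency": 80, "measurement_discipline": 80}  # non-negotiable rows


def ic_bar_failures(row_scores):
    """Return the reasons a candidate misses the IC bar (empty list means they clear it)."""
    total = sum(row_scores[row] * w for row, w in WEIGHTS.items()) / sum(WEIGHTS.values())
    failures = []
    if total < WEIGHTED_FLOOR:
        failures.append(f"weighted total {total:.1f} is below {WEIGHTED_FLOOR}")
    for row, floor in ROW_FLOORS.items():
        if row_scores[row] < floor:
            failures.append(f"{row} scored {row_scores[row]}, below the {floor} floor")
    return failures


# Hypothetical candidate: weighted total of 80.0, but weak on measurement.
candidate = {
    "schema_fluency": 90,
    "retrieval_mental_model": 85,
    "measurement_discipline": 70,
    "content_intuition": 80,
    "shopify_surface_area": 70,
}
print(ic_bar_failures(candidate))
```

Returning the reasons rather than a bare pass/fail keeps the debrief focused on which floor was missed.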

Tags: hiring, geo, team, process, rubric

Sources & further reading

What agencies miss about GEO

GEO budget allocation framework

AI citations weekly measurement playbook

Related reading

The prompt library that replaces your keyword list

How to read your GEO score: 4 sub-score patterns

“GEO audits” are 90% theatre
