The pitch arrives cold, usually on LinkedIn: "We'll make ChatGPT recommend your brand to every buyer in your category — guaranteed." Attached is a chart showing a competitor at 31% "AI share of voice," you at 4%, and a retainer at €6,000 a month. The fear is well-aimed. Buyers really have moved into AI assistants, and brands really are invisible there. But the market selling the fix is barely two years old, has no standard metrics, and is growing faster than its own honesty. Generative engine optimization is a real discipline; a measurable share of what is sold under that name is not.
One disclosure first: we sell AI visibility work ourselves, so this is an audit of our own category — read our bias accordingly. It is written to be useful even if you never hire anyone: the physics that makes some promises structurally false, ten red flags in the vendors' own words, fair price anchors, and the questions that sort operators from costumes.
TL;DR
- LLM answers are probabilistic and model-version-dependent. A vendor can raise the odds you appear; nobody can lock a placement. Guarantee language is the loudest red flag in the category.
- Run the 5-Question Sniff Test on the sales call: prompt set, baseline-to-delta proof, what you are actually buying, the model-update plan, and why not a €99 tool instead.
- Fair 2026 anchors: productized audits around €900–1,500, full agency audits up to $7,500, boutique retainers €3,000–8,000 a month, all paid only against deliverables named in the statement of work (SOW).
- "AI share of voice" without a published prompt set, sampling counts, and variance is decoration, not measurement.
- Honest GEO is mostly unglamorous source work — entities, structured data, citation surfaces like Wikipedia — not AI-generated content at volume.
Why this market grew snake oil overnight
Three conditions arrived at once, and each one favors the seller.
No standard metrics. Every vendor computes "AI share of voice" from its own prompt panel, so numbers are neither comparable across vendors nor auditable by you.
An invisible mechanism. Nobody outside the model labs can fully explain why an assistant named one brand and skipped another in a given answer. When the buyer cannot verify the mechanism, sales copy fills the vacuum.
Terrified budgets. Organic traffic is declining, the board is asking what ChatGPT says about the company, and "do nothing" feels riskier than "sign something." Fear compresses due diligence.
The result is gold-rush economics: skeptical coverage notes a GEO startup valued above $100 million before its first birthday (Webbiquity). Some of that is genuine category growth. The rest is what happens when demand outruns the buyer's ability to verify delivery.
The physics: why a guaranteed AI ranking is structurally false
You do not need to take a vendor's word for what is possible. The system itself sets the limits.
Answers are sampled, not retrieved from a ranking. A model generates each answer probabilistically: the same prompt, on the same day, in two clean sessions, can name different brands in a different order. There is no index with slots, so there is no slot anyone can sell you. Source work changes the probability distribution — how often you appear across many askings — never a fixed position.
Model versions reshuffle everything. Each model release changes training data, retrieval behavior, and source weighting. A brand that dominated answers under one version can lose ground under the next, through no action of its own. Even the platforms themselves trade share: G2's buyer research found ChatGPT's share among B2B software buyers who use AI fell from 89% to 63% in a year, while Claude rose from 1.4% to 18.5% (G2 via PRNewswire). "Ranking on AI" is not one scoreboard; it is several, and all of them move.
What legitimate work does is raise the floor under that volatility: more independent sources a model can cite, consistent entity data it can ground against, presence on the surfaces it retrieves from. That raises mention probability measurably and durably. It cannot lock placement. Any vendor promising determinism either misunderstands the system or hopes you do.
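To make the sampling point concrete, here is a toy simulation in Python. It is a sketch under loud assumptions: the brand names and weights are invented, and a weighted draw without replacement is only a stand-in for how an assistant composes an answer, not a model of any real engine. What it illustrates is the shape of the argument: source work shifts a weight, the mention rate across many runs moves, and no single run is ever locked.

```python
import random

def sample_answer(weights, slots, rng):
    """Simulate one answer: draw `slots` distinct brands, weighted, without replacement."""
    pool = dict(weights)
    named = []
    for _ in range(min(slots, len(pool))):
        brands = list(pool)
        pick = rng.choices(brands, weights=[pool[b] for b in brands], k=1)[0]
        named.append(pick)
        del pool[pick]  # a brand is named at most once per answer
    return named

def mention_rate(brand, weights, runs, slots=2, seed=0):
    """Share of simulated answers that name `brand` at least once."""
    rng = random.Random(seed)
    hits = sum(brand in sample_answer(weights, slots, rng) for _ in range(runs))
    return hits / runs

# Hypothetical weights: stand-ins for how strongly sources support each brand.
BEFORE = {"BrandA": 5.0, "BrandB": 3.0, "BrandC": 1.5, "YourBrand": 0.5}
# Assumed effect of source work: only YourBrand's weight changes.
AFTER = {**BEFORE, "YourBrand": 2.0}

# Same prompt, same "day", several clean sessions: the named brands can differ.
session_rng = random.Random(42)
for session in range(3):
    print(f"session {session + 1}:", sample_answer(BEFORE, 2, session_rng))

rate_before = mention_rate("YourBrand", BEFORE, runs=2000)
rate_after = mention_rate("YourBrand", AFTER, runs=2000)
print(f"mention rate before: {rate_before:.1%}, after: {rate_after:.1%}")
```

The only honest deliverable in this toy world is the change in mention rate over many runs. Any individual answer can still skip you, before and after the work.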
The 5-Question Sniff Test
Ask all five on the first call. Each takes a minute, and together they filter most of the market.
- "Which prompt set, which models, how sampled?" A real operator hands you a written prompt list, names model versions, and states runs per prompt. A fake one says "our proprietary tracking covers everything."
- "Show me baseline-to-delta on a past client." Pass: an anonymized before/after on the same frozen prompt set, variance included, with some prompts that did not move. Fail: a logo wall and "+340% AI visibility" with no denominator.
- "What part of this is content, PR, or entity work — what am I actually buying?" GEO is a bundle of existing disciplines aimed at machine-readable surfaces. An honest vendor decomposes the bundle. A dishonest one says the algorithm does the work.
- "What happens to my results when the next GPT ships?" The only honest answer is a version of: answers will reshuffle, we re-baseline, and the durable layer is your sources and entity data. Any flavor of "our results persist across model updates" fails the physics above.
- "Why can't I get the same from a €99 tool plus my content team?" Sometimes you can, and a serious vendor will say so. One who cannot articulate value above measurement is selling you the dashboard at retainer prices.
The 10 red flags, in the vendors' own words
1. The guarantee. "We guarantee ChatGPT will recommend you within 90 days." Probabilistic systems do not issue guarantees; people who want your signature do. This single sentence should end the call.
2. The proprietary algorithm. "Our proprietary AI ranking algorithm has decoded how ChatGPT ranks brands." Nobody outside the labs has decoded model internals, and there is no stable "ranking" to decode. What vendors actually have is a prompt panel and a scraper — useful, but not secret physics.
3. The submission desk. "We submit your brand directly to OpenAI, Google, and Anthropic." No such desk exists. There is no form where a brand gets filed into future answers. This claim is not an exaggeration; it is an invented mechanism.
4. llms.txt as a four-figure line item. "AI crawler configuration file — €1,200." The file is plain markdown, takes about twenty minutes, grants nothing, and no engine treats it as a ranking signal. Shipping one is sensible — we publish our own — but four figures for it is arbitrage on your unfamiliarity.
5. The share-of-voice chart with no methodology. "You're at 4%; your competitor is at 31%." Ask which prompts, how many runs, which models, sampled when. If the pre-sale chart cannot answer, the post-sale reports will not either — the deck was built to alarm, not to measure.
6. Results inside one model cycle. "You'll see movement within 30 days, before your next board meeting." Source changes propagate through crawls, retrieval indexes, and retraining over weeks to months. Anything that "moves" in days is retrieval noise or creative measurement.
7. No baseline before the work starts. "We'll start optimizing immediately and send monthly visibility reports." A vendor who never captures a frozen baseline can never prove a delta — which is convenient for exactly one party in the contract.
8. The SOW that names nothing. "Ongoing generative engine optimization — €6,000/month." If the deliverable line has no nouns — no prompt set, no source list, no entity work, no re-measurement cadence — you are buying a subscription to vibes.
9. Everything, everywhere, one price. "We optimize for all AIs." Each engine leans on a different source mix, and behavior differs by language and market. Promising every engine at once, with no prioritization, means measuring none of them properly.
10. Content volume rebranded as GEO. "30 AI-optimized articles per month." Engines reward citable authority, not throughput; mass-produced AI content is exactly what platforms are learning to discount. Volume also produces nothing another source would ever cite, and earning those citations is the actual game.
Fair price anchors for 2026
Prices in this market span two orders of magnitude for similar-sounding promises — published GEO retainers run from roughly €200 a month at the freelancer end to $25,000 a month at the enterprise end (Citable). Anchors that match what the work actually costs:
| Engagement | Honest 2026 range | What must be included | The rip-off version |
|---|---|---|---|
| Productized AI-visibility audit | €900–1,500 | Fixed prompt set, multi-model baseline, citation-source map, prioritized fix list | Templated PDF with scores but no prompt list, sold at €3,000+ |
| Agency GEO audit | $1,500–3,000 focused; $5,000–7,500 full (Demand Local) | All of the above plus entity and structured-data review, competitor citation analysis | A relabeled SEO audit — same crawl, new acronym, doubled price |
| Boutique retainer | €3,000–8,000/mo (Citable) | Named monthly deliverables: citation-source building, entity work, re-measurement against baseline | "Ongoing optimization," deliverables unnamed, results unfalsifiable |
| Monitoring tooling | €29–500/mo self-serve | Fixed prompts, scheduled runs, multi-engine coverage | The same tool resold inside a retainer at 10× as "proprietary tracking" |
| Single technical fixes (llms.txt, schema) | Hours of work, bundled into an audit | Implementation plus verification | Four-figure standalone line items for twenty-minute files |
For calibration: we sell fixed-scope packages in this category at €700, €1,500, and €3,500, one-time — listed here not as the pitch but as the disclosure. This table is the standard we expect to be judged by.
What a legitimate retainer names in the SOW
If a monthly engagement is justified at all, the statement of work reads like an engineering document, not a manifesto. Five deliverables should appear by name:
- Prompt-set definition. The frozen list of buyer-relevant prompts — category, comparison, brand, adverse — agreed in writing before any work starts.
- Baseline capture. Multi-model, multi-run measurement of where you stand today, archived so neither party can move the goalposts later.
- Entity and structured-data work. Specific records to be created or corrected — knowledge-graph entries, schema markup, consistent organization data across surfaces.
- Citation-source building. Which independent, citable sources will exist at the end that did not exist at the start. This is the slowest line and the one that matters most.
- Re-measurement cadence. Same prompts, same method, stated interval, variance reported — including the prompts that got worse.
A vendor who resists writing these down is telling you the deliverable is the invoice.
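One way to keep a prompt set frozen in practice is to fingerprint it at signing and check every report against it. The sketch below is a minimal illustration in Python, not a standard: the `PromptSet` class, its fields, the example prompts, and the model labels are all hypothetical names chosen for this example.

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptSet:
    """Hypothetical record of what the SOW freezes before work starts."""
    prompts: tuple[str, ...]
    models: tuple[str, ...]
    runs_per_prompt: int

    def fingerprint(self) -> str:
        """SHA-256 over a canonical JSON form; order of entries does not matter."""
        payload = json.dumps(
            {
                "prompts": sorted(self.prompts),
                "models": sorted(self.models),
                "runs_per_prompt": self.runs_per_prompt,
            },
            sort_keys=True,
            ensure_ascii=False,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def missing_from_report(prompt_set, reported_prompts):
    """Frozen prompts that a report silently left out (the cherry-picking check)."""
    return sorted(set(prompt_set.prompts) - set(reported_prompts))

# Example values are invented for illustration only.
agreed = PromptSet(
    prompts=(
        "best invoicing software for freelancers",
        "ExampleCo vs competitors",
        "is ExampleCo reliable",
    ),
    models=("assistant-x-v1", "assistant-y-v2"),
    runs_per_prompt=10,
)

print("fingerprint to paste into the SOW:", agreed.fingerprint())
print("omitted from report:", missing_from_report(agreed, ["ExampleCo vs competitors"]))
```

If the fingerprint recorded in the SOW matches the one recomputed from a later report's prompt list, the goalposts have not moved. If `missing_from_report` returns anything, the report covers less than what was agreed.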
The metrics con: how share-of-voice gets faked
Three moves produce an impressive chart from nothing, and all three are invisible unless you ask.
Cherry-picked prompts. Measure 200 prompts, report the 20 that improved. The fix: the prompt set is frozen in the SOW, and every report covers all of it.
Single-run sampling. One answer per prompt per month is a coin flip presented as a trend. The same prompt can include you at noon and skip you at one. The fix: multiple runs per prompt, with mention rate reported across runs.
No confidence intervals. A move from 22% to 26% on a small prompt panel is usually indistinguishable from sampling noise, but it renders as a satisfying upward bar. The fix: share-of-voice reports must include run counts and variance, and flag which changes are within noise.
None of this requires a statistics degree to police. It requires asking, once, on the call: "How many runs per prompt, and what is the noise floor?" Silence is an answer.
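For readers who want to check a vendor's chart themselves, the noise-floor question can be sketched in a few lines of Python. This is a rough screen, not a full statistical test: it computes a 95% Wilson score interval for each mention rate and treats overlapping intervals as "within noise", which is deliberately conservative. It also assumes runs are independent, which real assistant sessions may not be. The function names and the run counts are our own illustration.

```python
import math

def wilson_interval(hits, runs, z=1.96):
    """95% Wilson score interval for a mention rate of hits/runs."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = hits / runs
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return centre - half, centre + half

def within_noise(hits_before, runs_before, hits_after, runs_after):
    """True when the two intervals overlap, so the change cannot be told from noise."""
    lo_b, hi_b = wilson_interval(hits_before, runs_before)
    lo_a, hi_a = wilson_interval(hits_after, runs_after)
    return lo_b <= hi_a and lo_a <= hi_b

# 22% -> 26% measured on 50 answers each (assumed panel size).
print(wilson_interval(11, 50), wilson_interval(13, 50))
print("small panel, within noise:", within_noise(11, 50, 13, 50))

# The same 22% -> 26% measured on 5,000 answers each.
print("large panel, within noise:", within_noise(1100, 5000, 1300, 5000))
```

The same four-point move reads as noise on a small panel and as a real shift on a large one, which is why a share-of-voice number without a run count tells you nothing.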
Where Wikipedia and Wikidata fit honestly
Strip the acronym away and GEO is mostly a sourcing problem: models cite what they trust, and they trust a short list of surfaces. Citation studies keep finding Wikipedia at the top of it — 5W's Citation Source Index measured it at 13.15% of US ChatGPT citations, the largest single source, with Reddit second at 11.97% (5WPR). Wikidata plays the quieter role, feeding the knowledge graphs engines use to ground who you are and which claims about you are canonical. How the encyclopedia layer flows into AI answers is its own discipline — see Wikipedia AEO.
This cuts both ways, and the symmetry is the test. A GEO vendor who never mentions sourcing, citation surfaces, or entity work is selling content spam with a new acronym. And the Wikipedia-services market has its own long-running scam ecosystem — the warning signs there rhyme with this article: guarantees, ghost agencies, unverifiable case studies. Same con, different surface.
Build vs buy: when a tool plus your team wins
A retainer is the wrong purchase if your prompt universe is small (under roughly 100 buyer-relevant queries), you operate in one language, engines already state correct facts about you, and your content team can act on findings. Then a monitoring tool from €29 a month — or the free 20-prompt baseline from our monitoring tools guide — plus your own writers covers most of what a mid-range retainer claims.
External help earns its fee in three situations: multi-market measurement and source work in languages you do not staff; entity and citation-surface work your team cannot do in-house (knowledge-graph corrections, encyclopedic sourcing); and active hallucinations about your company that need fixing at the source. All three are projects with ends — which is why fixed scope usually fits this work better than an open-ended retainer.
The one-page vendor interrogation sheet
Run down this list on the next sales call. Every "no" is a data point; three are a verdict.
- Written prompt set and named model versions offered without prompting?
- Baseline capture scoped before any optimization work?
- Past-client baseline-to-delta shown on a frozen prompt set?
- Runs-per-prompt and variance included in reporting?
- SOW names entity work and citation sources, not "ongoing optimization"?
- Plan stated for the next model release?
- Zero guarantee language anywhere in the proposal?
- Price within the anchor table above for the engagement type?
- Can they explain what you could do in-house instead?
- Will they put the refund or exit terms in writing?
Download: the vendor scorecard (PDF). We built it for Wikipedia vendors, but the rows are vendor-agnostic: swap "live article URLs" for "prompt-set methodology" and it scores GEO vendors side by side just as well.
If you would rather skip the retainer question entirely, that is deliberately how we sell: AI Visibility packages are fixed-scope and one-time — Starter €700, Standard €1,500, Enterprise €3,500 — every deliverable named before you pay, outcomes framed as measurable probabilities, because that is all anyone honest can sell in this category. Bring the scorecard to our call too.