Key takeaways
- A real GEO deliverable is a point-in-time scope of work in five named parts, not a monthly dashboard and not a single visibility score.
- The move that turns a screenshot into a deliverable: separating “your page is cited but your brand is not named” (a content fix) from “you are invisible” (an off-site fix). A blended score hides which lever to pull.
- In a June 2026 survey, 88% of agencies said they offer GEO services and 37% admitted those services are “loosely defined.” A defined deliverable is the differentiator.
- This post gives you the five components, what goes in each, how to produce them, and a copy-paste scope you can drop into a statement of work.
There is a question that shows up over and over from agencies who have just started selling GEO: “Why do the AI SEO tools show great results, but my client opens ChatGPT and still sees nothing?” It is the most useful question in this whole category, because the answer is also the blueprint for what a GEO deliverable should actually contain.
Here is the short version. Most tools hand back one blended visibility score, calculated from prompts the tool invented, averaged across engines. The number looks healthy while the client is genuinely invisible for the queries that matter. If you ship that number as your deliverable, you have shipped a vanity metric, and the gap between your report and your client’s lived experience is where trust dies. (If the client is invisible and confused about why, our explainer on why a brand is not appearing in AI search is a good thing to send them.)
This is the post-sale guide. You have sold the engagement (if you have not yet, start with how to sell GEO audits and how to pitch GEO services). The pitch posts get you the signature. This one is what goes in the box once you have it: a deliverable framework you can adopt today, tool-agnostic, with the honest limits spelled out so you never over-promise.
Your client signed. Now what goes in the box?
The market is moving faster than most agency offerings. Conductor found that 94% of enterprises plan to increase their AEO (answer engine optimization) and GEO investment in 2026. The demand is real, the budgets are opening, and clients are asking their existing agencies to handle it.
The problem is on the supply side. According to a June 2026 study from GNW Consulting and Demand Metric, 88% of agencies now say they offer GEO services, but 37% describe those services as “loosely defined.” That is the whole opportunity in one stat. The clients want it, most agencies are winging it, and a defined, repeatable deliverable is what separates you from the shop screenshotting a dashboard.
A credible GEO deliverable has five parts. Each answers a specific client question, each produces a tangible artifact, and each can be done by hand or accelerated with a tool. Here is the whole thing on one page before we go deep on each part.
| Component | The client question it answers | The artifact |
|---|---|---|
| 1. Baseline AI-visibility audit | Where am I visible, where are competitors, where am I invisible? | Per-engine, per-zone visibility matrix |
| 2. Kingmaker source-gap list | Which third-party pages does AI cite to recommend competitors? | Ranked off-site target list |
| 3. Prioritized fix plan | What do we do, on which page, in what order? | P1 to P5 fix cards with SMART targets |
| 4. Dollarized opportunity | What is this invisibility costing us? | Revenue-at-risk and recoverable-revenue model |
| 5. White-label report and cadence | Can I hand this to stakeholders, and when do we recheck? | Branded report plus an agreed re-audit date |
One thing the table cannot show is the method underneath all five parts. Skip that method and the components are built on sand, so we start there.
Why the usual GEO deliverable fails
Walk through how most GEO “audits” get made today and the failure is obvious. The agency buys a monitoring subscription, types in the client and a few competitors, lets the tool generate a prompt set, and exports the dashboard. That becomes the deliverable.
Three things break in that workflow:
- The prompts are invented. Tool-generated prompts are guesses at how people ask. They miss the actual phrasings your client’s buyers use, so the score measures the wrong questions. Garbage in, dashboard out.
- The sample is too small to be stable. A lot of agencies test 30 to 50 prompts on ChatGPT alone. AI answers are non-deterministic, so the same prompt run ten times can return ten slightly different sets of cited brands. A handful of prompts on one engine is noise dressed up as a finding.
- One blended number hides the fix. A single score averages away the two failure modes that need opposite remedies. The client is left staring at a low number with no idea what to do about it.
A deliverable worth paying for fixes all three: it uses real buyer prompts, runs enough of them across every engine to be stable, and reports in a way that points at the specific lever to pull. That is the method.
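To make the sample-size point concrete, here is a minimal sketch that puts a 95% Wilson score interval around an observed appearance rate. The Wilson interval is a standard statistical tool we are swapping in for illustration, not a method any particular GEO tool is known to use, and the function name and run counts are hypothetical.

```python
import math


def wilson_interval(appearances: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the true appearance rate.

    appearances: runs in which the brand showed up (hypothetical log count)
    runs: total times the prompt was run
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = appearances / runs
    denom = 1 + z**2 / runs
    center = (p + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# The same "zero appearances" result means very different things
# depending on how many runs sit behind it.
for runs in (3, 30, 300):
    low, high = wilson_interval(0, runs)
    print(f"0 appearances in {runs} runs: true rate could still be up to {high:.0%}")
```

With three runs and no appearances, the true appearance rate could plausibly still be above 50%. With thirty runs it drops to roughly 11%. That is the practical difference between "unlucky once" and "reliably invisible."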
The method before the artifact
Four rules sit underneath every component below. Teach them to your client too, because they are also why your deliverable is more credible than the last vendor’s.
1. Use real buyer prompts, not auto-generated ones. The free, honest baseline is Google Search Console. Pull the queries already getting impressions, sort by low click-through rate (these tend to be the informational and comparison phrasings that overlap most with how people ask AI), and use those as your test set. If the client is a new brand with an empty Search Console, pivot to entity completeness and category-roundup placement instead.
2. Run a panel of prompts across all four engines. Because answers are non-deterministic, one check is a vanity metric. The floor is hundreds of prompt variations across ChatGPT, Perplexity, Gemini, and Copilot, not a quick spin on one. The point of volume is not thoroughness for its own sake; it is the only way to tell “reliably invisible” apart from “unlucky once.”
3. Report per engine and per intent, never as one score. Brand, category, comparison, problem, and feature prompts behave differently and need different fixes. The engines diverge for a mechanical reason: Google’s AI Overviews are grounded in Google’s index, so classic SEO carries over, while ChatGPT and Claude lean harder on training-data weight, and Perplexity leans on live retrieval that favors Reddit and forums. Same brand, same query, a different fix per engine.
4. Classify every result into one of four citation states. This is the move that finds the lever:
- Fully visible. AI links your page as a source and names your brand in the answer.
- Cited, not mentioned. Your URL is in the source list, but your brand is not named in the text the user reads. This is a content and structure fix: the page is trusted enough to be pulled, but not written to be quoted. Sourced is not the same as presented.
- Mentioned, not cited. AI names you without linking your page. Brand recall without a source slot.
- Invisible. Neither your domain nor your brand appears. This is an off-site fix: you are not on the third-party pages the engine trusts for this question.
Here is the insight to build the whole deliverable around: a single blended score collapses “cited, not mentioned” and “invisible” into the same low number, even though one is fixed by rewriting a page and the other is fixed by earning a placement somewhere else entirely. Telling those two apart is the entire value of the deliverable. It is the line between a real scope of work and a screenshot.
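If you log results in a script rather than by eye, the four citation states reduce to two yes-or-no checks. The sketch below is one way to encode them. The function and variable names are hypothetical, and the matching is deliberately naive: a case-insensitive substring check for the brand and a hostname check for the domain, both of which you would want to tighten for real brand names.

```python
from enum import Enum
from urllib.parse import urlparse


class CitationState(Enum):
    FULLY_VISIBLE = "fully visible"
    CITED_NOT_MENTIONED = "cited, not mentioned"
    MENTIONED_NOT_CITED = "mentioned, not cited"
    INVISIBLE = "invisible"


# Only the two levers this post names; the other states carry no fix here.
LEVER = {
    CitationState.CITED_NOT_MENTIONED: "content and structure fix",
    CitationState.INVISIBLE: "off-site fix",
}


def _host_matches(url: str, domain: str) -> bool:
    """True if the URL's host is the domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def classify(answer_text: str, source_urls: list[str], brand: str, domain: str) -> CitationState:
    """Assumption: source_urls are full URLs with a scheme (https://...)."""
    mentioned = brand.lower() in answer_text.lower()  # naive substring match
    cited = any(_host_matches(url, domain) for url in source_urls)
    if cited and mentioned:
        return CitationState.FULLY_VISIBLE
    if cited:
        return CitationState.CITED_NOT_MENTIONED
    if mentioned:
        return CitationState.MENTIONED_NOT_CITED
    return CitationState.INVISIBLE


state = classify(
    "For small teams, Globex is a popular pick.",
    ["https://www.acme.example/pricing"],
    brand="Acme",
    domain="acme.example",
)
print(state.value, "->", LEVER.get(state, "no lever named"))
```

The point of keeping the two checks separate is that the pair of booleans, not a score, decides which workstream a result lands in.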
Component 1: the baseline AI-visibility audit
The client question it answers: right now, where do AI engines show me, where do they show my competitors, and where am I invisible?
What is in it: a matrix with intent zones down the side and engines plus segments (cities for a local business, buyer personas for everyone else) across the top. Every cell carries one of the four citation states, and every cell drills down to the individual prompts and the full AI responses behind it, so the client can read the actual answer that did or did not name them. Per-engine columns, so “great on Gemini, invisible on ChatGPT” is visible instead of blended away. Not a single 0 to 100 number.
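As a rough picture of the data behind that matrix, here is a minimal sketch that rolls logged runs up into one cell per intent zone and engine while keeping the prompts behind each cell for drill-down. The row layout, the sample prompts, and the choice to show the most common state per cell are all assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical log rows: (intent zone, engine, prompt, citation state)
runs = [
    ("comparison", "ChatGPT", "acme vs globex for payroll", "invisible"),
    ("comparison", "ChatGPT", "acme vs globex for payroll", "invisible"),
    ("comparison", "ChatGPT", "best payroll tool for startups", "cited, not mentioned"),
    ("comparison", "Gemini", "acme vs globex for payroll", "fully visible"),
    ("comparison", "Gemini", "best payroll tool for startups", "fully visible"),
]


def build_matrix(rows):
    """Group runs into cells keyed by (zone, engine)."""
    cells = defaultdict(lambda: {"states": Counter(), "prompts": set()})
    for zone, engine, prompt, state in rows:
        cell = cells[(zone, engine)]
        cell["states"][state] += 1   # how often each state was observed
        cell["prompts"].add(prompt)  # kept so the cell can drill down
    return cells


matrix = build_matrix(runs)
for (zone, engine), cell in sorted(matrix.items()):
    state, count = cell["states"].most_common(1)[0]
    total = sum(cell["states"].values())
    print(f"{zone:<12}{engine:<10}{state} ({count}/{total} runs)")
```

Showing the run count next to the state keeps the cell honest: "invisible (2/3 runs)" reads very differently from "invisible (28/30 runs)."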
How to produce it by hand: build a real-prompt set from those low-CTR Search Console queries, run it across the four engines, rerun each prompt enough times to separate reliable patterns from luck, log every result’s citation state in a spreadsheet, and split it by city or persona. It is tedious but completely doable for one brand.
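The first step, building the prompt set, can be sketched in a few lines. This assumes you have exported Search Console query data to a CSV with `query`, `clicks`, and `impressions` columns. The column names, the sample rows, and both thresholds are hypothetical, so adjust them to match your actual export and the client's traffic.

```python
import csv
import io

# Stand-in for a Search Console export; real exports may use other headers.
SAMPLE_EXPORT = """query,clicks,impressions
best payroll software for startups,4,1200
acme payroll login,310,900
acme vs globex payroll,2,450
how to run payroll for contractors,1,80
"""


def low_ctr_queries(csv_text, min_impressions=100, max_ctr=0.02):
    """Return (query, impressions, ctr) tuples, lowest CTR first.

    min_impressions and max_ctr are illustrative cut-offs, not benchmarks.
    """
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        impressions = int(row["impressions"])
        clicks = int(row["clicks"])
        if impressions < min_impressions:
            continue  # too little data to treat as real buyer demand
        ctr = clicks / impressions
        if ctr <= max_ctr:
            selected.append((row["query"], impressions, ctr))
    return sorted(selected, key=lambda item: item[2])


prompt_seeds = low_ctr_queries(SAMPLE_EXPORT)
for query, impressions, ctr in prompt_seeds:
    print(f"{query}: {impressions} impressions, {ctr:.2%} CTR")
```

Treat the output as seeds rather than finished prompts. You will still want to rephrase each query the way a buyer would actually ask an AI engine before it goes into the test set.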
Where a tool helps: this is BlueJar’s Visibility Matrix, a grid of intent zones by city or persona with cells colored owned, partial, cited, or lost. Each cell clicks through to the prompts and full responses behind it, all four engines run in production, and the 400+ prompts are built from a deterministic intent matrix. The point either way is the same: a readable map, not a verdict. For the on-page side of what an audit inspects, our GEO audit checklist is a useful companion.
Component 2: the kingmaker source-gap list
The client question it answers: which third-party pages does AI keep citing to recommend my competitors, and which of them am I missing from?
What is in it: a ranked list of the third-party domains and pages the engines repeatedly cite for the client’s category questions (G2, Capterra, Reddit threads, editorial roundups, niche directories, comparison posts), with the client’s own domain flagged where it appears and conspicuously absent where it does not. This becomes the off-site target list, the parallel workstream to all the on-page work. The line that makes a client sit up: “AI is citing this specific page to recommend your competitor, and you are not on it.”
How to produce it by hand: for each category prompt, record the source URLs the engine actually pulled. Tally frequency across prompts and engines to surface the five to fifteen repeat sources per category. Mark which already include your client, and classify each target by placement type (editorial roundup, directory, comparison page, community thread) so the fix plan can act on it.
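The tally itself is simple enough to script once the source URLs are logged. Here is a minimal sketch, assuming each logged row holds a prompt, an engine, and one cited URL, and that you have checked by hand which pages already include the client. All URLs and names below are placeholders.

```python
from collections import defaultdict

# Hypothetical log: (prompt, engine, source URL the engine cited)
citations = [
    ("best payroll software", "ChatGPT", "https://reviews.example.com/payroll"),
    ("best payroll software", "Perplexity", "https://reviews.example.com/payroll"),
    ("acme vs globex", "Perplexity", "https://forum.example.org/threads/payroll-picks"),
    ("payroll for contractors", "Gemini", "https://reviews.example.com/payroll"),
    ("payroll for contractors", "Copilot", "https://roundup.example.net/top-payroll-tools"),
]

# Filled in manually: does the client already appear on this page?
client_present = {"https://reviews.example.com/payroll": True}


def rank_sources(rows, present):
    """Rank URLs by how many distinct (prompt, engine) pairs cited them."""
    seen = defaultdict(set)
    for prompt, engine, url in rows:
        seen[url].add((prompt, engine))  # a set avoids double-counting reruns
    ranked = sorted(seen.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(url, len(pairs), present.get(url, False)) for url, pairs in ranked]


for url, count, has_client in rank_sources(citations, client_present):
    flag = "client present" if has_client else "CLIENT MISSING"
    print(f"{count}x  {url}  [{flag}]")
```

Counting distinct prompt and engine pairs, rather than raw rows, stops a prompt you happened to rerun many times from inflating one source's rank.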
Where a tool helps: BlueJar surfaces this as Kingmaker Sources, the ranked third-party domains per funnel stage with the brand’s own domain highlighted. This tends to be the single most actionable thing in the whole deliverable, because it converts “be more visible” into a concrete list of pages to go get on. One honesty note for your report: do not present “Reddit is 40% of AI citations” or “one platform is 25% of your visibility” as hard facts. The platform split is real in direction, but those exact percentages are illustrative, so frame them as such.
Component 3: the prioritized fix plan with SMART targets
The client question it answers: what exactly do we do, on which page, in what order, and how will we know it worked?
What is in it: fix cards prioritized P1 through P5, each with a typed action (a new page, an update to an existing one, an off-site placement, or a critical technical fix), a SMART target (specific and measurable, like “appear in ChatGPT and Perplexity answers for these six comparison prompts within 60 days”), and competitor source mapping that names which competitor is winning the prompt and through which kingmaker source. Every fix ties back to a citation state from Component 1: “cited, not mentioned” cells become content rewrites, “invisible” cells become off-site placements from Component 2’s list. Group them into 30, 60, and 90-day milestones.
How to produce it by hand: for each gap, decide whether it is a structure fix (the page is cited but the brand is not named, so rewrite it to answer the question and name the brand in a quotable way) or an off-site fix (invisible, so go earn the placement). Rank by expected lift against effort, then write a measurable target and a deadline for each.
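One way to make "rank by expected lift against effort" repeatable is a simple ratio. The sketch below assumes you score lift and effort on a 1 to 5 scale yourself. The scale, the ratio, and the rule that anything past fifth place stays P5 are all illustrative choices, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    prompt: str
    state: str          # citation state from the audit
    expected_lift: int  # 1 (small) to 5 (large), your own estimate
    effort: int         # 1 (easy) to 5 (hard), your own estimate


# The two levers named in this post, keyed by citation state.
ACTION_BY_STATE = {
    "cited, not mentioned": "content rewrite",
    "invisible": "off-site placement",
}


def prioritize(gaps):
    """Sort gaps by lift-to-effort ratio and label them P1 through P5."""
    ranked = sorted(gaps, key=lambda g: g.expected_lift / g.effort, reverse=True)
    cards = []
    for rank, gap in enumerate(ranked, start=1):
        cards.append({
            "priority": f"P{min(rank, 5)}",  # everything past fifth stays P5
            "prompt": gap.prompt,
            "action": ACTION_BY_STATE.get(gap.state, "review manually"),
        })
    return cards


cards = prioritize([
    Gap("acme vs globex payroll", "invisible", expected_lift=5, effort=2),
    Gap("best payroll software for startups", "cited, not mentioned", expected_lift=4, effort=1),
    Gap("payroll for contractors", "invisible", expected_lift=2, effort=4),
])
for card in cards:
    print(card["priority"], card["action"], "-", card["prompt"])
```

The scores are judgment calls, so the value here is less the arithmetic than having the same rule applied to every gap. You still write the measurable target and the deadline for each card by hand.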
Where a tool helps: BlueJar’s fix plan emits the P1 to P5 cards with typed actions, SMART targets, and competitor source mapping, and an AI Fix Assistant explains each issue and can suggest rewrites. One thing to be clear about with your client, though: the assistant suggests and explains, it does not write and publish content to the site for you. BlueJar is not a CMS. The deliverable is a plan your team executes, which is exactly what keeps you in the value chain as the agency.
Component 4: the dollarized opportunity
The client question it answers: what is this AI invisibility actually costing us, in dollars the CFO will care about?
What is in it: a revenue model that turns the visibility gap into money. Pick the business model (local service, auto dealer, e-commerce, SaaS), feed in per-offering volume and average order or customer value, and show recoverable versus still-lost revenue alongside a simple do-nothing-versus-act comparison. This is the slide that gets the work funded internally and the engagement renewed, because it reframes “you are invisible in AI” as a number on a board deck.
How to produce it by hand: size the addressable demand for the gap queries (prompt or search volume times conversion times value), estimate the share you can realistically recover by closing the Component 3 fixes, and present it as a range with your assumptions stated. Be conservative. This number is the one your client will repeat to their CFO, so it has to survive scrutiny.
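The arithmetic is short enough to show in full. This sketch multiplies volume by conversion by value, then applies a low and a high recoverable share to produce a range. Every input below is a made-up placeholder, and the monthly time period is an assumption, so the output is a scenario, not a forecast.

```python
def size_opportunity(monthly_volume, conversion_rate, customer_value,
                     recoverable_low, recoverable_high):
    """Return revenue at risk plus recoverable and still-lost ranges.

    All inputs are the agency's stated assumptions, not measured data.
    """
    if not 0 <= recoverable_low <= recoverable_high <= 1:
        raise ValueError("recoverable shares must satisfy 0 <= low <= high <= 1")
    at_risk = monthly_volume * conversion_rate * customer_value
    return {
        "revenue_at_risk": at_risk,
        "recoverable": (at_risk * recoverable_low, at_risk * recoverable_high),
        "still_lost": (at_risk * (1 - recoverable_high), at_risk * (1 - recoverable_low)),
    }


model = size_opportunity(
    monthly_volume=2000,    # gap-query demand per month (assumed)
    conversion_rate=0.01,   # share of that demand that would buy (assumed)
    customer_value=1500,    # average customer value in dollars (assumed)
    recoverable_low=0.10,   # conservative share the fixes could win back
    recoverable_high=0.25,  # optimistic share
)
low, high = model["recoverable"]
print(f"Revenue at risk: ${model['revenue_at_risk']:,.0f} per month")
print(f"Recoverable range: ${low:,.0f} to ${high:,.0f} per month")
```

Presenting the range alongside the inputs lets the CFO challenge an assumption without throwing out the whole model, which is what helps the number survive scrutiny.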
Where a tool helps: BlueJar’s Lost Opportunity Calculator auto-selects the model, takes per-offering volume and value, and outputs recoverable versus still-lost revenue with an ROI view. Frame it honestly as a sizing and scenario model, not a guarantee, and never as tracked revenue. There is no clickstream from an AI answer: if a buyer reads your client’s brand in a ChatGPT response and never clicks, no analytics tool records it. This is a brand-recall play, so the dollar figure is opportunity at risk, not measured AI-driven revenue. Saying that out loud is a trust signal, not a weakness.
Component 5: the white-label report and re-audit cadence
The client question it answers: can I hand this to my stakeholders under my own brand, and when do we check again?
What is in it: the four components above, assembled into a stakeholder-grade report you ship under your own brand, not the tool’s. A clean document (HTML and a Word export work well) with an executive summary up top: the headline state, the biggest gap, and the first move. Then the matrix, the source-gap list, the fix plan, and the opportunity model. Finally, an agreed re-audit cadence written into the statement of work, for example a re-run in 90 days at a natural project boundary.
How to produce it by hand: assemble the components into a branded deck or doc, write the executive summary, and put a specific re-audit date in the contract.
Where a tool helps: BlueJar’s Client-Ready Proposal is a multi-tab report plus a Word (.docx) export, white-labelable at the Advanced tier, with a per-fix mechanisms view and an editable investment plan of roles, phases, and milestones. The one thing to be precise about: this deliverable is point-in-time. The re-audit cadence is something you and the client agree to and you re-run on schedule, not a 24/7 monitor watching citations live. Continuous monitoring is a different product. And honestly, the cadence framing is the better business anyway. A defined engagement re-run quarterly beats a flat monthly dashboard the client churns on in month four, because there is no month four to get bored in, just the next scheduled audit with fresh findings.
Put it in your statement of work
Here is the five-part scope, written so you can paste it into a proposal and adapt the specifics:
1. Baseline AI-visibility audit. A point-in-time map of the brand’s visibility across ChatGPT, Perplexity, Gemini, and Copilot, built from real buyer queries, reported per engine and per intent zone, with every result classified by citation state.
2. Kingmaker source-gap list. A ranked list of the third-party pages AI cites to recommend competitors, flagged for where the brand is missing, classified by placement type.
3. Prioritized fix plan. P1 to P5 fixes with typed actions, SMART targets, and 30, 60, and 90-day milestones, each tied to a specific gap and lever.
4. Dollarized opportunity. A conservative revenue-at-risk and recoverable-revenue model with stated assumptions.
5. White-label report and re-audit. A branded stakeholder report delivered under the agency’s name, plus an agreed re-audit date.
Two boundaries to write in explicitly, because they protect you: the deliverable is point-in-time (re-audits happen on the agreed cadence, not continuously), and the opportunity figure is a model of revenue at risk, not tracked AI-driven sales. Clients respect the precision, and it stops scope creep before it starts.
That is the whole offering. If you are still deciding whether to add this service line at all, our guide on transitioning from an SEO agency to a GEO agency covers the business case. And resist the urge to lead the client with a single GEO score: as we cover in what a GEO score really measures, the per-zone, per-engine breakdown is what tells them which lever to pull.
Want to produce all five components without building the spreadsheet yourself? Run a free BlueJar analysis on a client site and see the Visibility Matrix, Kingmaker Sources, fix plan, and a white-label proposal you can ship under your own brand.
Frequently asked questions
What should a GEO deliverable include?
A complete GEO deliverable has five parts: a baseline AI-visibility audit across engines, a ranked list of the third-party “kingmaker” sources AI cites to recommend competitors, a prioritized fix plan with SMART targets, a dollarized opportunity model, and a white-label report with an agreed re-audit cadence. The thread that ties them together is classifying every result by citation state so the client knows which fix to make.
Why do AI SEO tools show good scores but the client sees nothing in ChatGPT?
Because most tools report a single blended score calculated from auto-generated prompts averaged across engines. That hides the engine and the intent zone where the client is actually invisible. The fix is to use real buyer prompts, report per engine and per intent, and classify results so “cited but not named” is never blended together with “completely invisible.”
How is this different from selling a GEO audit?
Selling is the conversation that wins the deal, which we cover in our agency pitch guide. This is the deliverable you ship after the contract is signed: the actual scope, the components, and how to produce each one. The pitch gets the signature, the deliverable is what goes in the box.
Do I need a tool to deliver GEO, or can I do it manually?
You can produce all five components by hand for a single brand using Google Search Console for real prompts, manual testing across the four engines, and a spreadsheet to log citation states. A tool like BlueJar mainly saves time and adds scale (hundreds of prompts, multi-city distribution, a white-label export). The framework is the same either way.
How often should a GEO audit be re-run?
A quarterly cadence works for most clients, re-run at a natural project boundary so each audit surfaces fresh findings. The deliverable is point-in-time, so what keeps the engagement alive is re-auditing on an agreed schedule, not a live monitor running between audits. Write the cadence into the statement of work.
What should I not promise a client when delivering GEO?
Do not promise continuous real-time monitoring if your deliverable is a point-in-time audit, do not present the dollar opportunity as tracked AI-driven revenue (there is no clickstream from AI answers to attribute), and do not imply a tool will auto-write and publish the fixes. Naming these limits up front builds trust and prevents the scope creep that kills agency margins.