AEO Visibility Challenge — Week 1: 0% Visible. Here Is What Honest Day One Looks Like
Webappski just pivoted into Answer Engine Optimization. Before claiming we can make other brands visible to AI, we measured ourselves. The result: zero mentions across nine API calls and six manual platform checks. This is Week 1 of a public series tracking how an AEO agency builds AI visibility from absolute scratch.

Webappski is an AEO agency. In Week 1 of our public AI visibility challenge, we tested our brand across ChatGPT, Gemini, Claude, Perplexity, and Microsoft Copilot — three unbranded queries each, fifteen checks total. The result was zero mentions. Not one AI engine knows we exist yet. This is what honest Day One looks like for an agency that just pivoted into Answer Engine Optimization.
On April 7, 2026, our team ran the first measurement of webappski.com against the AI search ecosystem. We did it the same way we do it for clients — direct API calls to OpenAI, Google, and Anthropic, plus manual checks in Perplexity and Microsoft Copilot. We picked three queries that match the verticals we serve: a commercial agency search, an informational how-to, and a vertical-specific search.
We did not stop there. We also ran Webappski through two of the most popular third-party AEO measurement tools: HubSpot AEO Grader and Ahrefs Free AI Visibility. What we learned about those tools is, in some ways, more important than what we learned about ourselves.
The Three Queries
Our test queries were chosen to match real search intent across three Webappski verticals: commercial agency hire, how-to authority, and SaaS vertical specialization. None of them mention our brand name. That is intentional. A branded query proves nothing — anyone gets cited when their own name is the search term.
- Q1 (commercial intent): best answer engine optimization agencies 2026
- Q2 (informational intent): how to make my website visible in ChatGPT and Perplexity
- Q3 (vertical intent): AEO services for B2B SaaS companies
The Direct API Results
We ran each query through three AI engines using their official APIs: OpenAI (gpt-4o-search-preview), Google Gemini (gemini-2.0-flash with grounding), and Anthropic Claude (claude-sonnet-4-6 with web search). Nine API calls in total. The result was the same on every single one: webappski.com was not mentioned in the answer text, was not in the cited sources, and did not appear anywhere in any of the responses.
Instead, we saw the names of the agencies that have already captured these queries. First Page Sage appeared as the top recommendation in two engines (Claude and Gemini) for the commercial query. NoGood appeared in two engines and is notable for having built its own AEO platform called Goodie. iPullRank appeared in two engines as well. The remaining agencies named across our nine API responses included Omnius, House of Growth, Avenue Z, Minuttia, Searchtides, Amsive, LSEO, WebFX, Ignite Visibility, Victorious, and SmartSites.
For the SaaS vertical query, the agencies named were different — Omnius (London, SaaS-focused), XEO.works, Online Optimism, plus citations to five SaaS-AEO listicles on Discovered Labs, Team4 Agency, ABM Agency, Maximus Labs, and the Omnius blog. Webappski was absent from all of them. That absence is the most actionable finding of Week 1: there are five existing lists where being added would produce immediate measurable visibility.
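For readers who want to reproduce the API pass, below is a minimal sketch of the OpenAI leg in Python. The brand string and the three queries come straight from this article; the response parsing (the annotations field with url_citation entries) follows the Chat Completions search-preview response shape as we understand it, so treat those field names as assumptions to verify against the current API reference. The Gemini and Claude legs follow the same pattern with their respective SDKs.

```python
# Minimal sketch: one engine, three queries, record text mentions and citation hits.
# Assumes the Chat Completions search-preview response shape
# (message.annotations with url_citation entries) -- verify against current docs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

BRAND = "webappski"
QUERIES = [
    "best answer engine optimization agencies 2026",             # Q1 commercial
    "how to make my website visible in ChatGPT and Perplexity",  # Q2 informational
    "AEO services for B2B SaaS companies",                       # Q3 vertical
]

for query in QUERIES:
    resp = client.chat.completions.create(
        model="gpt-4o-search-preview",
        messages=[{"role": "user", "content": query}],
    )
    message = resp.choices[0].message
    answer_text = message.content or ""
    cited_urls = [
        a.url_citation.url
        for a in (message.annotations or [])
        if getattr(a, "type", "") == "url_citation"
    ]
    mentioned_in_text = BRAND in answer_text.lower()
    mentioned_in_sources = any(BRAND in url.lower() for url in cited_urls)
    print(f"{query!r}: text={mentioned_in_text} sources={mentioned_in_sources} "
          f"({len(cited_urls)} citations)")
```

We count a mention if the brand shows up in either the answer text or a cited URL; both signals matter, because an engine can cite a domain without ever naming it in prose.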
The Manual Platform Checks
Direct API access is not available for every AI assistant our prospective clients use. Perplexity reserves API access for Pro subscribers, and Microsoft Copilot has no public consumer API at all. So our team checked both manually — opening each platform in an incognito window, running the same three queries, and reading the answers carefully.
Perplexity returned zero mentions across all three queries. Microsoft Copilot returned zero mentions across all three queries. That brought the manual count to zero out of six. Combined with the nine API checks, the final tally for Week 1 was zero out of fifteen — a flat 0%.
Then We Tested the Trackers
Once we had the hard data from the AI engines themselves, we ran Webappski through two popular third-party AEO measurement tools to see whether they would agree with reality. They did not. And the way they disagreed turned out to be the most important learning of the week.
HubSpot AEO Grader

HubSpot returned an overall AEO score of 28/100 for OpenAI, 34/100 for Perplexity, and 44/100 for Gemini. The Gemini score even came with a green status: 'You are on the right track.' Brand sentiment was rated 19/40 (OpenAI), 18/40 (Perplexity), and 26/40 (Gemini), and Share of Voice was reported as 7/10 for Gemini.
These numbers do not match what we just observed directly. Five independent sources — three direct API tests and two manual platform checks — all returned zero mentions. So why does HubSpot show webappski as halfway up the AEO scale, with Gemini marked as 'on the right track'? Three factors likely explain the gap.
- The grader uses a narrow geographic and category filter. The result URL contained ?geography=Europe&productsServices=services&industry=AEO. Inside that narrow niche, anyone with technical AEO basics — llms.txt, Schema.org, services pages — earns a relatively high score. The grader is not measuring AI mentions. It is measuring whether the website is technically prepared to be measured.
- It is part of a broader product ecosystem. The AEO Grader is a free entry-point tool that lives alongside HubSpot's commercial offerings, and that context shapes its scoring incentives. A score that says 'you have room to improve' is a more engaging conversation starter than either 'you are at zero' or 'you are perfect' would be — and that mid-range bias likely explains why our numbers landed comfortably above zero rather than at it.
- Brand Recognition is the metric closest to the truth. Across all three engines, HubSpot scored Webappski's Brand Recognition at 1/20. That single metric matches what we observed directly: the AI engines barely know we exist. The other metrics weight differently and end up softening the overall picture.
Ahrefs Free AI Visibility

Ahrefs gave us the opposite answer: 'No AI mentions found for webappski.' No score. No breakdown. Just a flat zero, with a prompt to upgrade to Brand Radar for more detail.
On its face, Ahrefs matched our hard data — five sources said zero, and Ahrefs said zero. But our team has tested Ahrefs Free on other brands that are clearly mentioned in their own categories. The free tier produced false negatives there as well. So the honest conclusion is not that one tracker is right and the other is wrong. The honest conclusion is that both free third-party trackers should be treated as imprecise indicators, not as measurement standards: HubSpot tends to score brands above their actual visibility, Ahrefs Free tends to score them below it. Neither is safe to use as a single source of truth.
What We Learned This Week
The most important finding from Week 1 is not the score itself. It is the contradiction between the measurement tools. HubSpot said 28-44 out of 100. Ahrefs said zero. Reality, measured five ways, said zero. Two free third-party tools, two completely different stories, and neither matched what the AI engines actually return when our prospective clients ask them.
Webappski just pivoted into AEO. Our technical foundation is in place — llms.txt, Schema.org structured data, services pages built around answer-first content. Our content pipeline is written but not yet published. That means our baseline is not 'we have everything but it does not work.' Our baseline is literally Day One. And that is the entire point of running this challenge in public: showing exactly how an AEO agency builds authority from absolute zero, without shortcuts.
The most interesting competitor observation is First Page Sage. They dominated Q1 in both Claude and Gemini — not because their content is the best, but because they branded themselves as 'the first agency to offer AEO services' back in 2023. They captured the mental category in AI training data before anyone else even tried. That window has partially closed now, but it is not closed for every vertical. The lesson: in AI search, naming the category early matters more than producing the best content.
Our team launched this challenge because we have grown skeptical of measurement tools whose dashboards do not match what the AI engines actually return. If your agency claims to do AEO, you should have a public, verifiable baseline. Otherwise, your AEO expertise rests on the same foundation as a third-party dashboard — and we wanted something more rigorous than that, the same standard we hold our own client work to.
We Tested It Twice — And It Still Said Zero
After completing the first audit pass, we expanded the methodology to dual-model checks: each AI provider was queried with both its latest available model and a stable reference model. The point was to rule out model-specific noise — the possibility that a single model version simply happened to miss us while a newer one would have surfaced us.
Latest-tier models tested: OpenAI gpt-5.4 via the Responses API with the web_search tool, Anthropic claude-opus-4-6 with the web_search tool capped at five uses per query, and Gemini 2.5 Flash with Google Search grounding. Note: Gemini's latest Pro tier (gemini-3.1-pro-preview) was unavailable due to a project spending cap on our Google AI Studio account — we will lift that cap and switch to the Pro tier in Week 2.
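To illustrate the dual-model pass, here is a hedged sketch of the Claude leg. The model identifiers are the ones named above; the tool block uses Anthropic's published web search tool format as we understand it (the web_search_20250305 type string, the max_uses cap, and the citation fields are all assumptions to verify against current docs), and the parsing simply scans text blocks and their citations for the brand.

```python
# Sketch of the dual-model Claude pass: the same query against the latest and
# reference models, web search capped at five uses, brand scan on the output.
# Tool type string and citation fields follow Anthropic's web search docs as
# we understand them -- treat both as assumptions to verify.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

BRAND = "webappski"
MODELS = ["claude-opus-4-6", "claude-sonnet-4-6"]  # latest tier, stable reference
WEB_SEARCH_TOOL = {"type": "web_search_20250305", "name": "web_search", "max_uses": 5}

def check_mention(model: str, query: str) -> tuple[bool, int]:
    """Return (brand mentioned anywhere, number of cited URLs)."""
    resp = client.messages.create(
        model=model,
        max_tokens=2048,
        tools=[WEB_SEARCH_TOOL],
        messages=[{"role": "user", "content": query}],
    )
    text_parts, cited_urls = [], []
    for block in resp.content:
        if block.type == "text":
            text_parts.append(block.text)
            for cite in getattr(block, "citations", None) or []:
                url = getattr(cite, "url", "")
                if url:
                    cited_urls.append(url)
    haystack = (" ".join(text_parts) + " " + " ".join(cited_urls)).lower()
    return BRAND in haystack, len(cited_urls)

for model in MODELS:
    mentioned, n_cites = check_mention(model, "AEO services for B2B SaaS companies")
    print(f"{model}: mentioned={mentioned}, citations={n_cites}")
```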
The latest-tier models did far more work than the reference models. gpt-5.4 cited 14 sources across our three queries — nearly five times gpt-4o-search-preview's three. claude-opus-4-6 pulled 20 citations versus claude-sonnet-4-6's ten. The newer, more capable models searched harder, longer, and across more domains.
And they still found zero mentions of Webappski.
That is the strongest possible baseline signal. The zero is not a function of which model we picked, or which API surface we used, or which provider we queried. It is structural invisibility — the brand is genuinely absent from the corpus that AI engines surface for these queries. There is no clever model selection that would have discovered us. We have to build the authority ourselves.
What Happens Next
Week 2 starts the work. Our team will publish the AEO content pipeline that has been waiting in drafts. We will begin outreach to the authors of the SaaS-AEO listicles where Webappski is currently absent — Discovered Labs, Team4 Agency, ABM Agency, Maximus Labs, and Omnius. The goal for Q3 (the SaaS vertical) is concrete: land in one of those five lists within four weeks. That single placement should produce our first measurable mention.
For Q1 (the general agency search) and Q2 (the how-to authority query), the path is longer. Q1 requires placement in the higher-authority listicles that Claude and Gemini already cite — Scrunch, Minuttia, ModernMarketingPartners. Q2 requires guest posts on the domains that AI engines treat as authoritative for AEO how-to: cranseo.com, trueffle.com, RankMath, and Prerender. Both efforts take months, not weeks.
Our realistic prediction for Week 8 is 15-20% visibility. Anything less means the problem is deeper than backlinks and content, and that conversation will be even more interesting to have publicly than this one.
Methodology
Every week, our team runs three unbranded queries against five AI engines. The three direct API tests use web-search-enabled models: OpenAI gpt-4o-search-preview, Google Gemini 2.0 Flash with grounding, and Anthropic Claude Sonnet 4.6 with web search, with latest-tier models run alongside them as a cross-check, as described above. The two manual platform checks use Perplexity and Microsoft Copilot in incognito mode, with no logged-in account, to remove personalization bias.
We record whether Webappski is mentioned in the answer text, whether the brand domain appears in the cited sources, what position the mention takes if any, and the sentiment of the mention if any. We compute a visibility score as the percentage of the fifteen total checks that returned a mention. The Week 1 baseline is 0/15 = 0%. Every following week is compared against this exact baseline using the exact same queries and the exact same engines.
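To make the scoring rule concrete, here is a minimal sketch of the weekly tally. The record layout is illustrative rather than our internal schema; the formula is simply mentions divided by total checks.

```python
# Sketch of the weekly visibility score: each check records whether the brand
# appeared in the answer text or the cited sources; the score is the share of
# checks with any mention. Record layout is illustrative.
from dataclasses import dataclass

@dataclass
class Check:
    engine: str        # e.g. "chatgpt", "gemini", "claude", "perplexity", "copilot"
    query: str         # one of Q1/Q2/Q3
    in_answer: bool    # brand named in the answer text
    in_sources: bool   # brand domain among the cited sources

def visibility_score(checks: list[Check]) -> float:
    """Percentage of checks that returned any mention (text or sources)."""
    mentions = sum(1 for c in checks if c.in_answer or c.in_sources)
    return 100.0 * mentions / len(checks)

# Week 1: fifteen checks, zero mentions -> 0.0%
week1 = [Check(e, q, False, False)
         for e in ("chatgpt", "gemini", "claude", "perplexity", "copilot")
         for q in ("Q1", "Q2", "Q3")]
assert len(week1) == 15
print(f"Week 1 visibility: {visibility_score(week1):.0f}%")  # -> 0%
```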
Follow Webappski's AEO Visibility Challenge weekly for the full series. We will publish every result — the wins, the failures, and especially the contradictions between what the trackers say and what the AI engines actually return.