
How to Track Whether AI Is Citing You

A repeatable method to measure AI-search visibility: build a prompt set, query the engines, log citations, score share-of-voice, and turn the gaps into content.

By Matt Goren · Updated June 25, 2026 · 8 min read

Everyone wants to know if AI is recommending them, and almost nobody is measuring it. They check once — type their brand into ChatGPT, see a nice answer, feel good — and call it a strategy. That's not measurement, that's a vanity sample. Real tracking means asking the questions your customers ask, across the engines they use, on a regular cadence, and watching the trend move.

I'll be straight with you up front: this measurement is fuzzy. AI answers are non-deterministic, so you cannot get a clean, repeatable number the way you can with a keyword ranking. But fuzzy and useful beats precise and nonexistent. Here's a method that gives you a real, defensible read on whether you're gaining or losing ground.

1. Build a prompt set that reflects real demand

Your prompt set is the foundation, and it's where most people go wrong. They track "[brand name]" and feel great when the answer is glowing — of course it is, you asked the engine about you by name. That tells you almost nothing.

What you want are the unbranded, intent-driven questions a buyer asks before they know you exist. "Best tool for X." "How do I solve Y." "What's the difference between A and B." These are the moments where being the cited source actually wins you something.

Build 20 to 50 of them, grouped by intent:

  • Category questions — "best [your category] for [use case]"
  • Problem questions — "how do I [the problem you solve]"
  • Comparison questions — "[you] vs [alternative]" and "alternatives to [competitor]"
  • Branded questions — a handful of "what is [your brand]" to track how you're described, not just whether you appear

Write them the way a human types, not the way a marketer writes a headline. Then freeze the set. The whole method depends on asking the same questions every cycle so changes mean something. If you're not sure which questions matter, my guide on how to get cited by AI search walks through finding the queries worth owning.
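One way to make "freeze the set" concrete is to store the prompts grouped by intent and keep a fingerprint of the whole set alongside each cycle's results. This is a minimal sketch, not a required format: the prompts, the brand names (`ExampleBrand`, `OtherBrand`), and the `fingerprint` helper are all hypothetical placeholders.

```python
import hashlib
import json

# Hypothetical example prompts, grouped by the four intents above.
# Replace them with your own; a real set would hold 20 to 50.
PROMPT_SET = {
    "category": ["best invoicing tool for freelancers"],
    "problem": ["how do i chase late invoices without annoying clients"],
    "comparison": ["ExampleBrand vs OtherBrand", "alternatives to OtherBrand"],
    "branded": ["what is ExampleBrand"],
}


def total_prompts(prompt_set: dict[str, list[str]]) -> int:
    """Count prompts across all intent groups."""
    return sum(len(prompts) for prompts in prompt_set.values())


def fingerprint(prompt_set: dict[str, list[str]]) -> str:
    """Short hash that changes if any prompt is edited, added, removed, or reordered."""
    canonical = json.dumps(prompt_set, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


print(total_prompts(PROMPT_SET), fingerprint(PROMPT_SET))
```

If the fingerprint recorded with last month's results differs from this month's, the set was not actually frozen, and the two cycles are not comparable prompt for prompt.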

2. Query the engines and capture the answers

Run every prompt through every engine you care about. The main ones: ChatGPT, Claude, Gemini, Perplexity, and Google's AI Overviews. They retrieve and cite differently, so coverage in one doesn't mean coverage in another — that variance is itself a finding.

Two points of discipline matter here. Use a clean session, logged out or in a fresh/incognito context, so personalization and memory don't feed you a flattering answer based on your own history. And capture the full response plus any citations, not just your impression of it. Screenshot or paste the whole thing. You need the raw material to score consistently and to go back and check.

A realistic pass of 30 prompts across 4 of those engines is 120 queries. That's an afternoon by hand the first time. Tedious, yes, but doing it manually once teaches you exactly how noisy and inconsistent the answers are, which is knowledge no dashboard hands you.
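To see where the 120 comes from, and to get a checklist you can work through by hand, you can expand the prompt set into one row per prompt-and-engine pair. A small sketch, assuming placeholder prompt text and an arbitrary choice of four engines; `build_run_sheet` is a hypothetical helper name.

```python
from itertools import product


def build_run_sheet(prompts: list[str], engines: list[str]) -> list[dict]:
    """One row per prompt-engine pair: the list of queries to run this cycle."""
    return [{"prompt": p, "engine": e} for p, e in product(prompts, engines)]


# Placeholder prompts; in practice, load your frozen prompt set here.
prompts = [f"prompt {i}" for i in range(1, 31)]
engines = ["ChatGPT", "Claude", "Gemini", "Perplexity"]

run_sheet = build_run_sheet(prompts, engines)
print(len(run_sheet))  # 30 prompts x 4 engines = 120
```

Adding a fifth engine to the same 30 prompts takes the pass to 150 queries, which is worth knowing before you commit to a cadence.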

3. Log it the same way every time

Consistency in logging is what turns a pile of answers into a trend. For every prompt-and-engine pair, record the same fields. A spreadsheet is plenty:

prompt | engine | date | mentioned? | cited+linked? | your_url | competitors_named | notes

The distinction between mentioned and cited with a link is the one that matters most. A mention is the engine naming you in prose. A citation is the engine linking to your page as a source. Citations are stronger — they drive referral traffic and signal that your specific content was the basis for the answer. Track both, but weight citations higher.

In the competitors_named field, log who did show up when you didn't. That column is the most actionable thing in the whole sheet — it's your gap list, named for you by the engine itself.
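If you'd rather keep the log as a CSV file than in a spreadsheet app, the fields above map onto a small record type. This is a sketch under assumptions: the column names are adapted from the field list (for example `cited_linked` for "cited+linked?"), competitors are stored as a semicolon-separated string, and the sample row is invented for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass

FIELDS = ["prompt", "engine", "date", "mentioned", "cited_linked",
          "your_url", "competitors_named", "notes"]


@dataclass
class LogRow:
    prompt: str
    engine: str
    date: str                    # ISO date of the run, e.g. "2026-06-25"
    mentioned: bool              # the engine named you in prose
    cited_linked: bool           # the engine linked to your page as a source
    your_url: str = ""           # which of your URLs was cited, if any
    competitors_named: str = ""  # semicolon-separated brand names
    notes: str = ""


def write_log(rows: list[LogRow], out) -> None:
    """Write a header plus one CSV line per prompt-engine pair."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))


# Invented sample row: you were absent, two competitors were named.
sample = LogRow("best tool for X", "Perplexity", "2026-06-25",
                mentioned=False, cited_linked=False,
                competitors_named="OtherBrand;ThirdBrand")
buffer = io.StringIO()
write_log([sample], buffer)
print(buffer.getvalue())
```

Swapping `io.StringIO()` for an open file handle writes the same rows to disk. Fixing the columns in code is one way to make sure every cycle records the same fields.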

4. Score share-of-voice

Now turn the log into numbers you can watch over time. A few simple metrics:

  • Mention rate — share of prompt-engine pairs where you appeared at all. mentions / total queries.
  • Citation rate — share where you were cited with a link. The stronger signal.
  • Share-of-voice — your mentions as a fraction of all brand mentions across the set. This is the competitive read: if four brands get named about equally often across your prompts and you're one of them, you hold roughly a quarter of the conversation.
  • Coverage by engine — the same rates broken out per engine, so you can see you're strong in Perplexity but invisible in Gemini.

Share-of-voice = your_mentions / (your_mentions + all_competitor_mentions)

None of these is precise to the decimal. Treat them as a directional index. The number that matters isn't "31% this month" in isolation — it's "22% last month, 31% this month, on the same 30 prompts." The trend is the signal; the absolute value is noise dressed up as precision.
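Here is how those metrics could be computed from logged rows. Treat it as a sketch: the row shape (dicts with `mentioned`, `cited_linked`, `engine`, and a list in `competitors_named`) is my assumption about how you stored the log, and the four sample rows are invented.

```python
from collections import defaultdict


def rate(hits: int, total: int) -> float:
    """Safe division: an empty log scores 0.0 instead of raising."""
    return hits / total if total else 0.0


def score(rows: list[dict]) -> dict:
    total = len(rows)
    mentions = sum(1 for r in rows if r["mentioned"])
    citations = sum(1 for r in rows if r["cited_linked"])
    competitor_mentions = sum(len(r["competitors_named"]) for r in rows)

    per_engine = defaultdict(lambda: [0, 0])  # engine -> [mentions, queries]
    for r in rows:
        per_engine[r["engine"]][0] += int(r["mentioned"])
        per_engine[r["engine"]][1] += 1

    return {
        "mention_rate": rate(mentions, total),
        "citation_rate": rate(citations, total),
        # your_mentions / (your_mentions + all_competitor_mentions)
        "share_of_voice": rate(mentions, mentions + competitor_mentions),
        "mention_rate_by_engine": {e: rate(m, n) for e, (m, n) in per_engine.items()},
    }


# Invented sample log: four prompt-engine pairs.
rows = [
    {"engine": "ChatGPT", "mentioned": True, "cited_linked": True, "competitors_named": ["OtherBrand"]},
    {"engine": "ChatGPT", "mentioned": False, "cited_linked": False, "competitors_named": ["OtherBrand", "ThirdBrand"]},
    {"engine": "Perplexity", "mentioned": True, "cited_linked": False, "competitors_named": []},
    {"engine": "Gemini", "mentioned": False, "cited_linked": False, "competitors_named": ["ThirdBrand"]},
]
result = score(rows)
print(result)
```

On this sample you appear in 2 of 4 answers (mention rate 0.5), are cited in 1 of 4 (citation rate 0.25), and hold 2 of 6 total brand mentions (share-of-voice of about 0.33). With so few rows these numbers are only illustrative; the point is that the same function runs unchanged every cycle.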

5. Be honest about the fuzziness

I won't pretend this is clean. You should know exactly where the noise comes from, because knowing it keeps you from over-reacting:

  • Non-determinism. Run the same prompt twice and you can get different answers, different sources, different brands. This is the big one.
  • Personalization and memory. Account history and location shift results. Clean sessions reduce it, never fully kill it.
  • Model and index updates. A model update can move your visibility overnight through no action of your own.
  • Phrasing sensitivity. Reword a prompt slightly and the answer can change. That's why the frozen set matters.

The defenses are the method itself: a fixed prompt set, a regular cadence, and a focus on trend over snapshot. If you want to fight the non-determinism directly, run each prompt three times and record the majority outcome (an odd number of runs avoids ties). It's more effort for a more stable read. Don't ever report a single run as a hard fact. The AI search FAQ digs further into why these answers vary and what that means for expectations.
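The majority-outcome idea fits in a few lines. In this sketch, each run of a prompt is recorded as `True` (you appeared) or `False`. Treating a tie as "not mentioned" is my own conservative assumption, not a rule from the method.

```python
def majority_outcome(runs: list[bool]) -> bool:
    """True only if you appeared in strictly more than half of the runs.

    Assumption: a tie (possible with an even number of runs) counts as
    not mentioned, so the logged result errs on the cautious side.
    """
    if not runs:
        raise ValueError("need at least one run")
    return sum(runs) * 2 > len(runs)


print(majority_outcome([True, False, True]))   # True: 2 of 3 runs
print(majority_outcome([True, False]))         # False: a 1-1 tie
```

Logging the majority outcome in the "mentioned?" column, and noting the run count in the notes field, keeps the sheet's shape the same while smoothing out one-off flukes.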

6. Turn gaps into content

Measurement is pointless if it doesn't change what you do next. The payoff of the whole exercise is the competitors_named column — every prompt where you're absent and someone else is cited is a brief writing itself.

Work the gaps in order:

  1. Find the prompts where you're invisible but competitors are cited. These are your highest-leverage targets — there's demonstrated demand and a citation slot you're losing.
  2. Read the cited pages. See what shape of content the engine is rewarding for that question — a direct answer, a comparison table, a data point. You're reverse-engineering what "good" looks like for that prompt.
  3. Build the better answer. Write the page that answers that exact question more directly, more completely, more credibly — structured to be lifted, never with invented facts.
  4. Re-measure next cycle. Did the new content move the citation rate on those specific prompts? That closes the loop and tells you whether the work paid off.
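Step 1 of that list can be pulled straight out of the log. The sketch below assumes rows shaped like the log fields, with `competitors_named` as a list, and ranks prompts by how many competitor mentions appeared in answers where you were absent. That ranking rule is a simple proxy I chose for illustration; it says nothing about how winnable a prompt is. The sample rows are invented.

```python
from collections import Counter


def gap_list(rows: list[dict]) -> list[tuple[str, Counter]]:
    """Prompts where you were absent but competitors were named, biggest gaps first."""
    gaps: dict[str, Counter] = {}
    for row in rows:
        if not row["mentioned"] and row["competitors_named"]:
            gaps.setdefault(row["prompt"], Counter()).update(row["competitors_named"])
    # Rank by total competitor mentions across engines for that prompt.
    return sorted(gaps.items(), key=lambda item: sum(item[1].values()), reverse=True)


# Invented sample log rows.
rows = [
    {"prompt": "best tool for X", "engine": "ChatGPT", "mentioned": False,
     "competitors_named": ["OtherBrand", "ThirdBrand"]},
    {"prompt": "best tool for X", "engine": "Gemini", "mentioned": False,
     "competitors_named": ["OtherBrand"]},
    {"prompt": "how do I solve Y", "engine": "ChatGPT", "mentioned": True,
     "competitors_named": ["OtherBrand"]},
    {"prompt": "alternatives to OtherBrand", "engine": "Perplexity", "mentioned": False,
     "competitors_named": ["ThirdBrand"]},
]

for prompt, competitors in gap_list(rows):
    print(prompt, dict(competitors))
```

Here "best tool for X" comes out on top because you were missing from it on two engines while competitors were named three times. The output is a starting order for reading the cited pages, not a verdict on which gaps to chase.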

That loop — measure, find gaps, build, re-measure — is the entire game. The score isn't the point. The score tells you where to point the content engine, and the re-measure tells you if it worked.

One nuance worth holding onto: not every gap is worth closing, and not every win is yours to keep. Some prompts you'll never own because a wildly more authoritative source dominates them, and burning a month chasing that one question is a poor trade against three winnable ones sitting next to it in the sheet. Read the gap list with a sense of leverage — go where there's real demand and a realistic path to the citation slot. And when you do win one, expect it to wobble; non-determinism means a citation you earned this month may flicker out next month and back the month after. Don't panic-rewrite a page over a single bad run. Trust the trend across cycles, and only act when a drop holds for two reads in a row.
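The "two reads in a row" rule can be written down so you apply it the same way every time. This sketch reflects one possible reading of it: the last two cycles are both below the cycle that preceded them. The `tolerance` parameter is a hypothetical addition for ignoring tiny dips.

```python
def sustained_drop(rates: list[float], tolerance: float = 0.0) -> bool:
    """True if the two most recent reads are both below the read before them.

    `rates` is one citation (or mention) rate per cycle, oldest first.
    With fewer than three cycles there is nothing to confirm, so the
    answer is False.
    """
    if len(rates) < 3:
        return False
    baseline, first_read, second_read = rates[-3], rates[-2], rates[-1]
    threshold = baseline - tolerance
    return first_read < threshold and second_read < threshold


print(sustained_drop([0.31, 0.22, 0.24]))  # True: both reads stayed below 0.31
print(sustained_drop([0.31, 0.22, 0.33]))  # False: the dip recovered
```

A single bad read never triggers this check, which is the point: it takes a drop that persists across two cycles before a rewrite is worth considering.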

7. When to reach for a tool

You can run this whole method in a spreadsheet, and I'd argue you should start there. Doing it by hand teaches you what the numbers mean and how noisy they are — context a dashboard hides behind a clean chart. Once the manual pass becomes too slow to sustain at your prompt-set size, that's the signal to move to a tool, and you'll evaluate them well because you'll know exactly what they should be measuring and where they're papering over the fuzziness.

Start manual, get a baseline this month, run it again next month. Two data points on the same prompt set already tell you more than any one-off "let me just check ChatGPT" ever will.

FAQ

Can I really measure AI citations reliably? As an exact number, no. As a repeatable process, yes. AI answers are non-deterministic, so the same prompt can yield different responses from one run to the next. You measure by sampling a fixed prompt set on a regular cadence and tracking the trend, not by chasing a single exact number.

How many prompts do I need to track? Start with 20 to 50 questions that map to how real buyers describe your category. Enough to cover your core topics and catch variation, few enough that you'll actually run them every cycle. Consistency of the set matters more than its size.

How often should I run the check? Monthly is the practical default for most. Run it more often around a big content push or a model update if you want to catch a shift, but a fixed monthly cadence on the same prompt set gives you a clean trend line without burning hours.

What exactly do I log for each answer? For each prompt and engine: were you mentioned, were you cited with a link, which URL, and which competitors showed up. That's enough to compute share-of-voice and spot the questions where you're invisible.

Should I use a paid tracking tool or do it manually? Start manual so you understand what you're measuring and why it's fuzzy. A spreadsheet and an afternoon get you a real baseline. Move to a tool once the manual process is too slow to sustain, not before.
