Does Your Site Appear in ChatGPT? Here's How to Check
To check whether ChatGPT can cite your site, test three separate layers: does the model know your brand, can its crawlers access your content, and does live search retrieval actually surface you for the questions you should win. Most teams only test the first layer, get a flattering or confusing answer, and stop. Here's the full protocol: fifteen minutes, no tools required, with an automated version at the end.
Layer 1 — Does the model know you exist?
Ask each engine, in a fresh session with web search off where possible:
What is {yourbrand}.com?
What does {YourBrand} do?
Who are the main providers of {your category}?
Score each answer against four outcomes:
- Known + accurate — the model describes you correctly. Your entity layer is healthy.
- Known + wrong — it confuses you with another company or describes a years-old version of you. Entity ambiguity; see the fix below.
- Unknown — "I don't have information about…". Common and fixable for sites younger than the model's training data; live retrieval (layer 3) matters more for you.
- Hallucinated — confident nonsense. Treat as "known + wrong," with urgency.
Layer 2 — Can the crawlers physically reach you?
Model knowledge and live retrieval both depend on crawler access, and this is where silent failure lives. Check your robots.txt for the tokens that matter: GPTBot and OAI-SearchBot for ChatGPT (training and live retrieval are governed separately), ClaudeBot and Claude-SearchBot for Claude, PerplexityBot, and Google-Extended for Gemini surfaces. Three configurations block silently:
- A wildcard "User-agent: *" / "Disallow: /" rule written years before AI crawlers existed; it blocks all of them by fallback.
- An allowlist robots.txt that names Googlebot and disallows everyone else.
- A WAF or bot-manager (Cloudflare Bot Fight Mode is the classic) challenging AI user-agents before robots.txt is even read — your robots.txt can say "allowed" while the firewall says 403.
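As a rough illustration of the robots.txt part of this check, the sketch below uses Python's standard-library urllib.robotparser to test the tokens named above against a robots.txt body. The function name check_ai_access and the sample file are hypothetical, and the standard-library parser's rule precedence may differ from what a given crawler implements, so treat the result as a first pass rather than a verdict.

```python
from urllib.robotparser import RobotFileParser

# Tokens taken from the paragraph above; extend the list as needed.
AI_AGENTS = [
    "GPTBot",
    "OAI-SearchBot",
    "ClaudeBot",
    "Claude-SearchBot",
    "PerplexityBot",
    "Google-Extended",
]


def check_ai_access(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Map each AI agent token to whether robots.txt lets it fetch `path`.

    Hypothetical helper: it only evaluates the robots.txt text it is given
    and knows nothing about firewalls or bot managers in front of the site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in AI_AGENTS}


# Sample input (an assumption for illustration): the legacy wildcard block.
LEGACY_WILDCARD = """\
User-agent: *
Disallow: /
"""

for agent, allowed in check_ai_access(LEGACY_WILDCARD).items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With the wildcard sample, every agent is reported as blocked, which is the fallback behavior described in the first bullet. To run it against a live site, fetch the robots.txt body yourself and pass the text in.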
The free crawler checker parses your live robots.txt against 14 AI agents in seconds; the full audit additionally probes your server with real AI user-agent strings to catch the WAF case.
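The WAF case can be approximated by hand: request a page while sending an AI crawler's User-Agent and compare the HTTP status with what robots.txt says. The following is a minimal sketch under stated assumptions, not how any particular audit tool works. The names probe_status and diagnose are hypothetical, the set of "challenge" status codes beyond 403 is a guess, and real crawlers send longer User-Agent strings than a bare token, so a firewall may treat this probe differently from the real bot.

```python
import urllib.error
import urllib.request

# 403 is the status named in the text; 429 and 503 are assumptions about
# how some bot managers respond to challenged clients.
CHALLENGE_STATUSES = {403, 429, 503}


def probe_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status the server gives to this User-Agent.

    Network errors other than HTTP error responses (DNS failure, timeout)
    are not handled here and will raise.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def diagnose(robots_allows: bool, http_status: int) -> str:
    """Combine the robots.txt verdict with the observed HTTP status."""
    if not robots_allows:
        return "blocked by robots.txt"
    if http_status in CHALLENGE_STATUSES:
        return "allowed by robots.txt but challenged at the edge"
    if 200 <= http_status < 300:
        return "reachable"
    return "inconclusive"


# Example usage (performs a real request, so it is left commented out):
# status = probe_status("https://example.com/", "GPTBot")
# print(diagnose(robots_allows=True, http_status=status))
```

The point of keeping diagnose separate from the network call is that the silent failure is a disagreement between two signals: robots.txt says yes while the edge says no.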
Layer 3 — Does live retrieval surface you?
Now turn web search on (ChatGPT with search, Perplexity, Google AI Overviews) and ask the questions your customers ask — not your brand name:
best {category} for {use case}
how do I {problem your product solves}
{competitor} alternatives
Record two things per engine: are you cited as a source (linked), and are you mentioned in the answer text? Citation without mention means you're a supporting source, which is fine. Mention without citation means the engine knows you but trusts other pages to describe you: an extractability gap on your own pages. Absent from both while competitors appear: that's your GEO to-do list, and it almost always traces back to layers 1-2 plus passage structure.
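The cited-versus-mentioned bookkeeping is easy to get wrong when done by eye, so here is one way it could be recorded in code. This is a sketch with hypothetical names (classify_answer, the sample brand "Acme", and the domain acme.example); it assumes you have already copied the answer text and the list of source URLs out of the engine by hand. The label for the cited-and-mentioned case is mine, since the text above only names the other three outcomes.

```python
from urllib.parse import urlparse


def _is_own_domain(url: str, domain: str) -> bool:
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = urlparse(url).hostname or ""
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def classify_answer(brand: str, domain: str, answer_text: str,
                    cited_urls: list[str]) -> str:
    """Classify one engine answer using the cited/mentioned distinction."""
    cited = any(_is_own_domain(url, domain) for url in cited_urls)
    # Crude substring match: a brand that is also a common word, or a prefix
    # of another name, will produce false positives.
    mentioned = brand.lower() in answer_text.lower()
    if cited and mentioned:
        return "cited and mentioned"
    if cited:
        return "supporting source"
    if mentioned:
        return "extractability gap"
    return "absent"


# Hypothetical sample: the brand is named, but only a third party is linked.
print(classify_answer(
    brand="Acme",
    domain="acme.example",
    answer_text="Popular options include Acme and two others.",
    cited_urls=["https://reviews.example/best-tools"],
))
```

Run per engine and per question, this gives one label per cell, which makes it easier to compare results before and after a fix.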
What each failure pattern means
| Pattern | Root cause | Fix |
|---|---|---|
| Unknown to models, absent from retrieval | Crawler blocks (P0) | robots.txt AI rules + WAF allowlist — a four-line fix |
| Known but described incorrectly | Entity ambiguity | Align homepage first-passage, Organization schema, and llms.txt description; build structured directory references |
| Cited for brand queries, never for category queries | Passage extractability | Definition-first rewrites of the pages answering category questions; FAQPage schema |
| Competitors cited from your content topics | Authority + structure | Standalone citable passages + internal links concentrating topic authority |
Automate the protocol
Manual testing is fine quarterly; it doesn't scale weekly and it misses the configuration layer entirely. The free CiteFuel audit runs all three layers in one pass — robots + WAF probes with real AI user-agents, llms.txt and schema validation, LLM-scored passage extractability, and live sampling of AI answer surfaces — and returns a 0-100 score with the severity-ranked gap list. The methodology, including every check and weight, is public at /methodology.
Whatever tool you use: test all three layers, write down the date and the answers, and re-test after each fix. AI visibility is an engineering loop, not a vibe.