AI Crawler Access Checker

This checker evaluates how your website's robots.txt treats documented crawler and product tokens. The list includes retrieval crawlers, training crawlers, user-triggered fetchers, and control tokens, and they do not all do the same job: AI operators distinguish training access (reading content to train models) from retrieval access (reading content to answer live queries). Blocking a training crawler may still leave retrieval allowed, or vice versa. Most robots.txt configurations were written before these tokens existed, so their existing rules may apply a policy the publisher never intended for them.
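A minimal sketch of that last point, using Python's standard-library `urllib.robotparser`. The robots.txt content, the example.com URL, and the `verdicts` helper are hypothetical; the file mimics a pre-AI pattern that admits Googlebot and shuts out everything else. Keep in mind that `robotparser` applies rules in file order and matches user agents by substring, which can differ from RFC 9309 in edge cases, and that a `False` verdict only records what the file says, not what any crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical legacy file: Googlebot gets a named group, every other crawler is blocked.
LEGACY_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def verdicts(robots_txt: str, tokens: list[str], url: str) -> dict[str, bool]:
    """Return can_fetch() for each token; True means the file allows the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


# Training (GPTBot), retrieval (OAI-SearchBot), and user-triggered (ChatGPT-User)
# tokens have no named group here, so the wildcard group decides all three the same way.
print(verdicts(LEGACY_ROBOTS_TXT,
               ["Googlebot", "GPTBot", "OAI-SearchBot", "ChatGPT-User"],
               "https://example.com/blog/post"))
# → {'Googlebot': True, 'GPTBot': False, 'OAI-SearchBot': False, 'ChatGPT-User': False}
```

Because none of the AI tokens is named, the file treats training, retrieval, and user-triggered access identically, which is rarely what a deliberate data-use policy intends.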

This tool fetches your live robots.txt, parses it against 14 tracked tokens, and returns a color-coded policy matrix: green (allowed by the file), red (disallowed by the file), yellow (no explicit named rule). A robots.txt rule states publisher policy; this check does not prove network-layer enforcement. The tool generates no Allow rules by default. If you explicitly request a suggestion, retrieval/user-request tokens remain separate from training and product-control tokens so a deliberate data-use policy is not silently reversed. Google-Extended and Applebot-Extended are non-requesting control tokens, not live crawlers.
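The green/red/yellow classification can be sketched in a few dozen lines. This is a minimal sketch under stated assumptions, not CiteFuel's code: `TOKENS` is a hypothetical subset, only the root path `/` is evaluated, `*` and `$` path wildcards are not handled, and a token counts as named only when a `User-agent` line matches it exactly, ignoring case. Rule precedence follows RFC 9309: all groups naming the token are combined, the longest matching path wins, and Allow wins a tie.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical five-token subset; the checker's full list of 14 is not reproduced here.
TOKENS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "Google-Extended", "Applebot-Extended"]


@dataclass
class Group:
    agents: list[str] = field(default_factory=list)              # lower-cased User-agent values
    rules: list[tuple[bool, str]] = field(default_factory=list)  # (is_allow, path)


def parse_groups(robots_txt: str) -> list[Group]:
    """Split robots.txt into groups; consecutive User-agent lines share one group."""
    groups: list[Group] = []
    current: Optional[Group] = None
    previous_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if current is None or not previous_was_agent:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
            previous_was_agent = True
        else:
            if key in ("allow", "disallow") and current is not None:
                current.rules.append((key == "allow", value))
            previous_was_agent = False  # any other record ends a run of User-agent lines
    return groups


def allowed(rules: list[tuple[bool, str]], path: str = "/") -> bool:
    """Longest matching path wins, Allow wins a tie, and no match means allowed."""
    best_len, verdict = -1, True
    for is_allow, rule_path in rules:
        if rule_path and path.startswith(rule_path):  # an empty Disallow restricts nothing
            if len(rule_path) > best_len or (len(rule_path) == best_len and is_allow):
                best_len, verdict = len(rule_path), is_allow
    return verdict


def policy_matrix(robots_txt: str, tokens: list[str] = TOKENS) -> dict[str, str]:
    """Map each token to green (allowed), red (disallowed), or yellow (not named)."""
    groups = parse_groups(robots_txt)
    matrix = {}
    for token in tokens:
        named = [g for g in groups if token.lower() in g.agents]
        if not named:
            matrix[token] = "yellow"  # no named group; under RFC 9309 a '*' group, if any, applies
            continue
        rules = [rule for g in named for rule in g.rules]  # combine every matching group
        matrix[token] = "green" if allowed(rules) else "red"
    return matrix


SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /private/
"""
print(policy_matrix(SAMPLE))
# → {'GPTBot': 'red', 'OAI-SearchBot': 'green', 'ClaudeBot': 'yellow', 'Google-Extended': 'yellow', 'Applebot-Extended': 'yellow'}
```

Keeping "not named" as its own yellow state, rather than folding in the wildcard verdict, makes it visible which tokens the publisher has never addressed directly.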

No signup required. Paste your URL. Results in under 10 seconds.

How to use

Paste your site's root URL (e.g., https://yourdomain.com) into the field above.
Click "Check AI Crawler Access."
Review the policy matrix. A red row means the published file disallows the named crawler or token; it does not prove that the rule is enforced or that every route to visibility is closed.
If you selected an output intent, review the generated suggestion against your licensing, privacy, training, and retrieval policy before changing anything.
Optionally: run a full CiteFuel audit to check llms.txt, schema, and passage citability alongside crawler access.
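The separation between token classes in a generated suggestion can be sketched as follows. This is a hypothetical illustration, not CiteFuel's implementation: the `TOKEN_CLASSES` mapping, the class names, and `suggest_allow_groups` are assumptions made for the example, and each token's purpose should be confirmed against its operator's documentation.

```python
# Hypothetical token-to-class mapping for illustration only.
TOKEN_CLASSES = {
    "OAI-SearchBot": "retrieval",
    "ChatGPT-User": "user-request",
    "GPTBot": "training",
    "Google-Extended": "control",
    "Applebot-Extended": "control",
}


def suggest_allow_groups(intent: set[str]) -> str:
    """Emit Allow groups only for tokens in the classes the user explicitly chose.

    An empty intent yields no Allow rules, and choosing "retrieval" never
    emits a group for a training or control token.
    """
    blocks = [
        f"User-agent: {token}\nAllow: /"
        for token, token_class in TOKEN_CLASSES.items()
        if token_class in intent
    ]
    return "\n\n".join(blocks) + "\n" if blocks else ""


print(suggest_allow_groups({"retrieval", "user-request"}))
```

Because RFC 9309 tells crawlers to combine all groups that match the same token, a suggested group can interact with an existing group for that token; merge by hand rather than appending blindly.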

Frequently asked questions

What is GPTBot?

GPTBot is OpenAI's crawler for potential model-training use. OAI-SearchBot is the relevant crawler for discoverability, summaries, and links in ChatGPT Search. Blocking GPTBot does not by itself block OAI-SearchBot or prove that ChatGPT cannot cite a page.
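A quick way to confirm this against a file is Python's `urllib.robotparser`. The two robots.txt variants, the URL, and the `can_fetch` wrapper below are hypothetical, and the results describe only what each file states.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file that names only the training crawler.
GPTBOT_ONLY = """\
User-agent: GPTBot
Disallow: /
"""

# Hypothetical file that also names the search crawler.
BOTH_NAMED = GPTBOT_ONLY + """
User-agent: OAI-SearchBot
Disallow: /
"""


def can_fetch(robots_txt: str, token: str, url: str) -> bool:
    """Return True if the parsed file allows `token` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


url = "https://example.com/pricing"
print(can_fetch(GPTBOT_ONLY, "GPTBot", url))         # → False
print(can_fetch(GPTBOT_ONLY, "OAI-SearchBot", url))  # → True: no group names it and there is no '*' group
print(can_fetch(BOTH_NAMED, "OAI-SearchBot", url))   # → False
```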

Should I block all AI crawlers?

Crawler policy is a publisher choice based on retrieval, licensing, privacy, and training preferences. Disallowing a documented retrieval crawler asks that crawler not to fetch the affected URLs, and compliant crawlers honor the request, but it does not guarantee that your brand or content never appears through another index or source.

Does CiteFuel store my robots.txt?

We fetch it transiently to run the check. We do not store it or associate it with your domain unless you create an account.

What if I have a wildcard Disallow rule?

A wildcard group (User-agent: * with Disallow: /) applies to every crawler that has no more specific matching group. CiteFuel reports its effect across 14 documented crawler and product tokens and generates a reviewable policy suggestion only after you explicitly choose which class of tokens you intend to allow.
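The precedence rule can be checked with `urllib.robotparser` as well. In this hypothetical file, the wildcard group blocks everything while a named group allows one retrieval crawler; the file contents and URL are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: blanket Disallow for everyone, plus a named group for one crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/docs/"
results = {token: parser.can_fetch(token, url) for token in ("OAI-SearchBot", "GPTBot")}
print(results)  # → {'OAI-SearchBot': True, 'GPTBot': False}
```

The `*` group is consulted only when no named group matches, regardless of where it sits in the file, so GPTBot, which has no group of its own, inherits the blanket Disallow.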

Measure your site’s AI-readiness gaps, then review the evidence.

Free 26-check audit. No card. No login. Just a URL — results in ~90 seconds.

Audit my site free →