// Reference
AI Crawler Reference
The 14 crawlers that decide whether ChatGPT, Perplexity, Claude, Google AI Overviews, and Microsoft Copilot cite your site. User-agents, recommendations, and the one critical distinction most sites get wrong: training bots are not citation bots.
Most-misconfigured bot: GPTBot. Blocking it does not remove you from ChatGPT citations — those come from OAI-SearchBot, a separate user-agent.
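The GPTBot / OAI-SearchBot split can be checked locally. The following is a minimal sketch, assuming a hypothetical robots.txt and the placeholder domain example.com. It uses Python's standard `urllib.robotparser`, whose matching rules may differ in detail from each vendor's own parser, and the helper name `is_allowed` is illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of OpenAI training (GPTBot)
# while staying eligible for ChatGPT search citations (OAI-SearchBot).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str,
               url: str = "https://example.com/") -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("GPTBot", "OAI-SearchBot"):
        status = "allowed" if is_allowed(ROBOTS_TXT, bot) else "blocked"
        print(f"{bot}: {status}")
```

Because the two user-agents have separate groups, the rule for one has no effect on the other.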
// Search / Citation Bots
Allow these. They decide whether your site can appear in AI answer citations.
- OAI-SearchBot (OpenAI): OpenAI's crawler for ChatGPT search citations. Distinct from GPTBot (training). Allow it so your pages can appear in ChatGPT answers. Recommendation: Allow
- PerplexityBot (Perplexity): Perplexity's primary search crawler. Traffic is bursty and favors listicle-style pages; edge caching is needed to survive viral spikes. Recommendation: Allow
- Claude-SearchBot (Anthropic): Anthropic's live-search crawler. Powers Claude's web search citations specifically. Recommendation: Allow
- Googlebot (Google): Google's primary crawler. Powers both classical search and AI Overviews, the largest single source of AI citations. Recommendation: Allow
- bingbot (Microsoft): Microsoft's crawler. Powers Bing Search and feeds Copilot citations across Windows, Edge, and Microsoft 365. Recommendation: Allow
- FacebookBot (Meta): Meta's citation crawler. Powers Meta AI citations across WhatsApp, Instagram, and Facebook chat. Recommendation: Allow
// User-Initiated Fetch Bots
Allow these. They fire when a user pastes your URL into an AI chat — blocking creates visible UX failures.
// Hybrid (Search + General Fetch)
Allow. These serve both live citations and tool-use fetches.
// Training-Only Bots
Block these only if you opt out of model training. Blocking does NOT affect citations from the same vendor.
- GPTBot (OpenAI): OpenAI's training-data crawler. Blocking it opts out of GPT model training but does NOT block ChatGPT citations. Recommendation: Opt-out only
- anthropic-ai (Anthropic): Anthropic's training opt-out user-agent. Like Google-Extended, blocking it opts out of training without affecting citations. Recommendation: Opt-out only
- Google-Extended (Google): Google's Gemini training opt-out. Does NOT affect Googlebot or AI Overviews; it only opts out of training. Recommendation: Opt-out only
- Meta-ExternalAgent (Meta): Meta's Llama training crawler. Aggressive and historically less robots.txt-compliant than other major vendors. Recommendation: Opt-out only
- CCBot (Common Crawl): Common Crawl's open-dataset crawler. Indirectly feeds training for nearly every major LLM. Recommendation: Optional block
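One way to turn these recommendations into a robots.txt is to emit one group per user-agent. The sketch below is illustrative, not a canonical configuration: the helper `build_robots_txt` and the two list names are hypothetical, the bot names are copied from the lists on this page, and it assumes you want to opt out of every training bot while allowing every search/citation bot.

```python
# Names copied from the lists on this page; adjust to your own policy.
CITATION_BOTS = [
    "OAI-SearchBot", "PerplexityBot", "Claude-SearchBot",
    "Googlebot", "bingbot", "FacebookBot",
]
TRAINING_BOTS = [
    "GPTBot", "anthropic-ai", "Google-Extended",
    "Meta-ExternalAgent", "CCBot",
]


def build_robots_txt(allow: list[str], block: list[str]) -> str:
    """Emit one robots.txt group per bot: Allow for `allow`, Disallow for `block`."""
    groups = []
    for bot in allow:
        groups.append(f"User-agent: {bot}\nAllow: /\n")
    for bot in block:
        groups.append(f"User-agent: {bot}\nDisallow: /\n")
    # A blank line between groups keeps each user-agent's rules separate.
    return "\n".join(groups)


if __name__ == "__main__":
    print(build_robots_txt(CITATION_BOTS, TRAINING_BOTS))
```

If you do not want to opt out of training, pass an empty list as `block`; the citation groups are unaffected either way.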
Check your robots.txt
Run /check on any URL and we'll flag bots you're accidentally blocking — the #1 reason sites are silently missing from AI citations.
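For a rough idea of what such a check involves, here is a minimal sketch. It is not the /check tool's implementation: the function `blocked_citation_bots`, the sample robots.txt, and the example.com URL are all assumptions for illustration, and Python's `urllib.robotparser` may not match every vendor's parsing rules exactly.

```python
from urllib.robotparser import RobotFileParser

# Search/citation user-agents listed on this page.
CITATION_BOTS = (
    "OAI-SearchBot", "PerplexityBot", "Claude-SearchBot",
    "Googlebot", "bingbot", "FacebookBot",
)

# Hypothetical robots.txt: a blanket block with exceptions for two
# classical search crawlers, a common way to block AI citation bots by accident.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: bingbot
Allow: /
"""


def blocked_citation_bots(robots_txt: str,
                          url: str = "https://example.com/") -> list[str]:
    """Return the citation bots that `robots_txt` forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in CITATION_BOTS if not parser.can_fetch(bot, url)]


if __name__ == "__main__":
    for bot in blocked_citation_bots(SAMPLE_ROBOTS_TXT):
        print(f"blocked: {bot}")
```

In this sample the wildcard group catches every bot without its own group, so the four AI citation crawlers are blocked even though no rule names them.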