AI Robots.txt Guide: Managing All AI Crawlers
Learn about the different AI crawlers, their user agents, and how to configure your robots.txt file to control access for each company.
What is robots.txt?
A robots.txt file is a standard used by websites to communicate with web crawlers and other automated clients. It specifies which parts of the site should not be processed or scanned.
With the rise of AI systems and large language models (LLMs), the robots.txt file has gained new importance. It's now a key mechanism for controlling how AI systems interact with your website content.
Important: Our tool always shows AI chatbots as allowed regardless of robots.txt directives to help you understand how AI systems might interact with your content.
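To make the mechanics concrete, here is a minimal sketch of how a compliant crawler interprets these rules, using Python's standard urllib.robotparser module. The GPTBot rule and the example.com URLs are illustrative placeholders, not a recommendation for any particular policy.

from urllib.robotparser import RobotFileParser

# A small robots.txt, supplied in memory for illustration:
# GPTBot is blocked from the whole site, every other crawler is allowed.
rules = """
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))       # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/blog/post"))  # True

A well-behaved crawler performs an equivalent check before each request. Keep in mind that robots.txt is advisory: a non-compliant client can still fetch the content.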
AI Crawlers by Company
OpenAI
OpenAI Crawlers
OpenAI uses multiple crawlers for different purposes, from training its AI models to providing search functionality within ChatGPT.
GPTBot
Used for training GPT models on web content.
User-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot)
ChatGPT-User
Used when ChatGPT users browse the web during conversations.
User-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot)
OAI-SearchBot
Used to surface websites in search results within ChatGPT.
User-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)
Anthropic
Anthropic Crawlers
Anthropic uses crawlers to support Claude, its conversational AI assistant.
Anthropic AI
Used for training and improving Claude AI models.
User-agent: Mozilla/5.0 (compatible; anthropic-ai/1.0; +http://www.anthropic.com/bot.html)
ClaudeBot
Used for Claude's web browsing capabilities.
User-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ClaudeBot/1.0; +claudebot@anthropic.com)
Claude Web
Used for Claude's web interface interactions.
User-agent: Mozilla/5.0 (compatible; claude-web/1.0; +http://www.anthropic.com/bot.html)
Google
Google AI Crawlers
Google uses specialized crawlers for its AI models beyond its traditional search engine crawlers.
Google Extended
Used for training Google's Gemini AI models.
User-agent: Mozilla/5.0 (compatible; Google-Extended/1.0; +http://www.google.com/bot.html)
Perplexity
Perplexity Crawlers
Perplexity AI uses crawlers to power its AI search capabilities.
PerplexityBot
Used for Perplexity's AI search capabilities.
User-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
xAI (Grok)
xAI Crawlers
xAI, Elon Musk's AI company, uses crawlers to support its Grok AI assistant.
GrokBot
Used for training Grok AI.
User-agent: GrokBot/1.0 (+https://x.ai)
Grok Search
Used for Grok's search capabilities.
User-agent: xAI-Grok/1.0 (+https://grok.com)
Grok DeepSearch
Used for Grok's advanced search capabilities.
User-agent: Grok-DeepSearch/1.0 (+https://x.ai)
Other Major AI Crawlers
Many other AI companies and search engines have their own crawlers for AI-related functionality.
Apple (Siri & Apple Intelligence)
User-agent: Mozilla/5.0 (compatible; Applebot/1.0; +http://www.apple.com/bot.html)
User-agent: Mozilla/5.0 (compatible; Applebot-Extended/1.0; +http://www.apple.com/bot.html)
Meta (Facebook)
User-agent: Mozilla/5.0 (compatible; meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler))
Cohere
User-agent: Mozilla/5.0 (compatible; cohere-ai/1.0; +http://www.cohere.ai/bot.html)
You.com
User-agent: Mozilla/5.0 (compatible; YouBot (+http://www.you.com))
DuckDuckGo
User-agent: Mozilla/5.0 (compatible; DuckAssistBot/1.0; +http://www.duckduckgo.com/bot.html)
How to Configure Your robots.txt File
Your robots.txt file should be placed at the root of your website (e.g., https://example.com/robots.txt). Here are some examples of how to configure your robots.txt file for AI crawlers:
Example 1: Allow all AI crawlers
User-agent: *
Allow: /

# This explicitly allows all AI crawlers, though it's redundant with the wildcard above
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

# ... and so on for other crawlers
Example 2: Block all AI crawlers
# Allow regular crawlers
User-agent: *
Allow: /

# Block specific AI crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: GrokBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# ... and so on for other AI crawlers
Example 3: Selective access for different AI companies
# Allow search bots but block training bots
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /
Example 4: Allow access to only specific directories
# Only allow AI crawlers to access the public and blog sections
User-agent: GPTBot
Allow: /public/
Allow: /blog/
Disallow: /

User-agent: ClaudeBot
Allow: /public/
Allow: /blog/
Disallow: /
Impact of Allowing or Blocking AI Crawlers
Benefits of Allowing
- Improved visibility in AI-powered search results
- Potential for your content to be used as a source in AI responses
- Contribution to AI training, which may lead to better models
- Continued relevance in an increasingly AI-driven web landscape
Reasons to Block
- Protect proprietary or sensitive information
- Prevent AI systems from learning from or reproducing your unique content
- Reduce server load from crawler traffic
- Exercise control over how your content is used in AI systems
Best Practices for Managing AI Crawlers
- Regularly review and update your robots.txt file as new AI crawlers emerge
- Be specific about which parts of your site should be accessible to which crawlers
- Consider having different policies for AI training bots versus search bots
- Monitor your server logs to see which AI crawlers are visiting your site (a starting-point script is sketched after this list)
- Use our robots checker tool to verify your robots.txt configuration
- Stay informed about changes to AI crawler policies
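For the log-monitoring step above, the following rough sketch counts requests from a few AI crawlers in a web server access log. The log path and the list of user-agent substrings are assumptions; adjust them to your server's log format and the bots you want to track.

from collections import Counter

# Substrings to look for in the User-Agent field (extend as new crawlers emerge)
AI_BOTS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
           "anthropic-ai", "Google-Extended", "PerplexityBot", "Applebot-Extended"]

counts = Counter()
# Assumed log location; change to wherever your server writes its access log
with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        for bot in AI_BOTS:
            if bot in line:
                counts[bot] += 1

for bot, hits in counts.most_common():
    print(bot + ": " + str(hits) + " requests")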
Check Your robots.txt Configuration
Want to see how your website's robots.txt file interacts with AI crawlers? Use our Robots Checker tool to:
- Analyze your robots.txt file for AI crawler configurations
- See which AI crawlers are allowed or blocked on your site
- Get recommendations for improving your robots.txt configuration
- Monitor changes in AI crawler policies and their impact on your site
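If you also want a quick check from your own machine, the sketch below uses Python's standard urllib.robotparser to report whether a handful of common AI crawlers may fetch your homepage. The site URL and the bot list are placeholders; edit them to match your domain and the crawlers you care about.

from urllib.robotparser import RobotFileParser

SITE = "https://example.com"  # placeholder: replace with your site
BOTS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
        "anthropic-ai", "Google-Extended", "PerplexityBot", "Applebot-Extended"]

parser = RobotFileParser()
parser.set_url(SITE + "/robots.txt")
parser.read()  # fetches and parses the live robots.txt

for bot in BOTS:
    status = "allowed" if parser.can_fetch(bot, SITE + "/") else "blocked"
    print(bot + ": " + status)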