AI Robots.txt Guide: Managing All AI Crawlers

Learn about the different AI crawlers, their user agents, and how to configure your robots.txt file to control access for each company.

What is robots.txt?

A robots.txt file implements the Robots Exclusion Protocol, a standard websites use to communicate with web crawlers and other automated clients. It specifies which parts of the site should not be processed or scanned.

With the rise of AI systems and large language models (LLMs), the robots.txt file has gained new importance. It's now a key mechanism for controlling how AI systems interact with your website content.
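
At its simplest, the file is a list of user-agent groups, each followed by Allow and Disallow rules. A minimal illustrative example (the /private/ path is hypothetical):

# Applies to every crawler
User-agent: *
Disallow: /private/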

Important: Our tool always shows AI chatbots as allowed, regardless of robots.txt directives, to help you understand how AI systems might interact with your content even when crawling is restricted.

AI Crawlers by Company

OpenAI

OpenAI uses multiple crawlers for different purposes, from training their AI models to providing search functionality within ChatGPT.

GPTBot

Used for training GPT models on web content.

Robots.txt token: GPTBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot)

ChatGPT-User

Used when ChatGPT users browse the web during conversations.

Robots.txt token: ChatGPT-User
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot)

OAI-SearchBot

Used to discover and surface websites in ChatGPT's search results.

Robots.txt token: OAI-SearchBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)
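
For example, to opt out of GPT model training while remaining visible in ChatGPT search, you can target OpenAI's tokens separately (this mirrors Example 3 below):

# Block training, allow ChatGPT search
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /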

Anthropic

Anthropic uses crawlers to support Claude AI, its conversational AI assistant.

Anthropic AI

Used for training and improving Claude AI models.

Robots.txt token: anthropic-ai
Full user-agent string: Mozilla/5.0 (compatible; anthropic-ai/1.0; +http://www.anthropic.com/bot.html)

ClaudeBot

Used for Claude's web browsing capabilities.

Robots.txt token: ClaudeBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ClaudeBot/1.0; +claudebot@anthropic.com)

Claude Web

Used for Claude's web interface interactions.

Robots.txt token: claude-web
Full user-agent string: Mozilla/5.0 (compatible; claude-web/1.0; +http://www.anthropic.com/bot.html)
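
Mirroring the descriptions above (anthropic-ai for training, ClaudeBot for browsing), a selective sketch:

# Block Anthropic training, allow Claude browsing
User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Allow: /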

Google

Google uses specialized crawlers for its AI models beyond its traditional search engine crawlers.

Google-Extended

Used to control whether your content can be used to train Google's Gemini AI models. Unlike the other entries here, Google-Extended is a robots.txt token rather than a standalone crawler: pages are fetched by Google's existing crawlers, so there is no separate user-agent string to match.

Robots.txt token: Google-Extended
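
Google has stated that blocking Google-Extended does not affect a site's crawling, indexing, or ranking in Google Search; a minimal opt-out sketch:

# Opt out of Gemini training; Googlebot and Search are unaffected
User-agent: Google-Extended
Disallow: /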

Perplexity

Perplexity AI uses crawlers to power its AI search capabilities.

PerplexityBot

Used for Perplexity's AI search capabilities.

Robots.txt token: PerplexityBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
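
To keep your pages out of Perplexity's AI search index, a minimal sketch:

User-agent: PerplexityBot
Disallow: /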

xAI (Grok)

xAI, Elon Musk's AI company, uses crawlers to support its Grok AI assistant.

GrokBot

Used for training Grok AI.

Robots.txt token: GrokBot
Full user-agent string: GrokBot/1.0 (+https://x.ai)

Grok Search

Used for Grok's search capabilities.

Robots.txt token: xAI-Grok
Full user-agent string: xAI-Grok/1.0 (+https://grok.com)

Grok DeepSearch

Used for Grok's advanced search capabilities.

Robots.txt token: Grok-DeepSearch
Full user-agent string: Grok-DeepSearch/1.0 (+https://x.ai)
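
robots.txt lets several User-agent lines share one rule set, which is convenient when a single company runs multiple crawlers; a sketch covering all three xAI agents:

# One rule group for all xAI crawlers
User-agent: GrokBot
User-agent: xAI-Grok
User-agent: Grok-DeepSearch
Disallow: /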

Other Major AI Crawlers

Many other AI companies and search engines operate their own crawlers for AI-related functionality. Note that Applebot-Extended, like Google-Extended, is a robots.txt token rather than a separate crawler: Applebot does the fetching, and the -Extended token controls whether that content is used to train Apple's AI models.

Apple (Siri & Apple Intelligence)
User-agent: Mozilla/5.0 (compatible; Applebot/1.0; +http://www.apple.com/bot.html)
User-agent: Mozilla/5.0 (compatible; Applebot-Extended/1.0; +http://www.apple.com/bot.html)

Meta (Facebook)
User-agent: Mozilla/5.0 (compatible; meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler))

Cohere
User-agent: Mozilla/5.0 (compatible; cohere-ai/1.0; +http://www.cohere.ai/bot.html)

You.com
User-agent: Mozilla/5.0 (compatible; YouBot (+http://www.you.com))

DuckDuckGo
User-agent: Mozilla/5.0 (compatible; DuckAssistBot/1.0; +http://www.duckduckgo.com/bot.html)
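
For instance, to keep appearing in Siri and Spotlight results while opting out of Apple Intelligence training, a sketch:

# Allow Apple's search crawler, opt out of AI training
User-agent: Applebot
Allow: /

User-agent: Applebot-Extended
Disallow: /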

How to Configure Your robots.txt File

Your robots.txt file should be placed at the root of your website (e.g., https://example.com/robots.txt). Here are some example configurations for AI crawlers:

Example 1: Allow all AI crawlers

User-agent: *
Allow: /

# Explicitly listing AI crawlers is redundant here, since they already
# match the wildcard group above, but it documents your intent
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

# ... and so on for other crawlers

Example 2: Block all AI crawlers

# Allow regular crawlers
User-agent: *
Allow: /

# Block specific AI crawlers. A crawler that matches its own user-agent
# group ignores the * group above, so these rules take effect.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: GrokBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# ... and so on for other AI crawlers

Example 3: Selective access for different AI companies

# Allow search and browsing bots, block training bots
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

Example 4: Allow access only to specific directories

# Only allow AI crawlers to access the public and blog sections
User-agent: GPTBot
Allow: /public/
Allow: /blog/
Disallow: /

User-agent: ClaudeBot
Allow: /public/
Allow: /blog/
Disallow: /

Note: most major crawlers apply the most specific (longest) matching rule, so Allow: /public/ and Allow: /blog/ take precedence over Disallow: / for URLs inside those directories.

Impact of Allowing or Blocking AI Crawlers

Benefits of Allowing

  • Improved visibility in AI-powered search results
  • Potential for your content to be used as a source in AI responses
  • Contribution to AI training may lead to better models
  • Stay relevant in an increasingly AI-driven web landscape

Reasons to Block

  • Protect proprietary or sensitive information
  • Prevent AI systems from learning from or reproducing your unique content
  • Reduce server load from crawler traffic
  • Exercise control over how your content is used in AI systems

Best Practices for Managing AI Crawlers

  1. Regularly review and update your robots.txt file as new AI crawlers emerge
  2. Be specific about which parts of your site should be accessible to which crawlers
  3. Consider having different policies for AI training bots versus search bots
  4. Monitor your server logs to see which AI crawlers are visiting your site
  5. Use our robots checker tool to verify your robots.txt configuration
  6. Stay informed about changes to AI crawler policies

Check Your robots.txt Configuration

Want to see how your website's robots.txt file interacts with AI crawlers? Use our Robots Checker tool to:

  • Analyze your robots.txt file for AI crawler configurations
  • See which AI crawlers are allowed or blocked on your site
  • Get recommendations for improving your robots.txt configuration
  • Monitor changes in AI crawler policies and their impact on your site