OpenAI User Agents

Understanding how OpenAI accesses and indexes web content


Introduction to OpenAI User Agents

OpenAI uses several user agents and web crawlers to access web content for different purposes, from collecting training data for AI models to providing search results in ChatGPT. Website owners who want to optimize their content for OpenAI's systems, or control how that content is accessed, need to know which user agent does what.

OpenAI User Agent Identification

OpenAI identifies itself with specific user agents when accessing web content. Here are the known OpenAI user agents:

GPTBot

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot

This user agent is used for crawling content that may be used in training OpenAI's generative AI foundation models.

Published IP addresses: https://openai.com/gptbot.json

OAI-SearchBot

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot

This user agent is used to find and surface websites in ChatGPT's search features. It is not used to crawl content for training AI models.

Published IP addresses: https://openai.com/searchbot.json

ChatGPT-User

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot

This user agent is used when users ask ChatGPT or a Custom GPT to visit a web page. It's not used for automatic crawling or AI training.

Published IP addresses: https://openai.com/chatgpt-user.json
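A User-Agent header is self-reported and can be spoofed, so a request that claims to come from an OpenAI crawler can be cross-checked against the published IP lists above. The following is a minimal Python sketch, not an official client. It assumes the JSON files contain a top-level "prefixes" list whose entries carry an "ipv4Prefix" or "ipv6Prefix" CIDR string; check the actual files before relying on that layout. The sample data and the function names load_networks and ip_in_published_ranges are hypothetical, and the addresses are documentation-only ranges, not real OpenAI ranges.

```python
import ipaddress
import json

# Hypothetical sample in the layout this sketch ASSUMES for the published files.
# The ranges are reserved documentation addresses, not real OpenAI ranges.
SAMPLE_JSON = """
{
  "prefixes": [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv4Prefix": "198.51.100.16/28"}
  ]
}
"""


def load_networks(raw_json):
    """Parse CIDR prefixes out of a published-IP JSON document (assumed layout)."""
    data = json.loads(raw_json)
    networks = []
    for entry in data.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr, strict=False))
    return networks


def ip_in_published_ranges(ip, networks):
    """Return True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


networks = load_networks(SAMPLE_JSON)
print(ip_in_published_ranges("192.0.2.55", networks))
print(ip_in_published_ranges("203.0.113.1", networks))
```

In practice the JSON would be downloaded from the published URL and cached, since fetching it on every request would be slow.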

How OpenAI Accesses Web Content

OpenAI accesses web content in several ways:

  • AI model training - GPTBot crawls content that may be used to train generative AI models
  • Search functionality - OAI-SearchBot indexes content to provide search results in ChatGPT
  • Direct user requests - ChatGPT-User accesses specific URLs when requested by users

Note: For search results, it can take approximately 24 hours from a site's robots.txt update for OpenAI's systems to adjust.
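The mapping from user agent to purpose described above can be sketched in Python as a small classifier for incoming User-Agent headers. The product tokens come from the user agent strings in this guide; the dictionary name OPENAI_AGENTS, the purpose labels, and the function classify_openai_agent are hypothetical names chosen for illustration.

```python
# Product tokens as they appear in the user agent strings listed in this guide.
# The purpose labels are shorthand chosen for this sketch.
OPENAI_AGENTS = {
    "GPTBot": "model training",
    "OAI-SearchBot": "ChatGPT search",
    "ChatGPT-User": "user-requested fetch",
}


def classify_openai_agent(user_agent):
    """Return (token, purpose) for a known OpenAI agent, or None otherwise."""
    ua = user_agent.lower()
    for token, purpose in OPENAI_AGENTS.items():
        # Match "token/" so the token must be followed by a version number.
        if token.lower() + "/" in ua:
            return token, purpose
    return None


ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); "
      "compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot")
print(classify_openai_agent(ua))
```

Matching on the token rather than the full string is a deliberate choice here: the version number in the string may change, while the token is what robots.txt rules refer to.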

Controlling OpenAI's Access to Your Content

Website owners can control how OpenAI accesses their content through:

Robots.txt Configuration

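The two configurations below are alternatives. Put only one of them in your robots.txt file, since together they give conflicting rules for the same user agents: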

# Allow search but prevent training
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

# Block all OpenAI access
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
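Before deploying a robots.txt change, it can help to check that the rules do what you expect. The sketch below uses Python's standard urllib.robotparser module against the "allow search but prevent training" configuration above. The helper name allowed_agents and the example.com URL are hypothetical, and the standard-library parser is only an approximation of how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The "allow search but prevent training" configuration from this guide.
ALLOW_SEARCH_BLOCK_TRAINING = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""

AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]


def allowed_agents(robots_txt, url, agents):
    """Map each agent name to whether the given rules let it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(ALLOW_SEARCH_BLOCK_TRAINING,
                        "https://example.com/article", AGENTS)
print(result)
# {'GPTBot': False, 'OAI-SearchBot': True, 'ChatGPT-User': True}
```

Running the same check against the "block all" configuration should report False for all three agents.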

Optimizing Content for OpenAI

To ensure your content performs well when accessed by OpenAI systems:

  • Use clear, well-structured HTML with proper semantic markup
  • Ensure content is accessible and doesn't rely solely on JavaScript for rendering
  • Provide comprehensive information with factual accuracy
  • Include relevant metadata and schema markup
  • Consider which parts of your site should be available for AI training versus search only

Tracking OpenAI Visits

With xseek, you can track when and how OpenAI accesses your content:

  • Monitor OpenAI user agent visits in your analytics dashboard
  • Track which content is being accessed by different OpenAI crawlers
  • Analyze how your content appears in ChatGPT responses
  • Receive notifications about changes in OpenAI crawling patterns
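For a rough picture of the same kind of data without any analytics product, visits can also be tallied directly from web server logs. The Python sketch below is independent of xseek and does not describe how it works. It assumes the common "combined" access log format, where the user agent is the last quoted field; the sample log lines are made up for illustration, and the names LINE_RE and tally_openai_hits are hypothetical.

```python
import re
from collections import Counter

BOT_TOKENS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")

# ASSUMES combined log format:
#   ... "METHOD /path PROTOCOL" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"$'
)


def tally_openai_hits(lines):
    """Count hits per (bot token, path) for lines sent by OpenAI user agents."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        ua = match.group("ua")
        for token in BOT_TOKENS:
            if token + "/" in ua:
                counts[(token, match.group("path"))] += 1
                break
    return counts


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2025:00:00:01 +0000] "GET /pricing HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot"',
    '192.0.2.11 - - [01/Jan/2025:00:00:02 +0000] "GET /docs HTTP/1.1" 200 900 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot"',
    '192.0.2.12 - - [01/Jan/2025:00:00:03 +0000] "GET /docs HTTP/1.1" 200 900 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"',
]

for (token, path), hits in sorted(tally_openai_hits(SAMPLE_LOG).items()):
    print(token, path, hits)
```

Because the User-Agent header can be forged, counts like these are best treated as an estimate unless the source IPs are also checked against the published ranges.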


Source: Information in this guide is sourced from OpenAI's official documentation.
