OpenAI Launches GPTBot Web Crawler to Access the Internet

Learn how to allow or disallow GPTBot from accessing your website

Aug 09, 2023

OpenAI has launched a new web crawler called "GPTBot" that it says could help improve future ChatGPT AI models. In a post on its website, OpenAI explained that web pages crawled with GPTBot "may potentially be used to improve future models" by making them more accurate and expanding their capabilities.

A web crawler indexes websites across the internet, much as Google's search engine does. OpenAI says GPTBot will filter out paywalled content, personal information, and policy-violating text when crawling the web. Website owners can also block GPTBot using standard server settings.

GPTBot can be identified by the following user agent token and full user-agent string.

User agent token: GPTBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
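
As a rough illustration of how a site might recognize the crawler in its own request handling, here is a minimal Python sketch that checks a User-Agent header for the GPTBot token. The function name `is_gptbot` and the word-boundary pattern are assumptions made for this example, not anything published by OpenAI, and a User-Agent header can be spoofed, so a match only means the request claims to be GPTBot.

```python
import re

# The token "GPTBot" comes from the article; the case-insensitive,
# word-boundary pattern is an assumption of this sketch.
GPTBOT_PATTERN = re.compile(r"\bGPTBot\b", re.IGNORECASE)


def is_gptbot(user_agent: str) -> bool:
    """Return True if the User-Agent header value claims to be GPTBot."""
    return bool(GPTBOT_PATTERN.search(user_agent or ""))


full_ua = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
    "compatible; GPTBot/1.0; +https://openai.com/gptbot)"
)

print(is_gptbot(full_ua))                                          # True
print(is_gptbot("Mozilla/5.0 (X11; Linux x86_64) Firefox/116.0"))  # False
```

A check like this is only useful for logging or for server-side blocking; robots.txt remains the mechanism the article describes for telling GPTBot what it may crawl.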

How to disallow GPTBot from accessing your website or webpages?

To disallow GPTBot from accessing your site, you can add the GPTBot token to your site’s robots.txt:

User-agent: GPTBot
Disallow: /

How to control or customize GPTBot access?

To allow GPTBot to access only parts of your site, you can add the GPTBot token to your site’s robots.txt like this:

User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
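
Before deploying rules like these, it can help to check how a parser reads them. The sketch below feeds both robots.txt examples from this article to Python's standard `urllib.robotparser` module and asks whether a given URL may be fetched. The helper name `is_allowed`, the `example.com` host, and the `OtherBot` agent are placeholders invented for the example. It shows how Python's parser interprets the rules, which is not guaranteed to match GPTBot's own behavior in every edge case.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "GPTBot") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The two rule sets shown in the article.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

PARTIAL = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""

base = "https://example.com"  # placeholder host, not from the article

print(is_allowed(BLOCK_ALL, base + "/any/page.html"))                    # False
print(is_allowed(BLOCK_ALL, base + "/any/page.html", agent="OtherBot"))  # True
print(is_allowed(PARTIAL, base + "/directory-1/a.html"))                 # True
print(is_allowed(PARTIAL, base + "/directory-2/b.html"))                 # False
```

Note that the rules apply only to the GPTBot token: a crawler with a different name is unaffected unless the file has a matching or wildcard `User-agent` group.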

The release comes after OpenAI filed a trademark application for "GPT-5", suggesting an eventual successor to GPT-4. However, OpenAI CEO Sam Altman stated the company is "nowhere close" to beginning GPT-5 training.

Many users have raised concerns on Hacker News regarding GPTBot, OpenAI’s web crawler. Japan's privacy watchdog had already warned OpenAI about collecting sensitive data without consent. Italy briefly banned ChatGPT over alleged privacy law violations. A class action lawsuit accuses OpenAI of accessing private ChatGPT user information, which could violate the Computer Fraud and Abuse Act if confirmed.

GPTBot is likely to help the company collect a greater volume of data from various online sources to train its future models. Separately, the company has chosen to discontinue its AI Classifier, a tool for identifying text generated by GPT.
