I am asking ChatGPT to read one of my RSS feeds. It’s getting a CAPTCHA. It appears Cloudflare is viewing it as a bot. Is there a way to allowlist ChatGPT to bypass the CAPTCHA?
When asking ChatGPT for its IP addresses, I get this in return: "I don’t have a specific IP address because I operate in a cloud-based environment, utilizing a range of IP addresses across multiple servers and locations. This setup enhances the reliability and scalability of the service. If you’re encountering an IP-related issue, such as a CAPTCHA or an IP block when trying to access a website, it’s due to the shared nature of the cloud infrastructure. If you have specific technical or access issues, it might be best to contact OpenAI support or explore the documentation for more detailed information."
I am reaching out to OpenAI as well, but has anybody else had this issue and what was your resolution?
The JSON file that the OpenAI website points to seems to list several IP address ranges, all of which, at the time of writing, are on the Microsoft Azure cloud platform.
Putting this together, you have several options, for example:
1. Allow known bots.
2. Allow the IP addresses, which means recurring maintenance whenever the ranges change.
3. Allow the User-Agent (or parts thereof) mentioned in the GPTBot documentation.
4. Allow Microsoft’s AS8075, which may allow a lot of “junk” traffic as well.
I would never use #3 alone, as anyone can set whatever User-Agent they like. However, combining, for example, #1 and #3 could be the solution with the least amount of (recurring) maintenance.
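To illustrate why a User-Agent match should only count together with a second signal, here is a minimal Python sketch of the combined check. It is not Cloudflare's rule syntax, only the logic. The ranges are placeholder documentation blocks (not OpenAI's real ranges), and the names `ALLOWED_RANGES`, `UA_TOKEN`, and `should_skip_challenge` are made up for this example.

```python
import ipaddress

# ASSUMPTION: placeholder ranges (RFC 5737 documentation blocks).
# Replace with the ranges from the JSON file that OpenAI publishes.
ALLOWED_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]

# ASSUMPTION: the User-Agent token to look for; check the GPTBot documentation.
UA_TOKEN = "GPTBot"


def ip_in_ranges(ip, ranges):
    """Return True if the address falls inside any of the CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(r) for r in ranges)


def should_skip_challenge(ip, user_agent, ranges=ALLOWED_RANGES):
    """Skip the challenge only if BOTH the User-Agent and the source IP match.

    The User-Agent alone is trivially spoofable, so it is never sufficient.
    """
    ua_matches = UA_TOKEN.lower() in user_agent.lower()
    return ua_matches and ip_in_ranges(ip, ranges)
```

A request with a spoofed User-Agent from an unlisted address, e.g. `should_skip_challenge("203.0.113.5", "GPTBot/1.0")`, is still challenged, which is the point of combining the conditions.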
Such rules would be very similar to e.g. the one I elaborated on about PayPal over here:
Note: The User-Agent being “PayPal” here, and the AS numbers in that example, would obviously need to be adjusted to your specific use-case.
More check marks under “WAF components to skip” might be necessary as well.