We crawl and archive websites for our clients on a daily basis. Since last Wednesday (19th July) we have seen a large increase in HTTP 429 errors when crawling some client sites and have been unable to archive their sites. All the sites that have this error are using Cloudflare. Are there any global settings in Cloudflare and/or any other changes from last week that could have caused this issue?
A 429 response means that the request has been rate limited. You are going to need to reach out to your customers and work with them to make sure that your service is not blocked.
These blocks are normally set at the domain level, and so they should be able to allow your traffic.
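While the allowlisting is being sorted out, a crawler can at least slow down when it receives a 429. The following is a minimal sketch, assuming the server includes a standard `Retry-After` header (it may not); the function name `retry_delay_seconds` and the 60-second fallback are illustrative choices, not anything Cloudflare prescribes.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_delay_seconds(
    headers: Mapping[str, str],
    default: float = 60.0,
    now: Optional[datetime] = None,
) -> float:
    """Return how long to wait before retrying after a 429.

    Retry-After may be a number of seconds or an HTTP date.
    Falls back to `default` (an assumed value) if the header is
    missing or cannot be parsed.
    """
    value = None
    for name, raw in headers.items():
        if name.lower() == "retry-after":
            value = raw.strip()
            break
    if value is None:
        return default

    # Form 1: delay in whole seconds, e.g. "120"
    if value.isdigit():
        return float(value)

    # Form 2: HTTP date, e.g. "Wed, 26 Jul 2023 12:00:30 GMT"
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())
```

The crawler would sleep for the returned number of seconds before retrying the same site. This does not remove the rate limit; it only avoids making it worse.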
Thanks for your reply. We are now seeing 403 errors rather than 429, which presumably means we are being identified as a bot. This issue is affecting about 200 websites across about 60 different clients and the problem for these all started at the same time last week. We have previously asked all our clients to allowlist our IP addresses and these haven’t changed.
We have started the process of reaching out to clients to ask them to allow our traffic, but given the number of sites this could take weeks to resolve. It seemed strange that we saw this issue across so many websites that all use Cloudflare at the same time, hence the question if there was any global Cloudflare configuration that might have caused this and if there might be any quicker way to resolve this than contacting clients individually.
hence the question if there was any global Cloudflare configuration that might have caused this and if there might be any quicker way to resolve this than contacting clients individually.
I don’t think there is such a configuration.
Could you please confirm if the 403s that you are seeing are Cloudflare branded?
If so, could you please provide us an example of a Ray ID so that we may further investigate this issue?
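For anyone collecting this information from a crawler's logs, here is a minimal sketch of pulling the Ray ID out of a rejected response. It relies on the `CF-RAY` and `Server: cloudflare` response headers that Cloudflare adds; the function names and the sample header values are made up for illustration. Note that these headers only show the response passed through Cloudflare. Whether the 403 page itself is Cloudflare branded still has to be checked by looking at the response body.

```python
from typing import Mapping, Optional


def _lowered(headers: Mapping[str, str]) -> dict:
    """Copy headers with lower-cased names for case-insensitive lookup."""
    return {name.lower(): value for name, value in headers.items()}


def cloudflare_ray_id(headers: Mapping[str, str]) -> Optional[str]:
    """Return the CF-RAY header value, or None if it is absent."""
    value = _lowered(headers).get("cf-ray")
    return value.strip() if value else None


def passed_through_cloudflare(headers: Mapping[str, str]) -> bool:
    """True if the response carries Cloudflare's usual headers."""
    h = _lowered(headers)
    return "cf-ray" in h or h.get("server", "").strip().lower() == "cloudflare"


def summarize_rejection(url: str, status: int, headers: Mapping[str, str]) -> str:
    """Build one log line per rejected request to share with support."""
    ray = cloudflare_ray_id(headers) or "none"
    via = "yes" if passed_through_cloudflare(headers) else "no"
    return f"{status} {url} cloudflare={via} ray_id={ray}"
```

Calling `summarize_rejection` for each 403 or 429 gives a list of Ray IDs that can be pasted into a reply.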
We are now seeing 403 errors rather than 429, which presumably means we are being identified as a bot.
Not necessarily. Cloudflare will serve a 403 response if the request violates either a default WAF managed rule enabled for all orange-clouded Cloudflare domains or a WAF managed rule enabled for that particular zone.
The issue is still ongoing, and we have asked our clients to reach out to you directly to resolve it. In the meantime, here are some more Ray IDs from yesterday:
Things are looking much better today, and the sites that started rejecting our requests last week are all working again. It seems unlikely that all our clients updated their configuration to allowlist our IPs at the same time (and they should have already been allowlisted anyway), so I’m still inclined to believe there was some sort of global issue that was causing our requests to be rejected.