My website is constantly getting hit by web scrapers. It would be one thing if they included a proper user-agent header to indicate they are bots, but they are trying to hide it. The thing is that the requests are all sourced from Cloudflare IP addresses. I have already blocked dozens of /24s, but they keep changing. I do not know how I can set up my domain on Cloudflare for protection if the source of the scraping is Cloudflare itself. I assume this is part of the hosting service (Workers?). Do you require bots or AI content accumulators to identify themselves as bots? If so, they are violating that requirement. If not, Cloudflare is going to start being added to blocklists, if it has not been already.
What steps have you taken to resolve the issue?
Blocking /24s at layer 4 and layer 7.
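For anyone doing the same kind of /24 blocking, here is a rough sketch of how the offending source addresses could be grouped by /24 to decide which blocks to add. It assumes the IPv4 addresses have already been pulled out of the access log into a list; the function name `count_by_slash24` and the sample addresses (documentation ranges) are made up for illustration.

```python
import ipaddress
from collections import Counter


def count_by_slash24(ips):
    """Count requests per /24 network for a list of IPv4 address strings."""
    counts = Counter()
    for ip in ips:
        # strict=False lets us pass a host address and get its enclosing /24.
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        counts[str(net)] += 1
    return counts


# Hypothetical sample data, not real scraper addresses.
sample = ["203.0.113.5", "203.0.113.77", "198.51.100.1"]
for network, hits in count_by_slash24(sample).most_common():
    print(network, hits)
```

The busiest /24s come out first, which makes it easier to see whether the traffic is concentrated in a few blocks or spread thinly across many.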
I can’t seem to edit my original posts. That first IP is not Cloudflare (copy/paste fail). Also, my site is obviously not example.com, but I do not want to post my site publicly here.
Are you using Cloudflare with the proxy already and are you seeing those IPs in your origin logs? If so, make sure you are restoring original visitor IPs so you see the actual client IP addresses…
You can then use the WAF, Bot Fight Mode/Super Bot Fight Mode and the Block AI Bots options to prevent them.
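As a rough illustration of what restoring the original visitor IP means when a site is proxied: the origin only trusts the `CF-Connecting-IP` header if the connecting peer is itself inside Cloudflare's address space. The sketch below is an assumption-laden example, not Cloudflare's own tooling. The range list is only a partial sample and should be replaced with the full, current list that Cloudflare publishes; the function names and the plain-dict `headers` argument are hypothetical.

```python
import ipaddress

# Partial sample of Cloudflare IPv4 ranges, for illustration only.
# Assumption: replace with the full, current published list before real use.
CLOUDFLARE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("173.245.48.0/20", "104.16.0.0/13", "172.64.0.0/13", "162.158.0.0/15")
]


def is_cloudflare(ip):
    """Return True if the address falls inside one of the listed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CLOUDFLARE_RANGES)


def client_ip(peer_ip, headers):
    """Pick the visitor IP: trust CF-Connecting-IP only from Cloudflare peers."""
    if is_cloudflare(peer_ip):
        # headers is assumed to be a plain dict keyed by the exact header name.
        return headers.get("CF-Connecting-IP", peer_ip)
    # Direct connections could forge the header, so ignore it here.
    return peer_ip


print(client_ip("104.16.1.1", {"CF-Connecting-IP": "203.0.113.9"}))
```

Most web servers have a built-in way to do this (for example a real-IP module), so a hand-rolled check like this is mainly useful for understanding what the logs will show once the proxy is enabled.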
No, I have not added this domain to Cloudflare yet. I just find it disconcerting that I received close to 13,000 requests last week from Cloudflare IPs. They were not users. They were bots. I imagine Cloudflare whitelists its own IP space for its bot protection, but I could be wrong. It is just bad form to allow that sort of traffic. I am just one person; when many start blocking Cloudflare and it gets added to public blocklists, it may be bad for business.