One of the sites we look after is getting traffic spikes of 20K+ requests over the course of an hour, three times a day, every day.
The Source ASN is “396982 - GOOGLE-CLOUD-PLATFORM”.
Any ideas what that could be from, and is it safe to block that traffic? Obviously we don’t want to block any legitimate bot traffic. We have had issues in the past with competitors price-scraping the website, which is why we set up Cloudflare to help prevent it.
May I ask if you are using Cloudflare Workers for this zone, or for some other zone in your Cloudflare account?
Otherwise, if you aren’t using GCP for hosting or any other service, I’d say it’s safe to block the whole AS number via IP Access Rules.
Google PageSpeed might stop working, but at least you’d get rid of unwanted and bad bot traffic. It won’t block Google Search indexing, since Google’s crawlers come from a different Google AS number.
Scrapers aren’t nice at all, and I am afraid there is no “single-click” silver-bullet solution for them yet.
No, we’re not using Workers or anything. We really only have firewall rules set up to block the previous scraping traffic. I’m thinking maybe they switched to a different service or platform to do it.
I’ll try blocking it and see how it goes. Indexing was just the main concern.
Google Merchant Center does access a product feed on there so we’ll have to monitor that too in case it stops working.
Try capturing the requests and looking at their user-agents. From there you could block by part of the user-agent string, which might help a bit, if you aren’t doing this already.
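To make the user-agent idea concrete, here is a minimal sketch of how you might tally user-agents for requests from one ASN once you have exported some request logs. The record layout (dicts with `asn` and `user_agent` keys), the function name `top_user_agents`, and the sample data are all assumptions for illustration, not a Cloudflare log format.

```python
from collections import Counter


def top_user_agents(records, asn, limit=5):
    """Count user-agent strings among requests from a single ASN.

    `records` is assumed to be an iterable of dicts with hypothetical
    "asn" and "user_agent" keys; adapt the key names to your own export.
    """
    counts = Counter(
        rec.get("user_agent") or "(empty)"  # group missing/blank UAs together
        for rec in records
        if rec.get("asn") == asn
    )
    return counts.most_common(limit)


# Invented sample data, only to show the shape of the input.
sample = [
    {"asn": 396982, "user_agent": "python-requests/2.31.0"},
    {"asn": 396982, "user_agent": "python-requests/2.31.0"},
    {"asn": 396982, "user_agent": ""},
    {"asn": 15169, "user_agent": "Googlebot/2.1"},
]

print(top_user_agents(sample, 396982))
```

If one or two strings dominate the spike, a rule matching part of that string is a much narrower block than the whole AS number.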
If you run ads, be careful.
When blocking entire ASNs that are used for hosting, especially Amazon, Google, etc., make sure you still allow ads.txt and block only the other requests.
If you don’t run ads, then don’t worry.
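As a rough sketch of that “block the ASN but keep ads.txt reachable” idea, the helper below builds a Cloudflare firewall rule expression as a string. The helper name `build_block_expression` is hypothetical, and the field names used (`ip.geoip.asnum`, `http.request.uri.path`) are the rules-language fields as I understand them; newer documentation refers to the ASN field as `ip.src.asnum`, so check which one your dashboard accepts.

```python
from typing import Sequence


def build_block_expression(asn: int, allowed_paths: Sequence[str]) -> str:
    """Build a rule expression matching requests from `asn`,
    except requests for any path in `allowed_paths`.

    Hypothetical helper: it only formats a string to paste into a rule
    with the Block action; it does not call any Cloudflare API.
    """
    if not allowed_paths:
        return f"(ip.geoip.asnum eq {asn})"
    # The rules language writes inline sets as space-separated quoted strings.
    quoted = " ".join(f'"{p}"' for p in allowed_paths)
    return (
        f"(ip.geoip.asnum eq {asn} "
        f"and not (http.request.uri.path in {{{quoted}}}))"
    )


print(build_block_expression(396982, ["/ads.txt"]))
```

The same exception list could carry the path of the product feed that Google Merchant Center fetches, if that turns out to be requested from this AS number.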
No, it’s not safe to block Google or Microsoft ASNs. Many SASE, SD-WAN, and VPN/firewall companies, including some of the largest, run on GCP and Azure. If you block all the Google and Microsoft ASNs just because you are not using their cloud services, you are also going to block a very large amount of legitimate user traffic from reaching your site.