I’m fairly new to analyzing CF’s traffic data to tell good traffic from bad, but I’ve been doing a good deal of reading on how Google’s bots get spoofed, which makes the fakes hard to filter out.
Since yesterday we’ve been seeing a ton of traffic to our site with the User Agent “AdsBot-Google (+http://www.google.com/adsbot.html)” and an ASN of “15169”. From my research both are associated with Google, so I hesitate to block it. Thing is, according to CF, some individual files (CSS, JS, etc.) are being hit several thousand times a day (see screenshot, which shows the “crawler” activity on the noted files within just the last 6 hours), which makes me believe this is NOT Google.
The crawler is also showing as coming from France (a known Google Bot origin) and using IP addresses in the 66.249.92.XX range. I did a reverse DNS lookup on a few of them and they all came back as being associated with Google. But I know these can be spoofed, right?
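For reference, my understanding is that a reverse lookup on its own is not conclusive, and that the check Google documents is a forward-confirmed one: resolve the IP to a hostname, confirm the hostname ends in googlebot.com, google.com, or googleusercontent.com, then resolve that hostname back and confirm it returns the same IP. Here is a minimal Python sketch of that check. The function name and the injectable lookup parameters are my own choices for illustration, and it only handles IPv4.

```python
import socket

# Domain suffixes that Google's crawler-verification guidance lists for
# reverse DNS results.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def is_verified_google_ip(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for one IPv4 address.

    reverse_lookup and forward_lookup are hypothetical hooks (not part of
    any library) so the logic can be exercised without network access.
    """
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, ipv4_addresses)
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record, or the lookup failed

    # Step 1: the PTR hostname must sit under a Google-owned domain.
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False

    # Step 2: the hostname must resolve back to the original IP.
    try:
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

Calling `is_verified_google_ip("66.249.92.XX")` with a real address substituted from the logs would run both lookups against live DNS. A `False` result could also just mean a lookup failed, so I would treat it as a reason to look closer rather than as proof of spoofing.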
My questions are:
- Are there surefire ways to tell whether a visitor is a legit Googlebot using data in CF, and if so, what should I look for beyond what I already have?
- Is it normal for a single file to be hit thousands of times a day as shown in the screenshot?
- If this does turn out NOT to be Google, what can I do to block this traffic without also blocking the legit Googlebot?