This was not a fake Googlebot. Take a look!
It was blocked by default, although it shouldn’t have been.
I doubt it
host 34.76.251.191
191.251.76.34.in-addr.arpa domain name pointer 191.251.76.34.bc.googleusercontent.com.
I work in SEO, and from practice I know Googlebot comes in different variations. I have clearly seen heavy load from *.bc.googleusercontent.com after submitting reindex requests.
How can I whitelist requests whose reverse DNS resolves to *.bc.googleusercontent.com so that this takes precedence over every other rule?
Someone should ask Google to update their support pages then, with something like: “we don’t use PTR anymore.” The point is that they rely on PTR records so that users can verify whether a crawler is legitimate or not. User agents can be spoofed.
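For reference, the PTR-based verification works as a reverse lookup followed by a forward lookup that must map back to the same IP. A minimal Python sketch, assuming that the accepted suffixes for Googlebot itself are googlebot.com and google.com (Google’s own crawler-verification page is the authority and its list can change); the `reverse`/`forward` parameters are a hypothetical convenience so the logic can be tried with fake DNS data:

```python
import socket
from typing import Callable, List

# Assumption: PTR suffixes accepted for Googlebot; check Google's current
# documentation before relying on this list.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """PTR lookup; raises socket.herror (an OSError) if there is no PTR."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> List[str]:
    """A-record lookup; returns the IPv4 addresses the name resolves to."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Reverse-then-forward DNS check for an IP claiming to be Googlebot."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record, so it cannot be verified
    if not host.endswith(GOOGLEBOT_SUFFIXES):
        return False  # e.g. *.bc.googleusercontent.com is rejected here
    try:
        # Whoever controls an IP's reverse zone can set any PTR name, so the
        # name must also resolve forward to the very same IP.
        return ip in forward(host)
    except OSError:
        return False  # the name does not resolve forward
```

Run against the IP quoted in this thread, the check stops at the suffix test, because its PTR is 191.251.76.34.bc.googleusercontent.com.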
Anyway, this is a WAF event, and you can’t exclude hosts or user agents from it; you need to disable the WAF for the URL path using Page Rules.
The base domain name googleusercontent.com clearly is what it says it is, “Google User Content”, which is known to be connected to the [Google App Engine “Platform as a Service” product](https://cloud.google.com/appengine/docs). That product allows any user to create and deploy applications written in Python, Java, PHP, and Go on their service.
I know that anyone could register that; my own hosting runs on Google Cloud.
Is there a way to accept the risk and exclude *.bc.googleusercontent.com from every rule out there, i.e. to whitelist it?
Not sure. I think the OP has a point: 34.64.0.0/10 appears to be a Google network.
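Whether an address falls inside that block can be checked with Python’s standard `ipaddress` module. A minimal sketch using only the /10 quoted here; `in_block` is a hypothetical helper name, and membership only shows the address is in Google-operated space, not who sent the request:

```python
import ipaddress

# The /10 block quoted in this thread (spans 34.64.0.0 to 34.127.255.255).
GOOGLE_BLOCK = ipaddress.ip_network("34.64.0.0/10")


def in_block(ip: str, block: ipaddress.IPv4Network = GOOGLE_BLOCK) -> bool:
    """True if the address falls inside the given network."""
    return ipaddress.ip_address(ip) in block


print(GOOGLE_BLOCK.network_address, GOOGLE_BLOCK.broadcast_address)
print(in_block("34.76.251.191"))
```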
It might be either translation or cache related → https://www.quora.com/What-does-it-mean-when-you-get-a-referral-from-googleusercontent-com
All Google Cloud Platform customers have Google IPs.
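Because Cloud customers share Google’s address space, a tighter check compares the IP against the crawler-specific ranges Google publishes as a JSON file (googlebot.json). A minimal sketch, assuming that file is a `"prefixes"` list of objects keyed by `"ipv4Prefix"` or `"ipv6Prefix"`; the embedded sample is abridged and illustrative, not the current list, and `load_prefixes`/`is_listed_crawler_ip` are hypothetical names:

```python
import ipaddress
import json

# Illustrative, abridged sample in the assumed googlebot.json layout.
# Fetch the real file from Google before using this in practice.
SAMPLE_GOOGLEBOT_JSON = """
{
  "prefixes": [
    {"ipv4Prefix": "66.249.64.0/27"},
    {"ipv6Prefix": "2001:4860:4801:10::/64"}
  ]
}
"""


def load_prefixes(raw: str):
    """Parse the JSON text into a list of IPv4/IPv6 network objects."""
    nets = []
    for entry in json.loads(raw)["prefixes"]:
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            nets.append(ipaddress.ip_network(prefix))
    return nets


def is_listed_crawler_ip(ip: str, nets) -> bool:
    """True if the IP is inside any published prefix (v4/v6 mismatches are False)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in nets)
```

With this approach a Cloud-hosted address such as 34.76.251.191 is not matched, even though it sits in Google-owned space.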
For now it doesn’t look legit.
Unfortunately not
Official Support answer. Just in case…
**D..... P.....** (Cloudflare)
Feb 25, 6:38 AM PST
Hi,
The Firewall Rules are independent of the Access Rules and the WAF, so you cannot bypass the Access Rules/WAF with a Firewall Rule. Unfortunately, we are not able to whitelist based on User-Agent/Path.
You can use our Page Rules to turn off the WAF for a specific URI:
[Is there a tutorial for Page Rules](https://support.cloudflare.com/hc/en-us/articles/200168306-Is-there-a-tutorial-for-Page-Rules-)
True, googleusercontent might be the default PTR in such cases.
So most of these IPs were coming from France. After researching a bit, I decided to leave them blocked, and I also reported them to Google Cloud, as I believe presenting themselves that way may not even be legal. At the end of the day, Google is a trademark, and no one else should present themselves as Google’s bot, even if their service crawls for legitimate purposes and just wants to stay a bit more incognito. I disagree with that approach. We’ll see what happens.
How did you determine that? The mentioned IP appears to belong to a US block and a trace seems to confirm that.
There is not much to report, I am afraid; there is nothing wrong with that. Possibly misleading, yes. Illegal, no.
Trademarks really do not play a part here.
I will speak to someone at Google and forward them the info. Regarding legality, I think that misrepresenting oneself as Google should count as illegal use of their trademark and also as a breach of the Google Cloud terms and conditions.
There are three possible reasons why someone would present themselves as Googlebot:
1. Google is testing something new.
2. The bot is doing something malicious or illegal, which is why it hides behind the Googlebot identity.
3. It is a legitimate bot that doesn’t want to be blocked, since most people automatically whitelist anything that looks like a search engine bot.
From the screenshots I see France. I am not very familiar with Cloudflare, so maybe you can give me more info on the subject.
That would be the data centre the request was routed through. It indicates that the request could have come from France or somewhere in the vicinity (or maybe even an overseas territory), but it does not necessarily mean it came from any of them.
You can certainly forward it to Google for consideration but I’d be careful calling things “illegal” or “illegitimate”.
Can you point to where in the terms that is ruled out? To be honest, I somewhat doubt that is the case. That’s a user agent, that’s it. Following that logic, Mozilla should sue Google too.
Crawlers (including Google’s) can often be a nuisance, but there rarely is anything “illegal” about them.
It most likely is abuse of Google Cloud’s network to impersonate Googlebot, but Google doesn’t have much of a way to tell if that’s happening, since most traffic is TLS-encrypted (they do have passive monitoring to prevent abuse).
You might be able to get something out of the abuse form but don’t get your hopes up for a reply.
Same question: would you have a source for that? I somewhat can’t believe Google would dictate to their customers which user agents they can and cannot send.
To shed at least a bit of light on this.
Searching Google for that IP (“34.76.251.191”) does return several sites where the address is mentioned, and judging by their cached content, it would appear as if this was a genuine Google crawler.
Looks like it’s really not owned by Google, per the whois for 34.76.251.191:
Comment: *** The IP addresses under this Org-ID are in use by Google Cloud customers ***
I can see people trying to pass as Googlebot against implementations that either just check whether an IP is owned by Google, or include googleusercontent in their rDNS “is this Google” check.
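The difference between a loose rDNS check and an anchored one is easy to see on a few hostnames. A minimal sketch; both function names are hypothetical, the accepted domains are an assumption to be checked against Google’s documentation, and a name match alone is still not proof without a forward lookup back to the same IP:

```python
# Assumption: domains treated as Google-crawler rDNS names.
GOOGLE_DOMAINS = ("googlebot.com", "google.com")


def naive_is_google(host: str) -> bool:
    # Flawed: any name containing "google" passes, including Cloud customer PTRs.
    return "google" in host.lower()


def strict_is_google(host: str) -> bool:
    # Anchored on whole DNS labels: the name must equal a listed domain
    # or end in "." + that domain.
    name = host.rstrip(".").lower()
    return any(name == d or name.endswith("." + d) for d in GOOGLE_DOMAINS)


for h in ("191.251.76.34.bc.googleusercontent.com.",
          "googlebot.com.attacker.example",
          "crawl-66-249-66-1.googlebot.com."):
    print(h, naive_is_google(h), strict_is_google(h))
```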
As for “is this permitted by Google”, I doubt they like it, but I can’t find any mention of impersonating them in their terms of service. They love blanket statements, and it’s probably covered under one of the vague terms they have in there.
That’s a good point; however, it still shows up on google.com for pages crawled yesterday, so they might currently be using that IP address.
I am not saying they like it. My sole point was that it is not illegal, and I don’t really believe they care too much unless there is some proper abuse, in which case they will shut down your services under one of their other vague terms.
We’ll see what Google says.
In the last year I’ve seen a bot misrepresenting itself as Googlebot being used for illegal actions, including copyright violation.
It’s good that Cloudflare is there to block things like that.