I’d like to expose our subscriber-only content to Googlebot.
I’d therefore like to be able to determine if the client is indeed the Google Bot or not, in order to grant or not to grant access.
Currently Cloudflare firewall rules can tell whether a bot is a known good bot (cf.client.bot); how can I use this information in my application?
Is there an HTTP header where this information is stored (the way the real client IP is available in the CF-Connecting-IP header)?
A workaround would probably be to block all requests with “Googlebot” in the user agent which are not a known good bot, and then enable the subscriber-only content for all remaining user agents containing “Googlebot”, but I don’t really like that solution. Any ideas?
I hoped that, instead of verifying it myself, this could be done directly by Cloudflare.
I’ve then noticed that on my domains we’ve already enabled the “Fake Google Bot” detection (WAF: Cloudflare Specials -> Rule ID 100201).
I’ve verified our logs and indeed all requests coming with a user agent “googlebot” seem to be legitimate.
I’ve performed some quick tests with a spoofed “googlebot” user agent to verify the behavior, and indeed my requests get blocked.
I think that we can then assume that every request that reaches us with the user agent “googlebot” is a legitimate one, so I guess the application could just rely on that info. What do you think?
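To make that idea concrete, here is a rough sketch of what the application-side check could look like. It assumes the “Fake Google Bot” WAF rule really does block every spoofed request before it reaches the origin, which is my assumption and not something Cloudflare guarantees. The function names (`claims_googlebot`, `may_see_subscriber_content`) are made up for illustration.

```python
from typing import Mapping, Optional


def claims_googlebot(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent string mentions Googlebot (case-insensitive)."""
    if not user_agent:
        return False
    return "googlebot" in user_agent.lower()


def may_see_subscriber_content(headers: Mapping[str, str], is_subscriber: bool) -> bool:
    """Grant access to subscribers, or to requests claiming to be Googlebot.

    ASSUMPTION: spoofed Googlebot requests were already blocked at the edge,
    so the user agent is trusted here without any further verification.
    """
    # HTTP header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return is_subscriber or claims_googlebot(normalized.get("user-agent"))
```

If the WAF rule is ever disabled or bypassed (for example by requests hitting the origin directly), this check alone would expose the content to anyone sending a “googlebot” user agent.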
By the way, the info on the Google page you linked (and which I had also found) seems to be imprecise.
Among the list of IPs I’ve retrieved from our logs, there are also some which match neither .googlebot.com nor .google.com.
For example, 107.178.231.94 and 130.211.96.77 both belong to Google, but their reverse DNS entries point to something.bc.googleusercontent.com (so neither .googlebot.com nor .google.com).
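For reference, the manual check I was comparing against (reverse DNS lookup, suffix match, then a forward lookup to confirm) could be sketched roughly as follows. The two suffixes are the ones mentioned above. The function names are hypothetical, and the resolvers are passed in as parameters so the logic can be tried without touching real DNS.

```python
import socket
from typing import Callable, List, Sequence

# Suffixes discussed above; hostnames under googleusercontent.com would NOT match.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def hostname_matches(hostname: str, suffixes: Sequence[str] = GOOGLEBOT_SUFFIXES) -> bool:
    """Return True if the hostname ends with one of the accepted domain suffixes."""
    host = hostname.rstrip(".").lower()
    return host.endswith(tuple(suffixes))


def reverse_lookup(ip: str) -> str:
    """Reverse DNS: IP address to primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Forward DNS: hostname to its IPv4 addresses (gethostbyname_ex is IPv4 only)."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a client IP."""
    try:
        hostname = reverse(ip)
        if not hostname_matches(hostname):
            return False
        # The hostname must resolve back to the same IP, otherwise the PTR record could be forged.
        return ip in forward(hostname)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return False
```

With this check, an IP whose reverse DNS entry ends in .bc.googleusercontent.com would be rejected, which is exactly the mismatch I’m seeing in our logs.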