Block Fake Googlebots

How can I block fake Googlebots?
Example: User Agent “Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36”

Device: Chrome 81, Linux x86_64, desktop
The example IP, of course, looks like a genuine Google address: 66.102.8.60

Cloudflare should block these by default. Do you see those requests logged in the dashboard?

If not, a good approach would be to create a Firewall Rule manually, as follows:


Example of a Firewall Rule that blocks fake Googlebots and allows only the real ones:

(http.request.uri.path contains "sitemap.xml" and not ip.geoip.asnum in {15169}) or (http.request.uri.path contains "robots.txt" and ip.geoip.asnum ne 15169 and not http.user_agent contains "Googlebot") or (http.request.uri.path contains "robots.txt" and ip.geoip.asnum ne 15169 and not http.user_agent contains "APIs-Google") or (http.request.uri.path contains "robots.txt" and ip.geoip.asnum ne 15169 and not http.user_agent contains "Mediapartners-Google") or (http.request.uri.path contains "robots.txt" and ip.geoip.asnum ne 15169 and not http.user_agent contains "AdsBot-Google-Mobile") or (http.request.uri.path contains "robots.txt" and ip.geoip.asnum ne 15169 and not http.user_agent contains "AdsBot-Google") or (http.request.uri.path contains "robots.txt" and ip.geoip.asnum ne 15169 and not http.user_agent contains "Googlebot-Image") or (http.request.uri.path contains "robots.txt" and ip.geoip.asnum ne 15169 and not http.user_agent contains "Googlebot-News") or (http.request.uri.path contains "robots.txt" and ip.geoip.asnum ne 15169 and not http.user_agent contains "Googlebot-Video") or (http.request.uri.path contains "robots.txt" and ip.geoip.asnum ne 15169 and not http.user_agent contains "AdsBot-Google-Mobile-Apps") or (http.request.uri.path contains "robots.txt" and ip.geoip.asnum ne 15169 and not http.user_agent contains "FeedFetcher-Google") or (http.request.uri.path contains "robots.txt" and ip.geoip.asnum ne 15169 and not http.user_agent contains "Google-Read-Aloud") or (http.request.uri.path contains "robots.txt" and ip.geoip.asnum ne 15169 and not http.user_agent contains "DuplexWeb-Google") or (http.request.uri.path contains "robots.txt" and ip.geoip.asnum ne 15169 and not http.user_agent contains "Google Favicon") or (http.request.uri.path contains "robots.txt" and ip.geoip.asnum ne 15169 and not http.user_agent contains "Storebot-Google ") or (http.request.uri.path contains "sitemap" and not ip.geoip.asnum in {15169})

Hope it helps a bit :wink:


I use a rule like this:

(http.user_agent matches "(?i)googlebot|bingbot|duckduckbot" and not cf.client.bot)

If you can't use regex, you can use "contains" instead:

(http.user_agent contains "googlebot" or http.user_agent contains "bingbot") and not cf.client.bot
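One thing to keep in mind: as far as I know, contains is case-sensitive, so a lowercase "googlebot" will not catch the capitalised "Googlebot" string that fakes usually copy from the real crawler. If regex isn't available on your plan, a sketch that lists both capitalisations could look like:

(http.user_agent contains "Googlebot" or http.user_agent contains "googlebot" or http.user_agent contains "Bingbot" or http.user_agent contains "bingbot") and not cf.client.bot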

