Are Google proxy requests really just a hidden bot network?

I'm finding a ton of Google proxy IPs requesting /.well-known/traffic-advice

For example:
wordpress_php7.4.error.log:[13-Apr-2022 12:50:19 UTC] sign up session try: 66.249.88 .223 | https://www.mydomain55.com/.well-known/traffic-advice
wordpress_php7.4.error.log:[13-Apr-2022 13:26:41 UTC] sign up session try: 66.249.88 .226 | https://mydomain15.com/.well-known/traffic-advice
wordpress_php7.4.error.log:[13-Apr-2022 13:56:40 UTC] sign up session try: 66.249.88 .90 | https://www.mydomain14.com/.well-known/traffic-advice
wordpress_php7.4.error.log:[13-Apr-2022 15:39:03 UTC] sign up session try: 66.249.88 .186 | https://www.mydomain12.com/.well-known/traffic-advice
wordpress_php7.4.error.log:[13-Apr-2022 16:37:40 UTC] sign up session try: 66.249.88 .46 | https://mydomain11.com/.well-known/traffic-advice
wordpress_php7.4.error.log:[13-Apr-2022 16:47:44 UTC] sign up session try: 66.249.88 .186 | https://www.mydomain10.com/.well-known/traffic-advice
wordpress_php7.4.error.log:[13-Apr-2022 18:03:42 UTC] sign up session try: 66.249.88 .186 | https://www.mydomain9.com/.well-known/traffic-advice
wordpress_php7.4.error.log:[13-Apr-2022 18:03:43 UTC] sign up session try: 66.249.88 .44 | https://mydomain8.com/.well-known/traffic-advice
wordpress_php7.4.error.log:[13-Apr-2022 18:28:42 UTC] sign up session try: 66.249.88 .95 | https://mydomain7.com/?fbclid=IwAR3P11kMjQDoqFZCIKzAgVwQ5JZ8XP2p2vGgCkMS7JVLK7Hq-XYsRiRTCM0
wordpress_php7.4.error.log:[13-Apr-2022 18:28:52 UTC] sign up session try: 66.249.88 .64 | https://mydomain6.com/
wordpress_php7.4.error.log:[13-Apr-2022 18:33:14 UTC] sign up session try: 66.249.88 .83 | https://mydomain5.com/.well-known/traffic-advice
wordpress_php7.4.error.log:[13-Apr-2022 19:35:52 UTC] sign up session try: 66.249.88 .186 | https://www.mydomain4.com/.well-known/traffic-advice
wordpress_php7.4.error.log:[13-Apr-2022 19:35:52 UTC] sign up session try: 66.249.88 .182 | https://www.mydomain1.com/.well-known/traffic-advice
wordpress_php7.4.error.log:[13-Apr-2022 19:36:02 UTC] sign up session try: 66.249.88 .9 | https://www.mydomain2.com/.well-known/traffic-advice
wordpress_php7.4.error.log:[13-Apr-2022 20:39:18 UTC] sign up session try: 66.249.88 .186 | https://www.mydomain3.com/.well-known/traffic-advice
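
For anyone who wants to tally entries like these, here is a rough Python sketch. It assumes the exact "sign up session try: IP | URL" layout shown above, and the function name is made up for illustration. It tolerates the stray space that shows up inside the addresses.

```python
import re
from collections import Counter

# Matches the "sign up session try: <ip> | <url>" part of each log line.
# The optional whitespace inside the address handles "66.249.88 .223".
LINE_RE = re.compile(
    r"sign up session try:\s*(?P<ip>\d+\.\d+\.\d+\s*\.\d+)\s*\|\s*(?P<url>\S+)"
)


def tally_traffic_advice(lines):
    """Count /.well-known/traffic-advice requests per client IP."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not one of the lines we care about
        if not match.group("url").endswith("/.well-known/traffic-advice"):
            continue  # ordinary page request, skip it
        ip = re.sub(r"\s+", "", match.group("ip"))  # drop the stray space
        counts[ip] += 1
    return counts
```

Feeding it the log file line by line and printing counts.most_common() shows which proxy addresses ask for the file most often.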

Running host 66.249.88.186 many times will result in an IP from China or Egypt… traffic that should be blocked.

Is this traffic real, or is it a bot network using Google?

Is there a known range of IPs to block, as opposed to following this guy's suggestion:

(An IP range applies to the whole account, versus having to burn a page rule and go through every single domain on the account.)

Is that guy's suggestion safe?

Should Cloudflare, as part of their mission, run host 66.(IPADDRESS) on every Google proxy request before allowing the traffic to pass, and only then apply the ruleset? It seems like this is an easy workaround of the Cloudflare WAF if these are bad guys.

Okay thanks for any input from smart people or Cloudflare staff!

Looks like it’s this?

The User-Agent will supposedly be Chrome Privacy Preserving Prefetch Proxy.

Yes, it looks to be doing that. That said, if I run host 66.249.88.186 it returns:
186.88.249.66.in-addr.arpa domain name pointer google-proxy-66-249-88-186.google.com

If I geolocate 186.88.249.66 it now points to Venezuela. Venezuela should be challenged via the country code setting; however, I am guessing Cloudflare is not doing the host lookup, so it allows the traffic to bypass Cloudflare. Am I correct in thinking this? That would be a problem…

Reading this… hopefully Cloudflare is on top of it, because it could become a big problem as Google moves forward with this. It could make the geo-targeting firewall capability unreliable if Cloudflare doesn't pay attention to Chrome and headless Chrome browsers building a proxy into their product.

Cloudflare uses MaxMind's GeoIP2 databases for geolocation (see the GeoIP2 Databases Demo on MaxMind), so if you notice inconsistencies with an IP address's associated country, MaxMind should be your first port of call.

If it's different in Cloudflare than it is in MaxMind, open a support ticket as per https://support.cloudflare.com/hc/en-us/articles/200168236-Configuring-Cloudflare-IP-Geolocation#12345683

I don't think you understand the problem I am describing, based on your response. It appears that IPs that should be geo-blocked are grabbing data via Google's proxy service. I think Google makes the IP address of the originating request available via the command line: host <IP address of the Google proxy from the log>.

Check the logs I left in the first post of this thread for more clarification. The point I am making is that the geo block is applied to the IP of the Google proxy server, not to the originating IP, which Google is making available but Cloudflare is not reviewing.

Where are you seeing that Google makes the IP address available?
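
If it is the host output, note that a reverse lookup always prints the queried address with its octets reversed under in-addr.arpa. So 186.88.249.66.in-addr.arpa is just the lookup name for 66.249.88.186, not a second address. A small Python sketch to illustrate (the helper name is only for this example):

```python
import ipaddress


def ptr_name_to_ip(ptr_name: str) -> str:
    """Recover the IPv4 address that an in-addr.arpa name refers to."""
    suffix = ".in-addr.arpa"
    name = ptr_name.rstrip(".").lower()
    if not name.endswith(suffix):
        raise ValueError(f"not an IPv4 reverse-lookup name: {ptr_name!r}")
    octets = name[: -len(suffix)].split(".")
    # Reverse-lookup names list the octets in reverse order.
    return str(ipaddress.IPv4Address(".".join(reversed(octets))))


proxy_ip = ipaddress.ip_address("66.249.88.186")
print(proxy_ip.reverse_pointer)                       # 186.88.249.66.in-addr.arpa
print(ptr_name_to_ip("186.88.249.66.in-addr.arpa"))   # 66.249.88.186
```

Geolocating 186.88.249.66 therefore looks up an unrelated address that happens to share the same digits.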

The concept of the privacy preserving prefetch proxy is that websites can’t identify a user based on these calls.

The proposal defines the concept of a “private prefetch proxy” through the combination of an end-to-end encrypted CONNECT proxy to hide potentially identifiable information (e.g. user’s IP address), as well as rules governing its usage, and additional measures to ensure that the prefetches can not be personalized to the user.

In essence, they may be initiated by a user but they are intentionally not providing you with information to correlate it to a user - it’s a proxy, you can’t reliably block based on GeoIP.

You can follow the advice in traffic-advice.md in the buettner/private-prefetch-proxy repository on GitHub to disallow these calls to your website by configuring the traffic-advice file.
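
As a rough sketch of what that file could look like, here is a Python helper that builds the response. As far as I can tell from that document, the body is a JSON array and the content type is application/trafficadvice+json, but please check the linked file for the current field names. The function and constant names below are made up for illustration.

```python
import json

TRAFFIC_ADVICE_PATH = "/.well-known/traffic-advice"


def traffic_advice_response(path: str):
    """Return (status, headers, body) for the traffic-advice path, else None."""
    if path != TRAFFIC_ADVICE_PATH:
        return None  # let the normal site handle every other path
    # Assumption from the linked document: "prefetch-proxy" addresses
    # private prefetch proxies and "disallow": true asks them to stay away.
    advice = [{"user_agent": "prefetch-proxy", "disallow": True}]
    headers = {"Content-Type": "application/trafficadvice+json"}
    return 200, headers, json.dumps(advice)
```

On a plain WordPress host the same thing can be done with a static file at that path, as long as the server sends the matching content type.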

That raises the question, then: is this an easy way to thwart part of Cloudflare's firewall? Should there be a "switch" in the free version of Cloudflare that simply handles Google proxy requests so they can all be blocked safely, or should Cloudflare be in contact with the Google Chrome dev team to find a safe solution, since Chrome is so huge?

I just block most of the cloud providers, after allowing known bots. This way people using those services are blocked. You might want to use a rule to allow some known services you use (uptime monitoring, etc.); otherwise there's no reason to serve an agent that requests thousands of pages per hour from a cloud service.

Hi @freitasm, I don't know if you read the links, but the big issue appears to be that Chrome / Google has a unique service that "pre-fetches" web pages via their own proxy service (and not the client's computer) to speed up perceived browsing.

What I am worried about is this being used with headless Chrome to create armies of bots that grab data without being stopped by geo blocking, and then perform hard-to-detect probing to set up attacks.

The question is whether it's safe to use Cloudflare tools to block an IP range for my whole account, rather than setting up a rule on each domain (which has that bot rule).

The IP range would be easier to maintain than going through all my domains.

Yes and part of my reply answers your question “Cloudflare tools to block an IP range for my whole account rather than setting up a rule on each domain (which has that bot rule) the IP range would be easier to maintain than going through all my domains.”

The problem is that the ASN used for the proxy service is the same one used by valid Google bots: Googlebot, MediaPartners, etc. If you block by IP you will be chasing a lot of IPs. If you block the ASN you block a lot of good bots.

The option is to create a rule to ALLOW known bots and then immediately block the ASNs of known cloud providers.

Of course if you still want the proxy service to be used then you would have to block the ASN with a filter for specific user agents.
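
To show the ordering I mean, here is a small Python model of the logic. It is not Cloudflare's rule syntax, just a sketch: the Request class and decide function are invented for the example, 15169 is Google's main ASN as far as I know (check what your own logs report), and the user agent string is the one quoted earlier in this thread.

```python
from dataclasses import dataclass

GOOGLE_ASN = 15169  # assumption: the ASN your firewall events show for Google
PREFETCH_UA = "Chrome Privacy Preserving Prefetch Proxy"  # as quoted in this thread


@dataclass
class Request:
    asn: int
    user_agent: str
    known_bot: bool  # stands in for Cloudflare's "known bots" flag


def decide(req: Request, blocked_asns: set, keep_prefetch_proxy: bool = False) -> str:
    """Apply the rules in order: allow known bots, then block listed ASNs."""
    # Rule 1: known good bots are allowed before anything else runs.
    if req.known_bot:
        return "allow"
    # Rule 2: block the listed ASNs, optionally exempting the prefetch proxy.
    if req.asn in blocked_asns:
        if keep_prefetch_proxy and PREFETCH_UA in req.user_agent:
            return "allow"
        return "block"
    # Everything else falls through to the normal site rules.
    return "allow"
```

The order matters: if the ASN block ran first, Googlebot would be caught by it along with everything else on that ASN.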

Looking at my traffic, 33 requests out of 400,000 is not worth the work - I am just blocking it with the rest of the ASN.

What would be nice is a way to apply a rule to all domains in my account (about 100) without taking up a page rule. A lot of the sites I'm working on are just little WordPress sites to help low, low budget artists. 1. sitewide edge cache, 2. security on wp-login.php, 3. disable cache on cart path… and all page rules are gone.

I think the Google proxy thing is gonna be a "problem" for a lot more people as they move forward, and I think it would be good for Cloudflare to have a feature to directly address it.
