CF worker IP 2a06:98c0:3600::103 with Googlebot triggering ModSec rule

Hello,
I’m seeing a strange pattern in my logs where a Googlebot IP address appears alongside a Cloudflare Workers IP address (2a06:98c0:3600::103), and the latter triggers a ModSec rule. I would like to understand it, and resolve it if possible, because this pattern is occurring approximately 600 times per day.

It goes like this (not necessarily in this order since all 3 log entries have the same time stamp):

  1. There is a request for a URL from 66.249.77.104 with a 200 response. This is a Googlebot IP.

  2. There is a request for the same URL from 2a06:98c0:3600::103 with a 403 response. This is an IP associated with Cloudflare Workers.

  3. There is an Apache error entry (again for 2a06:98c0:3600::103):

For example:
[client 2a06:98c0:3600::103] ModSecurity: [file "/etc/httpd/modsecurity.d/00_asl_z_searchengines.conf"] [line "105"] [id "303800"] [rev "5"] [msg "Atomicorp.com WAF Rules: Fake Googlebot webcrawler"] [data ""] [severity "ERROR"] Access denied with code 403 (phase 1). Lua Data: 2a06:98c0:3600::103 reverse and forward records did not match. [hostname "www.rockbrookcamp.com"] [uri "/blog/traits-beautiful-people/"] [unique_id "ZGjle-KrQ4sP2-p0kKAHrAAAANc"]
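For anyone who wants to tally these entries by rule ID or URI, here is a minimal Python sketch that pulls the bracketed fields out of an error line like the one above. The function name `parse_modsec_entry` and the shortened `SAMPLE` line are my own for illustration; the pattern is only assumed to fit this log layout, not every ModSecurity format.

```python
import re

# Matches [key "value"] pairs. Typographic quotes are accepted as well,
# because forum software often converts straight quotes when a log is pasted.
FIELD_RE = re.compile(r'\[(\w+) ["“]([^"”]*)["”]\]')
CLIENT_RE = re.compile(r'\[client ([^\]]+)\]')

# Shortened copy of the error entry quoted in the post.
SAMPLE = (
    '[client 2a06:98c0:3600::103] ModSecurity: [line "105"] [id "303800"] '
    '[msg "Atomicorp.com WAF Rules: Fake Googlebot webcrawler"] [data ""] '
    'Access denied with code 403 (phase 1). '
    '[hostname "www.rockbrookcamp.com"] [uri "/blog/traits-beautiful-people/"]'
)


def parse_modsec_entry(line):
    """Return a dict of the bracketed key/value fields plus the client address."""
    fields = {key: value for key, value in FIELD_RE.findall(line)}
    client = CLIENT_RE.search(line)
    fields["client"] = client.group(1) if client else None
    return fields


entry = parse_modsec_entry(SAMPLE)
print(entry["client"], entry["id"], entry["uri"])
```

Feeding every line of the error log through this and counting by `id` and `client` would confirm whether all ~600 daily hits come from the same rule and the same address.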

I should mention that the user agent for these requests (#1 and #2 above) is:
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.5615.142 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

I should also mention that the ModSecurity rules here are written by atomicorp and are hosted on our server.

Looking at this thread:
https://community.cloudflare.com/t/wrong-ip-without-worker-cf-connecting-ip-2a063600-103/404805/4
I thought this may be related to using Zaraz, but after turning it off, I still see this repeated error associated with the Cloudflare worker IP. I do not employ any other workers (that I know of).

In the error, “Lua Data: 2a06:98c0:3600::103 reverse and forward records did not match” makes me think this is related to this IP address “intentionally not identifying the original client’s IP” for some reason.
https://news.ycombinator.com/item?id=26690788
I’m not sure if that’s relevant.
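To make the "reverse and forward records did not match" message concrete, here is a rough Python sketch of a forward-confirmed reverse DNS check of the kind commonly used to verify Googlebot. I have not seen the Lua code behind the Atomicorp rule, so this is an assumption about what it does, and the function names and accepted domain suffixes are mine (the suffixes follow Google's published guidance on verifying Googlebot).

```python
import ipaddress
import socket


def default_reverse(ip):
    """PTR lookup: return the hostname an address maps to, or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def default_forward(hostname):
    """Forward lookup: return the set of address strings for a hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    return {info[4][0] for info in infos}


def _normalize(address):
    # Drop any IPv6 zone suffix and compare as address objects, not strings.
    return ipaddress.ip_address(address.split("%")[0])


def is_verified_googlebot(ip, reverse=default_reverse, forward=default_forward):
    """True only if the PTR name is a Google host that resolves back to ip."""
    hostname = reverse(ip)
    if hostname is None:
        return False
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    return _normalize(ip) in {_normalize(a) for a in forward(hostname)}
```

Under this kind of check, a request that carries the Googlebot user agent but arrives from 2a06:98c0:3600::103 would fail, because that address does not reverse-resolve to a Google hostname. The resolver functions are passed in as parameters so the logic can be exercised without live DNS.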

Why is this Cloudflare IP address triggering this ModSec rule, and how is it related to the Googlebot?

Other than allowlisting that IP address (seems risky) on my server’s WAF, I’m not sure how to configure things better to avoid this repeated error.

Thanks!

The block message is correct. That’s not a legitimate Googlebot request to your site. Do not Allowlist that IP address.

That IP address means someone is most likely routing traffic through their account over to your site by using a Worker.

Yes. It is possible the rule is simply blocking a bad actor using a Cloudflare Worker, but I can’t explain the parallel nature of these entries, or the consistency and regularity of the pattern. The Googlebot IP is not being blocked, but the CF Worker IP is. So the mystery arises because both are trying to get the same URL at the same time, one successfully and the other not.

So do you mean… someone is using a Worker to send a fake Googlebot IP request and another (hidden in the Worker IP) IP request at the same time? That makes me wonder why that fake Googlebot IP is not being caught by the WAF.

Thank you for explaining.

Good point. What additional features or rules do you have enabled that would apply to those requests?

Well, nothing related to Workers except Zaraz for Google Analytics 4. And as I mentioned, I toggled it off and saw no change. Argo is on. Early Hints is on. APO is off. Verified Bots and Definitely Automated are both allowed. I can’t think of any rules that would be involved. Maybe something else?

I’m still puzzled how this Cloudflare Worker IP and the Googlebot IP can be related. For each instance of this, both entries have the Googlebot user agent when trying to get a URL, one blocked and one allowed.

The block is not happening in Cloudflare. It is the WAF at the server being triggered. And the rule blocking is complaining about a mismatch of “reverse and forward records.”

Many of Cloudflare’s features are actually Workers.

The only setting you mentioned that might be related is Early Hints, since that’s pretty much Chrome-specific (Google).

Turning off Early Hints has no effect.

Checking my logs, this Apache error has been occurring at least since October 3. It may have started earlier, since the log file has probably rotated. It really looks like legitimate Googlebot traffic is somehow tied to traffic from 2a06:98c0:3600::103, which is then getting a 403 Apache error from my server’s WAF.
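One way to check how consistently the two requests travel together is to group access log entries by timestamp and URL. The sketch below assumes the log has already been parsed into `(timestamp, client_ip, url, status)` tuples; that tuple layout, the function name, and the timestamps in the usage example are hypothetical.

```python
from collections import defaultdict

WORKER_IP = "2a06:98c0:3600::103"


def find_paired_requests(entries, worker_ip=WORKER_IP):
    """Find (timestamp, url) groups where the Worker IP got a 403
    while another client got a 200 for the same URL.

    entries: iterable of (timestamp, client_ip, url, status) tuples.
    Returns a list of (timestamp, url, other_client_ips).
    """
    groups = defaultdict(list)
    for timestamp, client_ip, url, status in entries:
        groups[(timestamp, url)].append((client_ip, status))

    pairs = []
    for (timestamp, url), hits in groups.items():
        blocked = [ip for ip, status in hits if ip == worker_ip and status == 403]
        allowed = [ip for ip, status in hits if ip != worker_ip and status == 200]
        if blocked and allowed:
            pairs.append((timestamp, url, allowed))
    return pairs


# Made-up entries shaped like the pattern described in the thread.
sample = [
    ("10:15:01", "66.249.77.104", "/blog/traits-beautiful-people/", 200),
    ("10:15:01", "2a06:98c0:3600::103", "/blog/traits-beautiful-people/", 403),
    ("10:15:09", "203.0.113.7", "/about/", 200),
]
print(find_paired_requests(sample))
```

If nearly every blocked Worker request has a Googlebot partner in the same second, that supports the idea that the two are triggered by the same crawl rather than by an unrelated bad actor.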

Also throwing this out there since I’ve seen it in the past: what about Automatic Signed Exchanges, is that off too?

I did have Automatic Signed Exchanges enabled, but after toggling it off, the issue remains.

I guess another way to ask is:

What Cloudflare features are (can be) tied to visits by Googlebot?

This domain is also set up for Cloudflare “Web Analytics.” Deactivating that feature made no difference, and the issue remained.

Hi,

I had a similar issue with one of my domains. Same double requests, same requests coming from Known Bots (Googlebot, Twitterbot, Applebot etc.) arriving at the origin with Cloudflare Workers public IP. It was recently determined to have been caused by AMP Real URL. The issue went away immediately after turning this feature off.

In my case I had turned it on for testing, but soon gave up the idea of an AMP site and forgot to turn that feature off. For that reason, it was a no-brainer to just turn it off. If you use AMP Real URL in a production setting, and in case you test and confirm yours is also a result of AMP Real URL, you’d then need to evaluate whether it’s worth keeping it while Cloudflare fixes the issue for good.

The Cloudflare Worker that is behind AMP Real URL seems to somehow remove some of the request headers, even those added by a Transform Rule. That’s how, in my case, I first learned there was a problem, as I have a TR that adds a header without which the origin will block the request. And that’s probably why mod_security is also blocking those requests, as the CF-Connecting-IP header is probably arriving at the origin empty.
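To illustrate why a missing or unexpected CF-Connecting-IP value matters, here is a rough Python sketch of how an origin might pick the client address it hands to its WAF. This only illustrates the hypothesis above, not a confirmed explanation. The function name is mine, and the range list is a small subset of Cloudflare's published ranges as I remember them, so check the current list before relying on it.

```python
import ipaddress

# Assumption: a partial list of Cloudflare ranges, for illustration only.
CLOUDFLARE_RANGES = [
    ipaddress.ip_network("173.245.48.0/20"),
    ipaddress.ip_network("104.16.0.0/13"),
    ipaddress.ip_network("2606:4700::/32"),
    ipaddress.ip_network("2a06:98c0::/29"),
]


def effective_client_ip(peer_ip, headers):
    """Trust CF-Connecting-IP only when the TCP peer is a Cloudflare address.

    peer_ip: address of the connection as seen by the origin.
    headers: dict of request headers.
    """
    peer = ipaddress.ip_address(peer_ip)
    from_cloudflare = any(peer in network for network in CLOUDFLARE_RANGES)
    claimed = headers.get("CF-Connecting-IP", "").strip()
    if from_cloudflare and claimed:
        return claimed
    # Header absent or empty, or peer not Cloudflare: fall back to the peer.
    return peer_ip
```

With logic like this, a request whose header is empty is attributed to a Cloudflare address, and a request whose header carries 2a06:98c0:3600::103 is attributed to that Worker address. Either way, a Googlebot verification rule at the origin would no longer see a Google IP.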

Wow, that’s great info, and it makes good sense. But, ugh, I do not have AMP Real URL activated, yet I see this issue clearly and regularly.

Also tried turning off:
Early Hints
Automatic Signed Exchanges

But to no effect.

I also have these speed features activated:
Enhanced HTTP/2 Prioritization
Mirage
Rocket Loader

Mysterious!

Indeed!

Enhanced HTTP/2 Prioritization would only affect HTTP/2 requests, which wasn’t the case with my domain.

Mirage only processes images, so I wouldn’t bet on it (but who knows?).

With the help of Cloudflare support, the issue has been tied to the Automatic Signed Exchanges (SXGs) feature. I’m now working on a possible explanation and (ideally) a workaround.

Hi @rockbrookcamp,

I am also seeing the same requests from 2a06:98c0:3600::103, and I also have Automatic Signed Exchanges (SXGs) activated.

Did you ever find a solution?

Hello @janvitos

Yes, the Automatic Signed Exchanges feature is legitimately causing traffic from that IPv6 address. For me, I was seeing a 403 response because that traffic was triggering a rule in the Atomicorp WAF installed on my server (not the Cloudflare WAF).

In other words, there was nothing that could be done in Cloudflare to fix this. In the end, I left SXG activated, and modified the server WAF a bit to avoid the 403 error.

Sorry that’s probably not much help!

@rockbrookcamp, at least I know why there’s Cloudflare traffic on that IP now :)

Thanks for the info!
