How can I block 2a06:98c0:3600::103 at WAF level

When I add a rule like the one in your screenshot, making it the first to execute, I still see requests from that IP address:

$ sudo tail -f /var/log/nginx/access-example.com.log | grep ^2a06
2a06:98c0:3600::103 - - [24/Jun/2024:17:26:15 -0400] "GET /foo/bar/0 HTTP/2.0" 403 548 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.175 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
2a06:98c0:3600::103 - - [24/Jun/2024:17:26:15 -0400] "GET /foo/bar/1 HTTP/2.0" 403 548 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.175 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
2a06:98c0:3600::103 - - [24/Jun/2024:17:26:16 -0400] "GET /foo/bar/2 HTTP/2.0" 403 548 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.175 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

The rest of the rules block specific IP addresses, some user agents, and the like. They shouldn’t matter much, since they are lower in the order of execution.

Another indicator that it is not catching all cases is that the WAF dashboard shows only 3 instances blocked, though many hundreds of requests were made and got through in the roughly 7 minutes that I had the rule active.

Here is a screenshot.

Same. There is no way to block traffic from 2a06:98c0:3600::103 (coming from Cloudflare Workers). That’s such a hole in the system.

I’ve also blocked the traffic at the nginx level with a 403. However, the traffic is heavy enough that it would be better for it not to reach nginx at all :/

It has been 13 days since I implemented the rule that only has this line:

cf.worker.upstream_zone ne ""

The action is “Block”, and it is the first rule …

That rule was only activated 283 times in those 13 days. Here is a screenshot.

If I count how many times this IP address hit the site in the actual nginx access log, I get 538,054 hits from June 30th to July 6th, and 549,388 for the 7 days before that.

So the above rule definitely does not work.

We need a solution from Cloudflare, since this is a serious issue.

Could you also log the requested hostname for a few minutes and share the logs again?

I’ve also just tried the exact same rule, and it does work.
If the rule doesn’t work for you, it might be that the traffic doesn’t even go through your CF zone.

Can you clarify what you mean by “log the requested hostname”?
Where? In Nginx?

Yes. Just add $http_host to the log file.
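For example, a minimal sketch based on the default “combined” format (the format name “hostlog” and the log path are placeholders; adapt them to your existing “statistics” format):

```nginx
# Sketch: "combined" plus the Host header at the end.
# "hostlog" is an arbitrary name; the access_log path is an example.
log_format hostlog '$remote_addr - $remote_user [$time_local] '
                   '"$request" $status $body_bytes_sent '
                   '"$http_referer" "$http_user_agent" "$http_host"';

access_log /var/log/nginx/access-example.com.log hostlog;
```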

Since this will require an nginx restart/reload, it will have to wait until the weekend when traffic is lower, so as not to disrupt the site.

For the sake of completeness, I have this

real_ip_header CF-Connecting-IP;

set_real_ip_from 173.245.48.0/20;
set_real_ip_from 103.21.244.0/22;
set_real_ip_from 103.22.200.0/22;
set_real_ip_from 103.31.4.0/22;
set_real_ip_from 141.101.64.0/18;
set_real_ip_from 108.162.192.0/18;
set_real_ip_from 190.93.240.0/20;
set_real_ip_from 188.114.96.0/20;
set_real_ip_from 197.234.240.0/22;
set_real_ip_from 198.41.128.0/17;
set_real_ip_from 162.158.0.0/15;
set_real_ip_from 104.16.0.0/12;
set_real_ip_from 172.64.0.0/13;
set_real_ip_from 131.0.72.0/22;
set_real_ip_from 2400:cb00::/32;
set_real_ip_from 2606:4700::/32;
set_real_ip_from 2803:f800::/32;
set_real_ip_from 2405:b500::/32;
set_real_ip_from 2405:8100::/32;
set_real_ip_from 2a06:98c0::/29;
set_real_ip_from 2c0f:f248::/32;

This is in a file called cloudflare.conf that is included like this:

server {
  listen      443 ssl http2 default_server;
  listen [::]:443 ssl http2 ipv6only=on;

  include /etc/nginx/cloudflare.conf;

  server_name www.example.com;

  access_log /var/log/nginx/access-example.com.log statistics if=$do_logging;

  ...
  location / {
    # Deny CloudFlare IPv6 worker used by bots
    deny 2a06:98c0:3600::103;

    proxy_pass                         http://127.0.0.1:81;
    proxy_read_timeout                 240;
    proxy_connect_timeout              240;
    proxy_redirect                     off;

    proxy_set_header Host              $host;
    proxy_set_header X-Real-IP         $remote_addr;
    proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto https;
    proxy_set_header X-Forwarded-Port  443;

    proxy_buffers                      8 64k;
    proxy_buffer_size                  64k;
  }
}

A reload should not disrupt the site in any way.

https://nginx.org/en/docs/beginners_guide.html

The $http_host variable shows the domain of the site with www in front of it, as it should.

I’m more inclined to think that something has somehow been overlooked in your (and/or @stan5’s) configuration.

Can you also add $realip_remote_addr to your logging?

Looking at e.g. the default “combined” log format, according to Module ngx_http_log_module:

log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

The change could be as simple as e.g.:

log_format combined '$remote_addr via $realip_remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

The log line you shared above would then change to something like:

2a06:98c0:3600::103 via 192.0.2.123 - - [24/Jun/2024:17:26:15 -0400]

With this new logging format, you know that it was the IP address 192.0.2.123 that told your nginx that the original connection came from 2a06:98c0:3600::103.

That could eventually help in tracking down where the actual issue is.

This part of your configuration alone would, from my point of view, signal a configuration mistake on your end:

104.16.0.0/12 was taken away in April 2021, and replaced with 104.16.0.0/13 and 104.24.0.0/14.

Maybe it is time to automate keeping that file up to date, for example by running a script whenever the file’s modification time is more than 30 days old.
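A minimal sketch of such a script, assuming Cloudflare’s published list endpoints (www.cloudflare.com/ips-v4 and /ips-v6) and the file path used above; the reload step at the end is an assumption to adapt:

```shell
#!/bin/sh
# Sketch: regenerate cloudflare.conf from Cloudflare's published ranges.
# The output path and the reload step are assumptions to adapt.
set -eu

cidrs_to_directives() {
  # Turn one CIDR per line into "set_real_ip_from <cidr>;", skipping blanks.
  sed '/^$/d; s/^/set_real_ip_from /; s/$/;/'
}

refresh_cloudflare_conf() {
  tmp=$(mktemp)
  {
    echo 'real_ip_header CF-Connecting-IP;'
    echo
    for url in https://www.cloudflare.com/ips-v4 https://www.cloudflare.com/ips-v6; do
      curl -fsS "$url"
      echo  # guard against a missing trailing newline in the response body
    done | cidrs_to_directives
  } > "$tmp"
  mv "$tmp" /etc/nginx/cloudflare.conf
  nginx -t && nginx -s reload
}

# Invoke from cron or a systemd timer, e.g.:  refresh_cloudflare_conf
```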

Speaking of serious issues:

The mix-up with the IP addresses above allows 262,144 extra IPv4 addresses to spoof whichever source they want through your nginx (a /12 covers 1,048,576 addresses, while the replacement /13 and /14 together cover only 786,432).

I changed the list of CF IPs to the latest ones that are published here for IPv4, and IPv6.

So now, the list looks like this in Nginx:

set_real_ip_from 173.245.48.0/20;
set_real_ip_from 103.21.244.0/22;
set_real_ip_from 103.22.200.0/22;
set_real_ip_from 103.31.4.0/22;
set_real_ip_from 141.101.64.0/18;
set_real_ip_from 108.162.192.0/18;
set_real_ip_from 190.93.240.0/20;
set_real_ip_from 188.114.96.0/20;
set_real_ip_from 197.234.240.0/22;
set_real_ip_from 198.41.128.0/17;
set_real_ip_from 162.158.0.0/15;
set_real_ip_from 104.16.0.0/13;
set_real_ip_from 104.24.0.0/14;
set_real_ip_from 172.64.0.0/13;
set_real_ip_from 131.0.72.0/22;
set_real_ip_from 2400:cb00::/32;
set_real_ip_from 2606:4700::/32;
set_real_ip_from 2803:f800::/32;
set_real_ip_from 2405:b500::/32;
set_real_ip_from 2405:8100::/32;
set_real_ip_from 2a06:98c0::/29;
set_real_ip_from 2c0f:f248::/32;

And these are some of the IP addresses I see when I log $realip_remote_addr.

They are all Cloudflare IP addresses …

172.68.22.16
172.68.22.180
172.68.23.77
172.71.147.152
172.71.147.92
172.71.150.198
172.71.150.252
172.71.150.253
172.71.150.45
172.71.150.61
172.71.151.77

Could you try a very simple rule that blocks some random path and see if that works?

Something like (http.request.uri.path eq "/somerandompathname")

If the question is whether rules work or not, they do work …

For example, we added the following rule for a certain path.

(http.request.uri contains "/foo/bar" and ip.geoip.country eq "XX")

The path is a search URL on the site, and we have a valid country code.
And the action is JS Challenge.

We added that yesterday, and it has already blocked some 290,000 requests.

Any other ideas on this?

@Laudian
@DarkDeviL

Any other solutions to this?

On normal days, that bot hits the site hard twice a day, around 8 am and 8 pm, causing the load average to shoot up to 4× normal.

On other days, it does sustained crawling at crazy rates …

If you have other ideas to block it at the Cloudflare level, please share them …

  1. Are you still logging with $realip_remote_addr, so you see “{IP} via {PROXY_IP}” format?

  2. If so, does it still seem to be “2a06:98c0:3600::103 via {PROXY_IP}”?

Is it always this specific “User-Agent”, or is there any variance?

So the traffic is very specifically around 8 am and 8 pm, and not at other times of day, such as around 11 am or 2 pm?

What time zone is the 8 am and 8 pm in?

One potential way, given that the “User-Agent” contains “bot”, would be to block traffic that claims to be a bot but is not also a verified bot, e.g.:
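Assuming the Cloudflare Rules fields http.user_agent and cf.client.bot (the latter is true for verified bots such as the real Googlebot), such an expression could look like this sketch; verify the field names against the current Rules language documentation:

```
(http.user_agent contains "bot" and not cf.client.bot)
```

Note that “contains” is case-sensitive, which still matches “Googlebot”, since the lowercase “bot” appears within it. With the action set to “Block”, this would stop anything whose User-Agent claims to be a bot but that Cloudflare cannot verify.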

I would add this one in addition to the block-worker rule and the “set_real_ip_from” update, so that they are all active, e.g. like this:

If that doesn’t help, I would go as far as to say that, to get something conclusive, we would need to dig much deeper into the illegitimate traffic you have and attempt to identify patterns that it has, but that legitimate traffic doesn’t.

Proxy IP:
I don’t keep logging of the proxy IP enabled all the time.

But when I enabled it, the proxy IPs were still Cloudflare’s:

162.158.41.125
162.158.41.17
162.158.41.173
162.158.41.221
162.158.41.231
162.158.41.62
162.158.42.25
162.158.42.75
172.68.22.17
172.68.22.180
172.68.23.134
172.68.23.77
172.71.146.181
172.71.146.50
172.71.147.132
172.71.147.139
172.71.147.152
172.71.147.188
172.71.147.194
172.71.147.72
172.71.147.79
172.71.150.219
172.71.150.237
172.71.150.252
172.71.151.38
172.71.151.77

User-Agent:
The vast majority of requests have the full one posted before, with the MMB29P string.
That same string is used by genuine Googlebot, from Google IP addresses.

A minority has a simpler version, like this:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

I think they are trying to claim to be Google, so sites will not block them.
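One way to separate the impostors from the real thing is Google’s documented verification: reverse-resolve the IP, check that the name ends in googlebot.com or google.com, then forward-resolve that name and require it to match the original IP. A minimal sketch (the helper name is mine, and the dig calls are illustrative):

```shell
#!/bin/sh
# Sketch of Google's documented Googlebot verification.
# is_google_name checks only the name-suffix part; the reverse and
# forward DNS lookups shown in the comment below are illustrative.
is_google_name() {
  case "$1" in
    *.googlebot.com|*.google.com) return 0 ;;
    *) return 1 ;;
  esac
}

# Live usage (requires dig and network access), e.g.:
#   ip=66.249.66.1
#   name=$(dig +short -x "$ip" | sed 's/\.$//')
#   is_google_name "$name" && dig +short "$name" | grep -qxF "$ip"
```

A spoofer can copy the User-Agent string, but it cannot make its IP reverse-resolve into Google’s domains and have the forward lookup match.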

Time Of Day
As for time of day, they are crawling the site all the time.
The rate of crawling varies, sometimes it is once every couple of seconds, sometimes it is 10 per second.
On quiet days (e.g. holidays and weekends), when the site is not otherwise busy, they sometimes have spikes around 8am and 8pm.

I will post your suggestion about the verified bot shortly.

Unfortunately, these bots (and other spam/malware) are always an arms-race scenario.
They will change their tactics in a few months, and we will have to find new solutions.

For blocking unverified bots:

Just to confirm what “verified bot” means: Cloudflare would use an allowlist of IP addresses (and perhaps other parameters), so the real Googlebot would not be blocked, while those that merely claim to be it are.

I did try the rule you suggested:

Here is a screenshot of the rule.

There is not enough room on my screen to capture the bottom part. This rule is first.

I don’t have a “block worker” rule anymore, since it made no difference after I implemented it.

Even with this new rule, the traffic from that IP address claiming to be Googlebot is coming in as before.

In the 5 minutes since I implemented that rule, the dashboard says 33 attempts were blocked, but I see 97 requests in nginx’s access log.

So it did not work.
