Some user agents completely bypass the Cloudflare proxy

I’m observing something I can’t explain regarding two user agents: NetcraftSurveyAgent/1.0 and the one from Expanse, a Palo Alto Networks company.

I have a domain and a subdomain that are both proxied through Cloudflare. I also have WAF rules that block some user agents, including the two mentioned above.

When I checked the logs on my origin server, I found the following entries:

165.227.103.15 - - [04/Oct/2022:16:01:03 +0000] "GET / HTTP/1.0" 301 "-" "Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0; [email protected])"
198.235.24.41 - - [04/Oct/2022:19:10:24 +0000] "GET / HTTP/1.1" 301 "-" "Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: [email protected]"

This seemed very strange to me, so I decided to check the Cloudflare logs. I was amazed not to find the same entries there.
Other user agents are blocked by the Cloudflare rules without any problem, and their entries appear in the log list.

So I have a big question: how is that possible?

I can only suppose that these user agents completely bypass the Cloudflare proxy rules, or that the Cloudflare proxy has a bug concerning these user agents.

UPD1:
After careful analysis of the logs (CF logs and origin server logs), my interim conclusion is that requests from NetcraftSurveyAgent/1.0 and the Palo Alto Networks company made through domane.name (CF proxy is ON) are blocked by the WAF rules completely.

When requests from those same two user agents go through sub.domane.name (CF proxy is ON), it looks as if the CF proxy were OFF. Other requests through sub.domane.name are blocked by the WAF rules completely.

Sounds like your origin isn’t secured properly - you shouldn’t be allowing inbound connections from anyone but Cloudflare.

It says in the user agent:

How does this issue depend on my origin server and its security settings? The problem is that Cloudflare is passing incoming requests (specifically, requests from the user agents in question) through to my origin, as my origin’s log shows.

I don’t allow them, but the issue exists anyway, and there aren’t many settings in Cloudflare for this.
Do you have any ideas on the matter?

And what useful information is that supposed to give me?

Because they’re very likely connecting directly to your origin - nothing to do with Cloudflare.

I’d take an educated guess and say Cloudflare isn’t requesting your origin over HTTP/1.0.

That doesn’t really make much sense - what part of Cloudflare are they bypassing? If they hit a Cloudflare IP, they go through Cloudflare.

That it’ll scan all public IPv4 addresses - so if your origin isn’t secured, you will see the request?
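For anyone who wants to check the HTTP/1.0 point against their own origin log, here is a rough sketch in Python. It assumes the log layout shown in the first post (no response-size field), and the names `LOG_RE`, `parse_line` and `http10_requests` are made up for illustration. Treat the protocol version as a hint only: the Expanse entry above came in over HTTP/1.1.

```python
import re

# Pattern for the log layout quoted in the first post (assumed: no size field).
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>HTTP/[\d.]+)" '
    r'(?P<status>\d{3}) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# One entry copied verbatim from the first post.
SAMPLE_LINE = (
    '165.227.103.15 - - [04/Oct/2022:16:01:03 +0000] "GET / HTTP/1.0" 301 "-" '
    '"Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0; [email protected])"'
)


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def http10_requests(lines):
    """Keep only the parsed entries that were requested over HTTP/1.0."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["proto"] == "HTTP/1.0"]


if __name__ == "__main__":
    for entry in http10_requests([SAMPLE_LINE]):
        print(entry["ip"], entry["proto"], entry["agent"])
```

Any entry this turns up is worth comparing against the Cloudflare IP ranges before drawing conclusions.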

Yes, correct! The point is that it LOOKS LIKE the connection was made directly to my origin, BUT THAT IS NOT POSSIBLE, because I have the following configuration in my Cloudflare account:

It means that all incoming requests for domane.name have to be proxied! This rule works for all other requests to my origin; it fails only for the user agents mentioned in my question.

Not if they just went straight to the IP address.

I know that! But all the requests were made via domane.name, not via the IP address.

Like mentioned:

Expanse will request your IP - not your domain. Cloudflare is not involved for requests that go direct to your IP.

Actually, anyone can write whatever they wish; that doesn’t make it true :))
And this is clearly confirmed by the requests to my origin via domane.name and not via the IP address!

:person_shrugging:

What’s the domain name?

Is it really necessary to share the real domain name here?

There’s no other way for us to find out what’s happening without having the actual domain name to troubleshoot with.

If your website was not always proxied through Cloudflare, there will likely remain some knowledge of the origin IP address mapped to the domain name in the public record, e.g. from scanners such as https://www.shodan.io/

Cloudflare ensures that current traffic resolving to your domain will go through Cloudflare. However, any user or bot can hit the IP address directly if they know it, and add a simple Host header to tell the server which domain it should serve.
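To make that concrete, here is a minimal sketch in Python of what such a direct request looks like on the wire. The function names are my own, and 203.0.113.10 is a documentation-only placeholder address, not anyone's real origin. Only try `send_to_ip` against a server you own.

```python
import socket


def build_direct_request(hostname, path="/"):
    """Build a minimal HTTP/1.1 request that names `hostname` in the Host header."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {hostname}",  # the server picks the site to serve from this header
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_to_ip(ip, hostname, port=80, timeout=5.0):
    """Connect straight to `ip` (no DNS lookup, so no Cloudflare) and return the status line."""
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        sock.sendall(build_direct_request(hostname))
        return sock.recv(4096).split(b"\r\n", 1)[0].decode("latin-1")


if __name__ == "__main__":
    # Placeholder values: replace with your own origin IP and hostname.
    print(build_direct_request("sub.domane.name").decode("ascii"))
    # print(send_to_ip("203.0.113.10", "sub.domane.name"))
```

The TCP connection goes to the IP address, so DNS and the proxy are never consulted. The domain name only appears as text inside the request, which is why it can be whatever the client chooses.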

To stop these requests going directly to your server you will have to take steps to ensure your webserver does not respond to requests that are not from Cloudflare.

You can do this by filtering to their IP ranges: https://www.cloudflare.com/en-gb/ips/
In addition, you could go one step further by setting up Authenticated Origin Pulls, which allow your webserver to verify with a certificate that the request is coming from Cloudflare (see "Authenticated origin pull" in the Cloudflare SSL/TLS docs).
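As a rough illustration of the IP-range check, here is a small Python sketch. The IPv4 list is a snapshot I copied from the page linked above and may well be out of date, so refresh it from that page before relying on it. The names `CLOUDFLARE_V4` and `is_cloudflare` are just for this example. In practice you would enforce this in your firewall or webserver config rather than in application code.

```python
import ipaddress

# Snapshot of Cloudflare's published IPv4 ranges (assumption: may be stale,
# always re-check against the official list).
CLOUDFLARE_V4 = [
    "173.245.48.0/20", "103.21.244.0/22", "103.22.200.0/22", "103.31.4.0/22",
    "141.101.64.0/18", "108.162.192.0/18", "190.93.240.0/20", "188.114.96.0/20",
    "197.234.240.0/22", "198.41.128.0/17", "162.158.0.0/15", "104.16.0.0/13",
    "104.24.0.0/14", "172.64.0.0/13", "131.0.72.0/22",
]
NETWORKS = [ipaddress.ip_network(cidr) for cidr in CLOUDFLARE_V4]


def is_cloudflare(addr):
    """Return True if `addr` falls inside one of the listed Cloudflare ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in NETWORKS)


if __name__ == "__main__":
    # The two client addresses from the origin log in the first post.
    for addr in ("165.227.103.15", "198.235.24.41"):
        print(addr, "Cloudflare" if is_cloudflare(addr) else "NOT Cloudflare")
```

Run the check against the address of the connecting peer. If your server is configured to restore the original visitor IP from a Cloudflare header, the logged address is the visitor's and this test tells you nothing.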

I don’t know how that works, but from my point of view there are two possibilities:

  1. The user agents or IPs/ASNs of NetcraftSurveyAgent/1.0 and the Palo Alto Networks company simply bypass the CF proxy through sub.domane.name, in accordance with CF settings or policies.
  2. CF has some bug in its proxy configuration.

Can I give you my real domane.name through some other channel?

I’ve checked my real domane.name and found nothing.

The bots you’re seeing are connecting directly by IP address, not using your domain name at all. The Expanse one specifically says that they scan the entire IPv4 space. You can bring up a new web server and never tell anyone about it and if it has a public IPv4 address you’ll get that bot connecting to it.

You can block requests from non-Cloudflare addresses to stop it.

Are you absolutely sure about that?
Then how would you explain the direct request to my domane.name (see the picture below)?
Do you still intend to assert that the bot connects to every host exclusively by IP address?

The screenshot from the Cloudflare log list:

Yes. It specifically says that it does that, and every web server I’ve ever brought up gets hit by it.

They do that, too, and as you can see from that page, Cloudflare blocked it as you want. It’s working right. But they also connect directly to every single possible IPv4 address, which will completely bypass Cloudflare.

i40west, I did actually read carefully what is written there, but the question is a bit different!

So I kindly advise you to read the whole topic (including all comments) before writing your next comment. You will find all the information about what the question is.