I'm seeing something I can't explain regarding two user agents: NetcraftSurveyAgent/1.0 and Expanse (a Palo Alto Networks company).
I have a domain and subdomains that are proxied through Cloudflare. I also have WAF rules that block some user agents, including the two mentioned above.
When I checked the logs on my origin server, I found the following entries:
126.96.36.199 - - [04/Oct/2022:16:01:03 +0000] "GET / HTTP/1.0" 301 "-" "Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0; [email protected])"
188.8.131.52 - - [04/Oct/2022:19:10:24 +0000] "GET / HTTP/1.1" 301 "-" "Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: [email protected]"
This seemed very strange to me, so I decided to check the Cloudflare logs. I was surprised to find no matching entries there.
Other user agents are blocked by the Cloudflare rules without any problem, and their entries do appear in the log list.
So I have a big question: how is that possible?
I can only suppose that those user agents completely bypass Cloudflare's proxy rules, or that the Cloudflare proxy has a bug affecting those user agents.
After careful analysis of the logs (CF logs and origin server logs), my intermediate conclusion is that requests from NetcraftSurveyAgent/1.0 and the Palo Alto Networks company made through domain.name (CF proxy is ON) are blocked by the WAF rules completely.
If requests from those same user agents go through sub.domain.name (CF proxy is ON), it looks as if the CF proxy is OFF. Other requests through sub.domain.name are blocked by the WAF rules completely.
How does this issue depend on my origin server and its security settings? The problem is that Cloudflare is passing incoming requests (specifically, requests from the user agents in question) through to my origin, as my origin's log shows.
I don't allow that, but the issue exists, and there are not many settings for it in Cloudflare.
Do you have any ideas on this?
Yes, correct! The general point is that it LOOKS LIKE the connection was made directly to my origin, BUT THAT IS NOT POSSIBLE, as I have the following configuration in my Cloudflare account:
This means that all incoming requests for domain.name have to be proxied! This rule works for all other requests to my origin, but not for the user agents mentioned in my question.
If your website was not always proxied through Cloudflare, there will likely remain some knowledge of the origin IP address mapped to the domain name in the public record, e.g. from scanners such as https://www.shodan.io/
Cloudflare ensures that current traffic resolving to your domain will go through Cloudflare. However, any user or bot that knows the IP address can connect to it directly and add a simple Host header to select which domain the server should serve.
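To make the point above concrete, here is a minimal Python sketch of what such a direct request looks like. The function names, `example.com`, and the address `203.0.113.10` (a documentation-only range) are placeholders chosen for illustration, not anything taken from this thread. Because the client connects to the IP address itself, no DNS lookup happens and Cloudflare never sees the request.

```python
import socket


def build_direct_request(hostname: str, path: str = "/") -> bytes:
    """Build a raw HTTP/1.1 request whose Host header names the site."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_by_ip(origin_ip: str, hostname: str, port: int = 80,
                timeout: float = 5.0) -> bytes:
    """Connect straight to origin_ip and send the request.

    No DNS lookup is involved, so the request never passes through
    whatever proxy the hostname resolves to.
    """
    with socket.create_connection((origin_ip, port), timeout=timeout) as sock:
        sock.sendall(build_direct_request(hostname))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    # Placeholder values; this only prints the request, it does not send it.
    print(build_direct_request("example.com").decode("ascii"))
```

Running `fetch_by_ip` against your own origin address with your own hostname is one way to check whether the server still answers requests that did not come through the proxy.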
To stop these requests going directly to your server you will have to take steps to ensure your webserver does not respond to requests that are not from Cloudflare.
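One common way to do that is to accept connections only from Cloudflare's published address ranges, usually at the firewall or in the web server configuration. The Python sketch below only illustrates the check itself. `SAMPLE_CLOUDFLARE_RANGES` and `is_from_cloudflare` are names made up for this example, and the list is a small illustrative subset; the full, current list should be taken from https://www.cloudflare.com/ips/ rather than from this snippet.

```python
import ipaddress

# Illustrative subset only (assumption: still current when you read this).
# Load the complete list from https://www.cloudflare.com/ips/ in real use.
SAMPLE_CLOUDFLARE_RANGES = [
    "173.245.48.0/20",
    "104.16.0.0/13",
    "172.64.0.0/13",
    "162.158.0.0/15",
    "2606:4700::/32",
]


def is_from_cloudflare(remote_addr: str, ranges=SAMPLE_CLOUDFLARE_RANGES) -> bool:
    """Return True if remote_addr falls inside any of the given ranges."""
    ip = ipaddress.ip_address(remote_addr)
    # An IPv4 address is never "in" an IPv6 network and vice versa,
    # so mixed lists are safe to scan.
    return any(ip in ipaddress.ip_network(r) for r in ranges)


if __name__ == "__main__":
    for addr in ("104.16.0.1", "203.0.113.10"):
        verdict = "allow" if is_from_cloudflare(addr) else "reject"
        print(addr, verdict)
```

The same allow-list idea applies whichever layer enforces it; the important part is that the check uses the address of the connecting peer, not a header the client can set.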
The bots you’re seeing are connecting directly by IP address, not using your domain name at all. The Expanse one specifically says that they scan the entire IPv4 space. You can bring up a new web server and never tell anyone about it and if it has a public IPv4 address you’ll get that bot connecting to it.
Are you absolutely sure about that?
Well, how would you explain the direct request to my domain.name (see the picture below)?
Do you still maintain that the bot connects to every host exclusively by IP address?
Yes. It specifically says that it does that, and every web server I’ve ever brought up gets hit by it.
They do that, too, and as you can see from that page, Cloudflare blocked it as you want. It’s working right. But they also connect directly to every single possible IPv4 address, which will completely bypass Cloudflare.
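If you want to see which origin log entries arrived this way, a rough Python sketch like the one below can separate them out. It assumes the log format seen in the two lines quoted in the question (client IP, timestamp, request, status, referer, user agent) and assumes the logged IP is the connecting peer, i.e. the server is not configured to replace it with the visitor address from a Cloudflare header. `LOG_LINE` and `direct_hits` are hypothetical names, and the proxy ranges are passed in by the caller.

```python
import ipaddress
import re

# Pattern inferred from the sample lines in the question; adjust it if
# your access log format has extra fields (for example a byte count).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def direct_hits(lines, proxy_ranges):
    """Return (ip, user_agent) for entries whose client IP is outside proxy_ranges."""
    networks = [ipaddress.ip_network(r) for r in proxy_ranges]
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        try:
            ip = ipaddress.ip_address(match["ip"])
        except ValueError:
            continue  # first field was not an IP address
        if not any(ip in net for net in networks):
            hits.append((match["ip"], match["agent"]))
    return hits
```

Anything this returns reached the origin without passing through the proxy, which is why no matching entry shows up in the Cloudflare logs.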