Exclude subdomain from WAF for blocking bots

We have a domain with many subdomains, all of which are routed through Cloudflare’s LB in front of a multi-node cluster.

We have in total more than 1 million visits per month for all of the subdomains.

We would like to use Cloudflare’s WAF and bot detection; however, the way one of the subdomains works makes that impossible, because the bot detection would also block all the good bots from crawling that subdomain.

The reason for this problem is that this subdomain is, in a way, a backend for 300 other domains that have been set up on a secondary server. That server runs an NGINX reverse proxy, which receives the requests for each domain and proxies them to this single subdomain; the subdomain then uses a custom header to serve the appropriate content.

The problem is that the requests that go through the proxy server arrive at Cloudflare with the proxy server’s IP address, so the good bots are detected as fake bots by the firewall and are blocked.

Considering that the proxy server’s IP address is known at all times, and we can also add a “secret” custom header to the requests coming from the proxy server, it should be pretty safe to tell Cloudflare not to block those requests.
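The condition can be spelled out as a small predicate. The Python sketch below only illustrates the logic (it is not Cloudflare rule syntax); the IP address, the header name `x-proxy-secret` and the secret value are placeholders invented for this example.

```python
import hmac

PROXY_IP = "203.0.113.7"          # placeholder: the proxy server's fixed IP
SECRET_HEADER = "x-proxy-secret"  # placeholder header name
SECRET_VALUE = "change-me"        # placeholder shared secret


def is_trusted_proxy_request(source_ip, headers):
    """Return True only if both the source IP and the secret header match."""
    if source_ip != PROXY_IP:
        return False
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied = lowered.get(SECRET_HEADER, "")
    # compare_digest avoids leaking the secret through timing differences.
    return hmac.compare_digest(supplied.encode(), SECRET_VALUE.encode())
```

Requiring both checks means a spoofed header from another address, or a request from the proxy without the header, is still treated like any other traffic.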

I have tried Transform Rules, trying to set the request headers to the original client’s IP, but the tool doesn’t allow modifying the needed headers.

I have also tried the Page Rules, trying to disable the security and WAF, but the bot detection works independently and doesn’t get disabled through the Page Rules.

As a last attempt, I have upgraded a second account to Business Plan in hopes of being able to convert that account into a Partial DNS setup and point only the requests for the subdomain to the second account and disable the firewall completely, but my whole setup got messed up and the LB stopped working.
And it looks like the whole domain has been moved from the first original account to the secondary account and I had to set up the whole Load Balancing again on the secondary account for the domain to start working again.

I also created a support ticket 5 days ago and I haven’t received any support so far.

Is there anything more that I can test to make this work?

Perhaps this new-new product can help you:

@cbrandt Thanks for pointing out this new feature.

If I understand correctly, that works at the level of excluding the subdomain from the security checks, and in that case the bot fighting features will be completely disabled on all 300 domains pointing to this subdomain, right?

I think it’s easier and more maintainable than the partial DNS solution, and that’s good, but we would still miss the big benefit of Cloudflare’s security.

The optimal solution for us would be to be able to modify the request headers and reconstruct them in the way we received them on the proxy server.

From https://developers.cloudflare.com/rules/transform/request-header-modification/ “Currently, there is a limited number of HTTP request headers that you cannot modify. Cloudflare may remove restrictions for some of these HTTP request headers when presented with valid use cases.”

That would be our ultimate choice. How can we request Cloudflare to disable this limitation for this specific subdomain, for the requests coming from a specific IP address?
We can even mark the requests with an agreed upon “secret” header for more security.

CC @fritex since you reacted to the previous post.

I’m not sure whether you’re trying to bypass Bot Fight Mode or the WAF. You can certainly bypass the WAF using a Firewall Rule with the action Bypass. But Bot Fight Mode is (or at least was until today’s new Friendly Bots thing) kinda rough around the edges. Super Bot Fight Mode, available since you have the Business Plan, allows for certain fine-grained controls, but I’m not sure your suggestion (a header-based solution) could be applied here.

In any case, you said you’ve opened a ticket with support, so please post the ticket number here so that we can escalate it.

@cbrandt I have tried the WAF bypass and it didn’t help. I have also briefly looked into the Super Bot Fight Mode, but I couldn’t find anything that would look like a solution there either, unless I have missed something.

I checked my original ticket and it looks a bit unclear now that I read it again, so I have created a new ticket that follows this thread as closely as possible, with more details.

The new ticket’s number is 2401201.

Thank you very much in advance.

Why not simply have server A talk to server B directly for content? Why place Cloudflare in the middle of the communication stream of your 2 tier architecture?

@cscharff It’s a multi-node cluster behind Cloudflare’s Load Balancer that the subdomain points to.

An A to B 2 tier setup is simply not possible.

This is an awkward architecture to try and leverage Cloudflare’s security services. They aren’t really designed to be behind a proxy.

You’d likely be better served by rearchitecting the FE to use Cloudflare’s SSL for SaaS and ditching the current frontend doing rewriting.

@cscharff The number of domains to be modified is more than 300, and on the current setup the NGINX configuration for all of them can easily be modified/updated using a single script.

In your suggested scenario, we will either need a lot more coding to achieve the same thing using Cloudflare’s APIs or we will have to go through each one of the domains (spread across multiple Cloudflare accounts) and apply the required configuration.

Then I’d look at disabling the security features if you can’t change the architecture.

If you own all these domains you could use workers on the edge instead of SSL for SaaS I suppose, but the current architecture is going to introduce a number of limitations around security because it’s not what Cloudflare was designed for.

I understand that.

However, there is a fixed condition in place. The requests are coming from a fixed IP address, and all of them carry the IP of the original request (client).

If I were able to modify the x-forwarded-for, true-client-ip and x-real-ip headers and set them to the original client’s IP, then the User-Agent together with the IP address would match for the “good bots” and they wouldn’t be blocked, theoretically.

Create a net new domain, disable security features on that domain. Point the load balancer to the origins, modify your app to forward the requests to the new hosts.

We’ve now pretty much exhausted the available options on the platform as it exists today. If none of those are acceptable, you may need to consider a new platform.

@cscharff is this your personal opinion or is it the official response from the Cloudflare’s support team?

The issue is then entirely on your Nginx proxy / Nginx origin configuration side. If you have an Nginx proxy between CF and your Nginx origin site, make sure real_ip_recursive (from the ngx_http_realip_module module) is enabled on the Nginx origin servers so that the real visitor IP is passed on to your origin server. It should work provided you have listed all trusted proxy addresses with set_real_ip_from.

Syntax: real_ip_recursive on | off;
Default: real_ip_recursive off;
Context: http, server, location

This directive appeared in versions 1.3.0 and 1.2.1.

If recursive search is disabled, the original client address that matches one of the trusted addresses is replaced by the last address sent in the request header field defined by the real_ip_header directive. If recursive search is enabled, the original client address that matches one of the trusted addresses is replaced by the last non-trusted address sent in the request header field.
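To make the difference concrete, here is a minimal Python sketch of the search described in the quoted documentation. It illustrates the documented behaviour and is not Nginx's actual implementation; the function name `resolve_client_ip` and the example addresses (taken from documentation-reserved ranges) are made up for this example, and the fallback used when every hop is trusted is an assumption of the sketch.

```python
import ipaddress


def resolve_client_ip(remote_addr, forwarded_for, trusted, recursive):
    """Mimic how real_ip_recursive picks the client address.

    remote_addr   -- address of the peer that connected to the origin
    forwarded_for -- value of the header named by real_ip_header
    trusted       -- CIDR strings, as listed with set_real_ip_from
    recursive     -- True for 'real_ip_recursive on'
    """
    nets = [ipaddress.ip_network(n) for n in trusted]

    def is_trusted(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in nets)

    # The header is only honoured when the connecting peer is trusted.
    if not is_trusted(remote_addr):
        return remote_addr

    hops = [h.strip() for h in forwarded_for.split(",") if h.strip()]
    if not hops:
        return remote_addr

    if not recursive:
        # Recursive search off: take the last address in the header.
        return hops[-1]

    # Recursive search on: walk from the right, skipping trusted addresses.
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    # Assumption: if every hop is trusted, fall back to the left-most one.
    return hops[0]


if __name__ == "__main__":
    # Hypothetical setup: a CDN range and the intermediate proxy are trusted.
    trusted = ["198.51.100.0/24", "203.0.113.7/32"]
    xff = "192.0.2.50, 203.0.113.7"
    print(resolve_client_ip("198.51.100.9", xff, trusted, recursive=False))
    print(resolve_client_ip("198.51.100.9", xff, trusted, recursive=True))
```

With recursion off, the sketch returns the intermediate proxy's address (203.0.113.7); with recursion on, it skips that trusted hop and returns the visitor's address (192.0.2.50), which is the behaviour needed when a second proxy sits in the chain.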

@eva2000 thank you for pointing this out.
This sure sounds like a possible solution.
I will try it in a couple of hours and update you.

Yeah it’s something that not many Nginx users are aware of it seems :smiley:

Just to clarify, the real_ip_recursive directive needs to be enabled on Nginx origin server side assuming you’re using Nginx origin servers.

Neither.
