How to Minimize Bot Traffic

I’ve seen it too many times on internet forums, Reddit, Facebook groups, social media, even here: the never-ending questions, from five years ago to one month ago, that I’m sure will still be asked in the future, about how to prevent brute-force and hacking / login attempts, or how to minimize bot traffic. Most of the time the answer is a recommendation for some WordPress security plug-in. This is pretty basic actually, but let’s settle it once and for all.

At its simplest, all you need is this Cloudflare Firewall Rule to block all access to the login page:

(http.request.uri.path contains "wp-admin" and not http.request.uri.path contains "/wp-admin/admin-ajax.php" and not http.request.uri.path contains "/wp-admin/theme-editor.php") or (http.request.uri.path contains "wp-login") … with "block" action.

You can add more [And] [URI Path] [contains] xmlrpc or wlwmanifest conditions to that rule; those two files are the ones bots most often probe when trying to find weak spots in your WordPress site. Changing your login URL, by the way, is 100% useless: bots will still be able to sniff it out.
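A minimal sketch of what the combined expression could look like (note that, to actually block requests to those two paths as well, the xmlrpc and wlwmanifest conditions need to work as additional "or" clauses, so that a request matching any of the paths triggers the block):

((http.request.uri.path contains "wp-admin" and not http.request.uri.path contains "/wp-admin/admin-ajax.php" and not http.request.uri.path contains "/wp-admin/theme-editor.php") or http.request.uri.path contains "wp-login" or http.request.uri.path contains "xmlrpc" or http.request.uri.path contains "wlwmanifest") … with "block" action.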

I created a Firewall Rule on Cloudflare to managed-challenge (not block) traffic from hosting company / data center ASNs, with some exceptions of course: I poke holes for specific user-agents to get through. To find the ASN of an IP address, you can use ipinfo.io; to find bad IPs, you can visit abuseipdb.com or look at your server / visitor logs.

I’m aware this will also affect human visitors who come in through a VPN. I don’t mind; that’s a consequence I’m willing to accept to secure my small, low-budget site. Even if they immediately leave because they’re annoyed by the few seconds of Cloudflare’s Managed Challenge page, again, not a problem.

I’m on the Cloudflare free tier, with only 5 Firewall Rules + 50,000 IP Access Rules available. More than enough.

First, I created a Firewall Rule called ASN Exceptions; I use this rule to poke exception holes in certain ASNs. I added the Amazon (16509, 14618, 8987), Microsoft (8075), and Google (396982) ASNs. Then, whenever I saw a user-agent in Security Events that I wanted to allow, I added [And] [User Agent] [does not contain] admantx, for example. You should check Security Events often if you want to do this, to ensure that no services / user-agents using Amazon, Microsoft, or Google clouds get blocked. You just keep adding [And] [User Agent] [does not contain] conditions as you need.
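Put together, a sketch of what that ASN Exceptions rule could look like, using the ASNs and the admantx exception mentioned above (extend the user-agent exceptions as you find more services you want through):

(ip.geoip.asnum in {16509 14618 8987 8075 396982} and not http.user_agent contains "admantx") … with "managed challenge" action.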

Second, since Firewall Rules have a limitation of 4 kilobytes per rule, you have to be efficient in using them. When you find a data center / hosting company ASN, add that ASN to the IP Access Rules first (with a managed-challenge action); don’t put it in the Firewall Rules. But when you see in Security Events that the ASN is used by a service / user-agent that you want to allowlist, go ahead and move the ASN into the ASN Exceptions Firewall Rule. Example: I added AS30083 (GoDaddy) to the IP Access Rules, and later I saw on the Security Events page that there was a user-agent admantx in AS30083. So I moved AS30083 into the ASN Exceptions Firewall Rule, including adding [And] [User Agent] [does not contain] admantx. If an ASN you have added to the IP Access Rules doesn’t carry any services / user-agents that you need, just leave it there; no need to move it to the Firewall Rules. Like Chang Way AS57523, for example. That is one ■■■■ of a bad ASN. Keep that one in the IP Access Rules.
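Following that example, the ASN Exceptions rule sketched earlier would simply grow to something like this (AS30083 joins the list, and the existing admantx exception already covers the user-agent you want through):

(ip.geoip.asnum in {16509 14618 8987 8075 396982 30083} and not http.user_agent contains "admantx") … with "managed challenge" action.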

Most people might say, “Why not just use Wordfence?” I did, but the results were still unsatisfactory and it consumed too many server resources. In the past, I only used Wordfence for the Rate Limiting function, just in case; I even disabled the Brute Force Protection & Scan features. After I had been using the Cloudflare Firewall Rules for a month, since I managed-challenge the hosting / data center ASNs, my Wordfence block count was zero. After giving it some time to see how it went, in the end I removed Wordfence altogether.

Look, I’m not against or anti any security plug-in. But using security plug-ins will drain your server resources. If there is another way that is just as effective but saves resources for other useful things, why not, right?

Traffic fell slightly, but not significantly. I’ve read that John Mueller said Google doesn’t filter 100% of bot traffic out of Google Search Console and Analytics reports; maybe that’s the cause. AdSense income is also stable, and the bounce rate got much better. Plus, maybe what I’m doing will help minimize click-fraud penalties from Google in the future.

It’s not only ASNs from hosting / data centers: I also challenge traffic from several ISPs’ ASNs. There I only challenge the HTTP/1.x versions, which is where bots usually come from, with this rule:

(ip.geoip.asnum in {174} and not http.request.uri.path in {"/wp-content/uploads/favicon.png" "/wp-content/uploads/og.png" "/favicon.ico" "/ads.txt"} and http.request.version in {"HTTP/1.0" "HTTP/1.1" "HTTP/1.2"} and not http.user_agent contains "Mastodon" and not http.user_agent contains "admantx" and not http.user_agent contains "proximic")

Because even though the label says ISP, they operate a lot of servers too. Challenging every HTTP/1.x request is, I think, the best way to filter bot traffic from them. Feel free to add the ASN of any internet provider you want.

Another benefit I see: Googlebot used to visit my site about 500 times a day, and since I did this, it visits about 700 times a day. Maybe because there are more server resources left to accommodate its visits, the crawl budget went up. The data from Cloudflare’s Top Crawlers / Bots (Analytics & Logs > Security) is not much different from the Crawl Stats in Google Search Console.

People who maintain server farms for spamming, scraping, or trolling for vulnerabilities will definitely hate this idea. They will say something like, “That is a bad idea for your SEO,” or “It will bring a negative impact on user experience (VPN users).” As long as you make exceptions for search engines and the other service UAs you want, I don’t see any issues with the SERPs. At least that’s what I’ve seen from my experience.

When you use this rule, with the clauses in this order:

(http.user_agent contains "Google" and not ip.geoip.asnum in {15169 396982}) or (ip.geoip.asnum in {15169 396982} and not http.user_agent contains "Google" and not http.user_agent contains "FeedBurner" and not http.user_agent contains "Lighthouse" and not http.user_agent contains "IAB")

… you will NOT interrupt any real Google crawler. The first half catches fake Googlebots (a “Google” user-agent coming from outside Google’s ASNs), and the second half catches bots running on Google Cloud Platform without a Google service user-agent, if you feel the need to stop them (like Palo Alto or Buck).

I also challenge URI paths that contain .php, with the exception (is not in) of my server IPs (IPv4 + IPv6 + the IP used for cronjobs) and my home internet IP. On many forums, including the Cloudflare Community, you will often see opinions that it’s best not to challenge .php because it will create errors on your site. No, it won’t, as long as you allowlist the IP of your server + your home internet, and you also know the IPs of the cronjob or of the services / plugins that you use. For example, if you use Wordfence, you must allowlist their IP list.
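A minimal sketch of such a rule, with made-up placeholder addresses (203.0.113.10 standing in for the server IPv4, 2001:db8::1 for the server IPv6, and 198.51.100.7 for the home IP; substitute your real ones, plus any cronjob / plugin IPs):

(http.request.uri.path contains ".php" and not ip.src in {203.0.113.10 2001:db8::1 198.51.100.7}) … with "managed challenge" action.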

Again, don’t forget to set whatever IPs you want / need to “allow” in the IP Access Rules. ISPs usually use dynamic IPs, so your home IP address will change periodically; when it does, you have to allow the new IP and delete the old one.

Yes, a Managed Challenge can be passed by robots, although it’s not an easy thing to do. I believe the genius engineers at Cloudflare are no less clever than the hackers who try to bypass them. But when you set a “block,” you can be sure it will not be passed, unless the hackers go straight to your server directly. Then use this:

(http.host eq "domain.com" and not cf.edge.server_port in {80 443})

There was a moment when I thought that maybe if I let all traffic come to my site, whatever it was, from wherever, with any user-agent, it would make my site’s visibility better across the internet, and improve the SERPs, visitor numbers, advertising revenue… well, I was wrong. It’s a different story if your site is managed by a company, with large funds and a capable team.
