Best Method to Block Bad Bots

Good Day!

Essentially, there are five (5) methods for blocking and/or redirecting bad bots.

They are:

(1) Via .htaccess file
(2) Via plugin (e.g., SG Optimizer, Wordfence, Blackhole for Bad Bots)
(3) Via CDN (e.g., Cloudflare, KeyCDN)
(4) Via robots.txt file
(5) Via host server (built-in code)

In order to protect server resources and achieve the highest level of protection, which method would you recommend?

At a glance, the robots.txt file method seems the most logical way to go, but adding tens if not hundreds of bad bots to the robots.txt file doesn’t seem efficient.

Currently, we are blocking known bad bots via Cloudflare’s Firewall Rules and it seems to be working pretty well.

Note: We’re using CF’s Free Plan, so we don’t have access to the Bot Mitigation protection offered by CF Pro and up.

Thoughts on this appreciated.

Thank you!

I can only pick one?

Ultimately, I use others to feed into #3 (Cloudflare Firewall Rules). As I discover Wordfence blocking requests, usually from their RBL, I often pull the ASN from that and put it into a Firewall Rule.
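To make the ASN step concrete, here is a minimal Python sketch that turns a list of ASNs into an expression you could paste into a Firewall Rule. The helper name asn_block_expression is made up for illustration, the ASNs are placeholders from the documentation-reserved range, and the field name ip.geoip.asnum is an assumption based on the Firewall Rules expression editor, so check it against the current Cloudflare docs before relying on it.

```python
def asn_block_expression(asns):
    """Build a Cloudflare-style expression matching any of the given ASNs.

    Hypothetical helper: the field name ip.geoip.asnum is an assumption.
    """
    # De-duplicate and sort so the rule stays readable as the list grows.
    unique = sorted({int(asn) for asn in asns})
    if not unique:
        raise ValueError("need at least one ASN")
    return "(ip.geoip.asnum in {" + " ".join(str(asn) for asn in unique) + "})"


# Placeholder ASNs from the documentation range, not real offenders.
expression = asn_block_expression([64497, 64496, 64497])
print(expression)
```

Keeping the ASNs in one list and regenerating the expression is easier than hand-editing the rule each time Wordfence surfaces a new one.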

Speaking of Blackhole for Bad Bots, I was using Jeff Starr’s 7G firewall. From that, I was able to pull typical User Agent strings. Again, I put them into a Firewall Rule.

My goal is to block bad bots before they get to my server. But I like the endpoint security products to catch anything that slips through.

robots.txt, to me, is the least effective method. Bad bots don’t care about the robots.txt file, which is kind of the point of Blackhole for Bad Bots: it traps the bots that ignore a Disallow rule.
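A small Python sketch of why robots.txt is only advisory: the check happens on the crawler's side, using the standard library's urllib.robotparser here. The bot names and URL are placeholders. A well-behaved crawler asks before fetching; a bad bot simply never runs this check, so nothing stops the request from reaching the server.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; "naughtybot" is a placeholder name.
ROBOTS_TXT = """\
User-agent: naughtybot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_fetch_allowed(user_agent, url):
    """What a compliant crawler would ask itself before requesting url."""
    return parser.can_fetch(user_agent, url)


page = "https://example.com/private/page"
print(polite_fetch_allowed("naughtybot", page))   # listed bot, if it bothers to check
print(polite_fetch_allowed("friendlybot", page))  # unlisted bot
```

The Disallow line only has an effect when the client chooses to call something like polite_fetch_allowed, which is why blocking at the CDN or server is the part that actually protects resources.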

Hi @sdayman,

Yes, you can only pick one. :)

Thanks for the reply. Perfect. We do the same, so it’s comforting to know others use the same (or similar) blocking method based on the philosophy you shared.

Question: Why do you use User Agent Strings for your Firewall Rules (in lieu of just using the bad bot names provided by the same sources you mentioned)? What did we miss?

Again, thank you!

I’m not sure how else you would block a bad bot by name. I’d use a Firewall Rule to scan for a User Agent String that contains “naughtybot” and then Block or JS Challenge it.
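As a rough sketch of that rule, this Python snippet builds a "User Agent contains" expression from a list of bot names. The helper name user_agent_expression and the second bot name are made up for illustration, and the field name http.user_agent is an assumption taken from the Firewall Rules expression editor. As far as I know the contains operator is case-sensitive, so the names need to match the casing the bot actually sends.

```python
def user_agent_expression(names):
    """Build a Cloudflare-style expression matching any listed User Agent substring.

    Hypothetical helper: the field name http.user_agent is an assumption.
    """
    clauses = []
    for name in names:
        # Escape backslashes and quotes so the name cannot break the expression.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'(http.user_agent contains "{escaped}")')
    if not clauses:
        raise ValueError("need at least one bot name")
    return " or ".join(clauses)


# "naughtybot" comes from the post above; "badcrawler" is a placeholder.
rule = user_agent_expression(["naughtybot", "badcrawler"])
print(rule)
```

The resulting string goes into the rule's expression, with Block or JS Challenge chosen as the action.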

Apologies. You are correct!

We also use “User Agent” under “Field” and the respective bot name under “Value” as noted here.

Cheers!
