The (firewall) rules we follow

After seeing so many posts here asking “How do I stop an attack?”, I’ve noticed some admins call an “attack” what may simply be aggressive scanning (either badly behaved search bots or malicious scanners looking for vulnerabilities). So I decided to post a new topic with an example of the firewall rules we use against such scans.

Some are self-explanatory, but here we go (a few are sketched as expressions after the list):

  1. Allow PayPal IPN
  2. Allow a known, good IP address for testing
  3. Block requests on ports that aren’t 80 or 443
  4. Block Tor, plus countries currently sanctioned by New Zealand or the UN, and others on our block list
  5. Allow robots.txt for known bots (important because we later block some known bots we don’t want anyway, and they should be able to read robots.txt to “learn” they are disallowed)
  6. Allow HSTS checks
  7. Block automated traffic we don’t like (the list was published in another post, but we now use a regex instead - see the regex below)
  8. Block anything that is not HEAD, GET, POST
  9. Block user agents that contain known bot names but are obviously fake (wrong IP, country, etc.)
  10. A file used for traffic management, accessible only to Cloudflare
  11. Block sitemap requests from anything that is not a known bot
  12. Block access to things we don’t have - wp-admin, phpMyAdmin, umbraco folders, etc.
  13. Block access to some other paths we don’t have
  14. Block fake search requests
  15. Block additional SQL injection statements that Cloudflare is not blocking
  16. Block HTTP 1.0
  17. Block registration from some countries where a huge number of spammers come from
  18. Block requests with a threat score above 1
  19. Block Internet Explorer 9 and below, plus a couple of Metasploit user agents
  20. Block attempts to use our image uploader with an invalid parameter or request method
  21. Block access to a specific path unless with some parameter and request method
  22. Our XHR calls carry specific parameters - block anything missing them
  23. Sometimes known bots try to follow a path they shouldn’t - block those
  24. Sometimes known bots try to load our search page - block those
  25. (inactive) In case of an attack we can block everyone from posting or logging in
  26. (inactive) In case of an attack we can block everyone from accessing the site if not in AU/NZ
  27. Allow known bots - this allows most good bots that reach this point, unless they were blocked earlier (those are the ones we don’t want)
  28. Block cloud services that are not known bots or specific user agents (monitoring services)
  29. Challenge if request previously failed reCAPTCHA (logged in users)
  30. Challenge if request previously failed reCAPTCHA (all users)
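
For anyone new to the expression language, here is roughly what a few of these could look like as Cloudflare firewall expressions. These are simplified sketches, not our exact rules:

Rule 3 (action Block): not cf.edge.server_port in {80 443}
Rule 8 (action Block): not http.request.method in {"GET" "POST" "HEAD"}
Rule 16 (action Block): http.request.version eq "HTTP/1.0"
Rule 18 (action Block): cf.threat_score gt 1
Rule 27 (action Allow): cf.client.bot

Rule order matters here: an Allow exempts the request from the remaining firewall rules, which is why the known-bots allow sits near the end of the list.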

Some rules are very specific to our use case - the path rules and the reCAPTCHA monitoring, for example.

The regex we use to block bots we don’t want is a version of the lists in these posts:

https://community.cloudflare.com/t/top-50-user-agents-to-block/222594/2
https://community.cloudflare.com/t/top-50-user-agents-to-block/222594/3

(?i)appinsights|semrushbot|ahrefsbot|dotbot|whatcms|rogerbot|trendictionbot|blexbot|linkfluence|magpie-crawler|mj12bot|mediatoolkitbot|aspiegelbot|domainstatsbot|cincraw|nimbostratus|httrack|serpstatbot|omgili|grapeshotcrawler|megaindex|semanticbot|cocolyzebot|domcopbot|traackr|bomborabot|linguee|webtechbot|clickagy|sqlmap|internet-structure-research-project-bot|seekport|awariosmartbot|onalyticabot|buck|riddler|sbl-bot|df bot 1.0|pubmatic crawler bot|bvbot|sogou|barkrowler|embed.ly|semantic-visions|voluumdsp|wc-test-dev-bot|gulperbot|moreover|ltx71|accompanybot|mauibot|stormcrawler|moatbot|seznambot|zagbot|leikibot|ccbot|tweetmemebot|criteobot|paperlibot|livelapbot|slurp|sottopop|mail.ru_bot|rasabot|mbcrawler|anderspinkbot|zoominfobot|castlebot|linkdexbot|coccocbot|yacybot|isec_bot|flockbrain|bidswitchbot|csscheck|surdotlybot|contxbot|relemindbot|seokicks|futuribot|netpeakcheckerbot|yellowbrandprotectionbot|datagnionbot|kauaibot|zagbot|mixrankbot|uipbot|xforce|smtbot|pulno|zombiebot|dmasslinksafetybot|linkpadbot|aaabot|mtrobot|snappreviewbot|finditanswersbot|snappreviewbot|vebidoobot|coibotparser|dataforseobot|anybot|dingtalkbot|echobot|popscreen|vuhuvbot|sitecheckerbotcrawler|marketwirebot|hypestat|whatyoutypebot|whizebot|projectdiscovery|datenbutler|seebot.org|fuseonbot|vipnytt|pandalytics|ninjbot|gowikibot|360spider|acapbot|acoonbot|ahrefs|alexibot|asterias|attackbot|backdorbot|becomebot|binlar|blackwidow|blekkobot|blexbot|blowfish|bullseye|bunnys|butterfly|careerbot|casper|checkpriv|cheesebot|cherrypick|chinaclaw|choppy|clshttp|cmsworld|copernic|copyrightcheck|cosmos|crescent|cy_cho|datacha|diavol|discobot|dittospyder|dotbot|dotnetdotcom|dumbot|emailcollector|emailsiphon|emailwolf|extract|eyenetie|flaming|flashget|flicky|foobot|g00g1e|getright|gigabot|go-ahead-got|gozilla|grabnet|grafula|harvest|heritrix|httrack|icarus6j|jetbot|jetcar|jikespider|kmccrew|leechftp|libweb|linkextractor|linkscan|linkwalker|loader|masscan|miner|majestic|mechanize|mj12bot|morfeus|moveoverbot|netmechanic|netspider|nicerspro|nikto|ninja|nutch|octopus|pagegrabber|planetwork|postrank|proximic|purebot|pycurl|python|queryn|queryseeker|radian6|radiation|realdownload|rogerbot|scooter|seekerspider|semalt|siclab|sindice|sistrix|sitebot|siteexplorer|sitesnagger|skygrid|smartdownload|snoopy|sosospider|spankbot|spbot|sqlmap|stackrambler|stripper|sucker|surftbot|sux0r|suzukacz|suzuran|takeout|teleport|telesoft|true_robots|turingos|turnit|vampire|vikspider|voideye|webleacher|webreaper|webstripper|webvac|webviewer|webwhacker|winhttp|wwwoffle|woxbot|xaldon|xxxyy|yamanalab|yioopbot|youda|zeus|zmeu|zune|zyborg|lanaibot|metadataparser|go-http-client|daum|petalbot|yandexbot|brandwatch|libwww|guzzlehttp|expert-html|ias-va|geedobot|newspaper|censysinspect|peacockmedia|admantx|gobuster|crawling at home
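
Dropped into a firewall rule, that pattern would sit in an expression along these lines (truncated here for readability; note the matches operator requires a Business or Enterprise plan, so on lower plans you would need a chain of contains checks instead):

http.user_agent matches "(?i)appinsights|semrushbot|ahrefsbot|…rest of the list…" (action Block)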

In addition to those rules, we block a few ASNs and IP addresses. Over the last 72 hours, these rules together blocked 600k bad requests.
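
The ASN and IP rules are plain set lookups. With placeholder values - the ASNs and addresses below are documentation examples, not our real list - they look like:

ip.geoip.asnum in {64496 64511} (action Block)
ip.src in {192.0.2.0/24 198.51.100.7} (action Block)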

This also shows that if we relied only on the basic WAF, without any custom rules, our servers would be swamped with bad traffic.

It also shows that relying on user agents, IPs, or ASNs alone won’t stop these scans from happening. You need an integrated approach, and you need to understand your traffic.

Some rules show low hit counts in the screenshots, but they stay in place because they do get triggered from time to time.

We constantly monitor our logs with Datadog and have alarms set on request counts - anything that goes over a threshold is analysed, and an existing rule is updated or a new one put in place.

At the application level we use several third-party APIs to stop some malicious traffic: the Stop Forum Spam database, disposable-email validation, Google reCAPTCHA v3, and the Perspective API.

I would love to replace Google reCAPTCHA with the Cloudflare bot score, if only it were available in headers or firewall expressions - alas, it is restricted to Enterprise customers, and Super Bot Fight Mode is a poor attempt at serving mid-tier online properties. Because SBFM causes so many problems, I keep it off in all cases.
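
For reference, on Enterprise plans the bot score is exposed in expressions as cf.bot_management.score (1-99, lower meaning more likely automated), so the replacement would be a one-liner along these lines, with whatever threshold suits your traffic (30 here is purely illustrative):

cf.bot_management.score lt 30 (action Challenge or Block)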

For these rules to work, you should always make sure your server only accepts inbound connections from the Cloudflare IP ranges listed at https://www.cloudflare.com/ips/ - even better, block everything except those IPs at a firewall before traffic ever hits your server.

We never had to use rules 25 and 26 - Cloudflare would kick in beforehand, or some other rule would prevent the need. For example, one short attack was stopped by these rules alone.
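
For the record, rule 26 is essentially a one-liner built on the country field and kept disabled until needed; rule 25 would be similar but scoped to our posting and login paths:

not ip.geoip.country in {"AU" "NZ"} (action Block, normally disabled)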


What % of your traffic would you say is bot traffic?

In my case


Any slowness? I’ve wondered how performant the firewall rules are. I’m sure it’s on the order of 1-5 ms in the best case, but what about the worst case?

Nothing that has affected user experience.


What is the website? I can run it through https://sitecheck.sucuri.net/ to check for any security issues with the site.

Thanks for the offer. We already use Detectify.

The WAF rules serve two purposes: security (the obvious one), but also performance, since they prevent tens of thousands - sometimes hundreds of thousands - of unwanted requests from reaching our servers every day.

