Good list, thanks. I have deployed that but removed python and demon (those seem to block some RSS feedreaders, YMMV).
What I also have in place is this:
(http.user_agent contains "SemrushBot") or (http.user_agent contains "AhrefsBot") or (http.user_agent contains "DotBot") or (http.user_agent contains "WhatCMS") or (http.user_agent contains "Rogerbot") or (http.user_agent contains "trendictionbot") or (http.user_agent contains "BLEXBot") or (http.user_agent contains "linkfluence") or (http.user_agent contains "magpie-crawler") or (http.user_agent contains "MJ12bot") or (http.user_agent contains "Mediatoolkitbot") or (http.user_agent contains "AspiegelBot") or (http.user_agent contains "DomainStatsBot") or (http.user_agent contains "Cincraw") or (http.user_agent contains "Nimbostratus") or (http.user_agent contains "HTTrack") or (http.user_agent contains "serpstatbot") or (http.user_agent contains "omgili") or (http.user_agent contains "GrapeshotCrawler") or (http.user_agent contains "MegaIndex") or (http.user_agent contains "PetalBot") or (http.user_agent contains "Semanticbot") or (http.user_agent contains "Cocolyzebot") or (http.user_agent contains "DomCopBot") or (http.user_agent contains "Traackr") or (http.user_agent contains "BomboraBot") or (http.user_agent contains "Linguee") or (http.user_agent contains "webtechbot") or (http.user_agent contains "Clickagy") or (http.user_agent contains "sqlmap") or (http.user_agent contains "Internet-structure-research-project-bot") or (http.user_agent contains "Seekport") or (http.user_agent contains "AwarioSmartBot") or (http.user_agent contains "OnalyticaBot") or (http.user_agent contains "Buck") or (http.user_agent contains "Riddler") or (http.user_agent contains "SBL-BOT") or (http.user_agent contains "DF Bot 1.0") or (http.user_agent contains "PubMatic Crawler Bot") or (http.user_agent contains "BVBot") or (http.user_agent contains "Sogou") or (http.user_agent contains "Barkrowler")
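If a rule this long gets painful to edit by hand, one option is to keep the bot names in a plain list and generate the expression from it. Here is a minimal sketch in Python; `build_expression` and `BOTS` are names I made up for illustration (not part of any Cloudflare tooling), the list is only a short excerpt of the one above, and the backslash escaping is my assumption about how quoted strings work in the rule editor. It skips repeated names and writes straight double quotes, so the output should paste cleanly.

```python
# Hypothetical helper: build a firewall expression from user-agent substrings.
# Nothing here calls Cloudflare; it only produces text to paste into the rule editor.

def build_expression(bots):
    """Return an 'or'-joined expression with one clause per unique name."""
    seen = set()
    clauses = []
    for name in bots:
        if name in seen:
            continue  # skip names that were already added
        seen.add(name)
        # Assumption: backslash and double quote are escaped with a backslash.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'(http.user_agent contains "{escaped}")')
    return " or ".join(clauses)


# Short excerpt of the list above, with one deliberate repeat.
BOTS = ["SemrushBot", "AhrefsBot", "DomainStatsBot", "DF Bot 1.0", "DomainStatsBot"]

if __name__ == "__main__":
    print(build_expression(BOTS))
```

Keeping one name per line in a text file and reading it into `BOTS` would also make additions and removals easier to review, YMMV.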
This list blocks about 20k requests daily (up to 50k on some days). Note that some entries on this list will block SEO services (no big deal if you are not the one requesting scans) and some social media monitoring services.
A company that keeps a presence on our forums mentioned that their CRM-based monitoring tool had stopped providing reports, and the company providing the service wouldn’t disclose the bot name because it is a “trade secret”… Well, if their “trade secret” costs me money in terms of server and network resources, I don’t see why I would let them make money by selling data harvested from my services without a good justification.