Here is an updated version of my original reply above, a few years on. I’ve broken it down into four rules. You don’t have to use all of them, but you will recognize some common user-agent strings.
It is split into four rules because regex matching is only available on Business plans and above. Without it, every entry has to repeat “lower” and “contains”, which uses up a lot of characters per rule.
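If you keep your own list of user-agent substrings, the splitting can be scripted rather than done by hand. The following is a minimal sketch, not the method used to produce the rules below: `build_rules` is a hypothetical helper, and the `max_chars` default of 4096 is my assumption about the per-expression character limit, so check the limit that applies to your plan.

```python
def build_rules(tokens, max_chars=4096):
    """Pack user-agent substrings into as few rule expressions as fit the budget.

    max_chars is an assumed per-expression limit, not a value from the post.
    """
    rules, current = [], ""
    for token in tokens:
        clause = f'lower(http.user_agent) contains "{token.lower()}"'
        candidate = clause if not current else f"{current}\nor {clause}"
        if current and len(candidate) > max_chars:
            # The current rule is full: close it and start a new one.
            rules.append(current)
            current = clause
        else:
            current = candidate
    if current:
        rules.append(current)
    return rules


# A deliberately small budget so that the split is visible with four entries.
tokens = ["semrushbot", "ahrefsbot", "dotbot", "mj12bot"]
for number, expression in enumerate(build_rules(tokens, max_chars=100), start=1):
    print(f"Rule #{number}\n{expression}\n")
```

The generated expressions use straight double quotes, which is what you want when pasting into the expression editor.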
Rule #1
lower(http.user_agent) contains "semrushbot"
or lower(http.user_agent) contains "ahrefsbot"
or lower(http.user_agent) contains "dotbot"
or lower(http.user_agent) contains "whatcms"
or lower(http.user_agent) contains "rogerbot"
or lower(http.user_agent) contains "blexbot"
or lower(http.user_agent) contains "linkfluence"
or lower(http.user_agent) contains "mj12bot"
or lower(http.user_agent) contains "aspiegelbot"
or lower(http.user_agent) contains "domainstatsbot"
or lower(http.user_agent) contains "cincraw"
or lower(http.user_agent) contains "nimbostratus"
or lower(http.user_agent) contains "httrack"
or lower(http.user_agent) contains "serpstatbot"
or lower(http.user_agent) contains "megaindex"
or lower(http.user_agent) contains "semanticbot"
or lower(http.user_agent) contains "cocolyzebot"
or lower(http.user_agent) contains "domcopbot"
or lower(http.user_agent) contains "traackr"
or lower(http.user_agent) contains "bomborabot"
or lower(http.user_agent) contains "linguee"
or lower(http.user_agent) contains "webtechbot"
or lower(http.user_agent) contains "clickagy"
or lower(http.user_agent) contains "sqlmap"
or lower(http.user_agent) contains "internet-structure-research-project-bot"
or lower(http.user_agent) contains "seekport"
or lower(http.user_agent) contains "awariosmartbot"
or lower(http.user_agent) contains "onalyticabot"
or lower(http.user_agent) contains "buck"
or lower(http.user_agent) contains "riddler"
or lower(http.user_agent) contains "sbl-bot"
or lower(http.user_agent) contains "df"
or lower(http.user_agent) contains "pubmatic"
or lower(http.user_agent) contains "bot"
or lower(http.user_agent) contains "bvbot"
or lower(http.user_agent) contains "sogou"
or lower(http.user_agent) contains "barkrowler"
or lower(http.user_agent) contains "embed.ly"
or lower(http.user_agent) contains "semantic-visions"
or lower(http.user_agent) contains "voluumdsp"
or lower(http.user_agent) contains "wc-test-dev-bot"
or lower(http.user_agent) contains "gulperbot"
or lower(http.user_agent) contains "moreover"
or lower(http.user_agent) contains "ltx71"
or lower(http.user_agent) contains "accompanybot"
or lower(http.user_agent) contains "mauibot"
or lower(http.user_agent) contains "moatbot"
or lower(http.user_agent) contains "seznambot"
or lower(http.user_agent) contains "zagbot"
or lower(http.user_agent) contains "ccbot"
or lower(http.user_agent) contains "tweetmemebot"
or lower(http.user_agent) contains "paperlibot"
or lower(http.user_agent) contains "livelapbot"
or lower(http.user_agent) contains "slurp"
or lower(http.user_agent) contains "sottopop"
or lower(http.user_agent) contains "mail.ru_bot"
or lower(http.user_agent) contains "rasabot"
or lower(http.user_agent) contains "anderspinkbot"
or lower(http.user_agent) contains "zoominfobot"
or lower(http.user_agent) contains "castlebot"
or lower(http.user_agent) contains "linkdexbot"
or lower(http.user_agent) contains "coccocbot"
or lower(http.user_agent) contains "yacybot"
or lower(http.user_agent) contains "isec_bot"
or lower(http.user_agent) contains "flockbrain"
or lower(http.user_agent) contains "csscheck"
or lower(http.user_agent) contains "surdotlybot"
or lower(http.user_agent) contains "contxbot"
or lower(http.user_agent) contains "relemindbot"
or lower(http.user_agent) contains "pulno"
or lower(http.user_agent) contains "zombiebot"
or lower(http.user_agent) contains "dmasslinksafetybot"
or lower(http.user_agent) contains "linkpadbot"
or lower(http.user_agent) contains "aaabot"
or lower(http.user_agent) contains "mtrobot"
or lower(http.user_agent) contains "snappreviewbot"
or lower(http.user_agent) contains "scrapy"
or lower(http.user_agent) contains "bytespider"
or lower(http.user_agent) contains "bytedance"
or lower(http.user_agent) contains "mozlila"
or lower(http.user_agent) contains "nessus"
Rule #2
lower(http.user_agent) contains "binlar"
or lower(http.user_agent) contains "blackwidow"
or lower(http.user_agent) contains "blekkobot"
or lower(http.user_agent) contains "blexbot"
or lower(http.user_agent) contains "blowfish"
or lower(http.user_agent) contains "bullseye"
or lower(http.user_agent) contains "bunnys"
or lower(http.user_agent) contains "butterfly"
or lower(http.user_agent) contains "careerbot"
or lower(http.user_agent) contains "casper"
or lower(http.user_agent) contains "checkpriv"
or lower(http.user_agent) contains "cheesebot"
or lower(http.user_agent) contains "cherrypick"
or lower(http.user_agent) contains "chinaclaw"
or lower(http.user_agent) contains "choppy"
or lower(http.user_agent) contains "clshttp"
or lower(http.user_agent) contains "cmsworld"
or lower(http.user_agent) contains "copernic"
or lower(http.user_agent) contains "copyrightcheck"
or lower(http.user_agent) contains "cosmos"
or lower(http.user_agent) contains "crescent"
or lower(http.user_agent) contains "cy_cho"
or lower(http.user_agent) contains "datacha"
or lower(http.user_agent) contains "diavol"
or lower(http.user_agent) contains "discobot"
or lower(http.user_agent) contains "dittospyder"
or lower(http.user_agent) contains "dotbot"
or lower(http.user_agent) contains "dotnetdotcom"
or lower(http.user_agent) contains "dumbot"
or lower(http.user_agent) contains "emailcollector"
or lower(http.user_agent) contains "emailsiphon"
or lower(http.user_agent) contains "emailwolf"
or lower(http.user_agent) contains "extract"
or lower(http.user_agent) contains "eyenetie"
or lower(http.user_agent) contains "seokicks"
or lower(http.user_agent) contains "futuribot"
or lower(http.user_agent) contains "netpeakcheckerbot"
or lower(http.user_agent) contains "yellowbrandprotectionbot"
or lower(http.user_agent) contains "datagnionbot"
or lower(http.user_agent) contains "uptime-kuma"
or lower(http.user_agent) contains "peer39"
or lower(http.user_agent) contains "crawler"
or lower(http.user_agent) contains "claudebot"
or lower(http.user_agent) contains "fidget"
or lower(http.user_agent) contains "my-tiny"
Rule #3
lower(http.user_agent) contains "flaming"
or lower(http.user_agent) contains "flashget"
or lower(http.user_agent) contains "flicky"
or lower(http.user_agent) contains "foobot"
or lower(http.user_agent) contains "g00g1e"
or lower(http.user_agent) contains "getright"
or lower(http.user_agent) contains "gigabot"
or lower(http.user_agent) contains "go-ahead-got"
or lower(http.user_agent) contains "gozilla"
or lower(http.user_agent) contains "grabnet"
or lower(http.user_agent) contains "grafula"
or lower(http.user_agent) contains "harvest"
or lower(http.user_agent) contains "heritrix"
or lower(http.user_agent) contains "httrack"
or lower(http.user_agent) contains "icarus6j"
or lower(http.user_agent) contains "jetbot"
or lower(http.user_agent) contains "jetcar"
or lower(http.user_agent) contains "jikespider"
or lower(http.user_agent) contains "kmccrew"
or lower(http.user_agent) contains "leechftp"
or lower(http.user_agent) contains "libweb"
or lower(http.user_agent) contains "linkextractor"
or lower(http.user_agent) contains "linkscan"
or lower(http.user_agent) contains "linkwalker"
or lower(http.user_agent) contains "loader"
or lower(http.user_agent) contains "masscan"
or lower(http.user_agent) contains "miner"
or lower(http.user_agent) contains "majestic"
or lower(http.user_agent) contains "mechanize"
or lower(http.user_agent) contains "morfeus"
or lower(http.user_agent) contains "moveoverbot"
or lower(http.user_agent) contains "netmechanic"
or lower(http.user_agent) contains "netspider"
or lower(http.user_agent) contains "nicerspro"
or lower(http.user_agent) contains "nikto"
or lower(http.user_agent) contains "ninja"
or lower(http.user_agent) contains "nutch"
or lower(http.user_agent) contains "octopus"
or lower(http.user_agent) contains "pagegrabber"
or lower(http.user_agent) contains "planetwork"
or lower(http.user_agent) contains "postrank"
or lower(http.user_agent) contains "purebot"
or lower(http.user_agent) contains "pycurl"
or lower(http.user_agent) contains "python"
or lower(http.user_agent) contains "queryn"
or lower(http.user_agent) contains "queryseeker"
or lower(http.user_agent) contains "radian6"
or lower(http.user_agent) contains "radiation"
or lower(http.user_agent) contains "realdownload"
or lower(http.user_agent) contains "rogerbot"
or lower(http.user_agent) contains "scooter"
or lower(http.user_agent) contains "seekerspider"
or lower(http.user_agent) contains "semalt"
or lower(http.user_agent) contains "siclab"
or lower(http.user_agent) contains "sindice"
or lower(http.user_agent) contains "sistrix"
or lower(http.user_agent) contains "sitebot"
or lower(http.user_agent) contains "siteexplorer"
or lower(http.user_agent) contains "sitesnagger"
or lower(http.user_agent) contains "skygrid"
or lower(http.user_agent) contains "smartdownload"
or lower(http.user_agent) contains "snoopy"
or lower(http.user_agent) contains "sosospider"
or lower(http.user_agent) contains "spankbot"
or lower(http.user_agent) contains "spbot"
or lower(http.user_agent) contains "sqlmap"
or lower(http.user_agent) contains "stackrambler"
or lower(http.user_agent) contains "stripper"
or lower(http.user_agent) contains "sucker"
or lower(http.user_agent) contains "surftbot"
or lower(http.user_agent) contains "finditanswersbot"
or lower(http.user_agent) contains "snappreviewbot"
or lower(http.user_agent) contains "vebidoobot"
or lower(http.user_agent) contains "coibotparser"
or lower(http.user_agent) contains "anybot"
or lower(http.user_agent) contains "dingtalkbot"
or lower(http.user_agent) contains "echobot"
or lower(http.user_agent) contains "popscreen"
or lower(http.user_agent) contains "vuhuvbot"
or lower(http.user_agent) contains "marketwirebot"
or lower(http.user_agent) contains "hypestat"
or lower(http.user_agent) contains "whatyoutypebot"
or lower(http.user_agent) contains "mixrankbot"
or lower(http.user_agent) contains "xforce"
or lower(http.user_agent) contains "smtbot"
or lower(http.user_agent) contains "thesis-research-bot"
Rule #4
lower(http.user_agent) contains "sux0r"
or lower(http.user_agent) contains "suzukacz"
or lower(http.user_agent) contains "suzuran"
or lower(http.user_agent) contains "takeout"
or lower(http.user_agent) contains "teleport"
or lower(http.user_agent) contains "telesoft"
or lower(http.user_agent) contains "true_robots"
or lower(http.user_agent) contains "turingos"
or lower(http.user_agent) contains "vampire"
or lower(http.user_agent) contains "vikspider"
or lower(http.user_agent) contains "voideye"
or lower(http.user_agent) contains "webleacher"
or lower(http.user_agent) contains "webreaper"
or lower(http.user_agent) contains "webstripper"
or lower(http.user_agent) contains "webvac"
or lower(http.user_agent) contains "webviewer"
or lower(http.user_agent) contains "webwhacker"
or lower(http.user_agent) contains "winhttp"
or lower(http.user_agent) contains "wwwoffle"
or lower(http.user_agent) contains "woxbot"
or lower(http.user_agent) contains "xaldon"
or lower(http.user_agent) contains "xxxyy"
or lower(http.user_agent) contains "yamanalab"
or lower(http.user_agent) contains "yioopbot"
or lower(http.user_agent) contains "youda"
or lower(http.user_agent) contains "zeus"
or lower(http.user_agent) contains "zmeu"
or lower(http.user_agent) contains "zune"
or lower(http.user_agent) contains "zyborg"
or lower(http.user_agent) contains "lanaibot"
or lower(http.user_agent) contains "metadataparser"
or lower(http.user_agent) contains "go-http-client"
or lower(http.user_agent) contains "daum"
or lower(http.user_agent) contains "yandexbot"
or lower(http.user_agent) contains "libwww"
or lower(http.user_agent) contains "guzzlehttp"
or lower(http.user_agent) contains "expert-html"
or lower(http.user_agent) contains "geedobot"
or lower(http.user_agent) contains "newspaper"
or lower(http.user_agent) contains "peacockmedia"
or lower(http.user_agent) contains "gobuster"
or lower(http.user_agent) contains "expanseinc"
or lower(http.user_agent) contains "crawling"
or lower(http.user_agent) contains "tineye"
or lower(http.user_agent) contains "damieng"
or lower(http.user_agent) contains "scpitspi"
or lower(http.user_agent) contains "screaming"
or lower(http.user_agent) contains "babbar"
or lower(http.user_agent) contains "scalaj"
or lower(http.user_agent) contains "turnitin"
or lower(http.user_agent) contains "blackbox"
or lower(http.user_agent) contains "okhttp"
or lower(http.user_agent) contains "acebookexternalhit"
or lower(http.user_agent) contains "externalhit"
or lower(http.user_agent) contains "dataforseo"
or lower(http.user_agent) contains "semrush"
or lower(http.user_agent) contains "contentking"
or lower(http.user_agent) contains "siteauditbot"
or lower(http.user_agent) contains "botify"
or lower(http.user_agent) contains "cxense"
or lower(http.user_agent) contains "revvim"
or lower(http.user_agent) contains "colly"
or lower(http.user_agent) contains "github"
or lower(http.user_agent) contains "img2dataset"
or lower(http.user_agent) contains "petalbot"
or lower(http.user_agent) contains "whizebot"
or lower(http.user_agent) contains "projectdiscovery"
or lower(http.user_agent) contains "datenbutler"
or lower(http.user_agent) contains "seebot.org"
or lower(http.user_agent) contains "fuseonbot"
or lower(http.user_agent) contains "vipnytt"
or lower(http.user_agent) contains "pandalytics"
or lower(http.user_agent) contains "ninjbot"
or lower(http.user_agent) contains "gowikibot"
or lower(http.user_agent) contains "360spider"
or lower(http.user_agent) contains "acapbot"
or lower(http.user_agent) contains "acoonbot"
or lower(http.user_agent) contains "ahrefs"
or lower(http.user_agent) contains "alexibot"
or lower(http.user_agent) contains "asterias"
or lower(http.user_agent) contains "attackbot"
or lower(http.user_agent) contains "backdorbot"
or lower(http.user_agent) contains "becomebot"
or lower(http.user_agent) contains "gptbot"
or lower(http.user_agent) contains "kauaibot"
or lower(http.user_agent) contains "zagbot"
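Before deploying any of the four rules, it can help to check locally which entries would fire for a given user-agent, since short entries such as "bot" or "df" are plain substring matches and can catch more than intended. This is a rough sketch that mimics `lower(http.user_agent) contains "..."` with Python string operations. `matching_tokens` is a hypothetical helper and the sample user-agent strings are only illustrative.

```python
def matching_tokens(user_agent, tokens):
    """Return the tokens a `lower(http.user_agent) contains "..."` clause would match."""
    lowered = user_agent.lower()
    return [token for token in tokens if token in lowered]


# A handful of entries taken from the rules above.
tokens = ["semrushbot", "bot", "df", "python", "okhttp"]

# Illustrative user-agent strings (assumptions, not taken from real logs).
samples = {
    "seo crawler": "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)",
    "search engine": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0",
}

for label, user_agent in samples.items():
    print(label, matching_tokens(user_agent, tokens))
```

Running your own access-log user-agents through a check like this shows which entries are responsible for a match, including matches on good spiders that you may want to allow first.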
Bonus rule: I challenge some ASNs because of the low-quality traffic they send. This catches more bots that use unknown user-agents. The challenge solve rate for these ASNs is very low, so it should be safe to block them outright. Make sure a rule that allows known bots runs before the rule below, otherwise it will block some good spiders.
(ip.geoip.asnum in {7224 15169 8074 8075 12076 16509 64236 14061 12876 20473 31898 138915 5650 396982 136907 44185 3356 25 680 136557 203999 46851 262544 10103 36352 210644 31404 202422 206092 24940 9009 36903 137409 211252 4134 45609 7713 36947 4837 23969 18403 24560 395662 20450 30235 47205 23881 198047 14986 17920 32275 50608 199213 262170 201862 43541 24381 10200 14708 27229 48093 42465 7598 30475 55229 7349 33251 52465 52270 45152 8477 198153 52925 61412 262978 53225 41427 53101 41369 35467 59554 52674 24611 48812 40715 201449 52321 29331 201709 53221 198432 51241 19969 56799 26277 58113 28333 42120 6718 20692 17439 132717 9925 132779 42622 6188 40819 24997 38107 36408 57363 46177 62026 61107 132869 56106 32911 24931 57669 48896 45481 132509 39839 63129 53370 25048 28747 46433 55051 18570 13955 16535 22903 9823 46945 263032 36536 50986 199733 48825 35914 33552 52236 28855 198347 40728 18120 53914 12586 55720 27640 62563 202118 9290 45887 51050 20068 49485 40374 14415 46873 14384 54555 263237 20773 53918 4851 32306 133229 28216 36236 42210 51248 49815 34649 41562 33260 24220 52347 45486 33182 53055 51290 132225 133120 42776 55799 48446 263093 56732 42399 47385 40539 42244 29302 10929 47549 200147 393326 198171 57773 47583 43472 32338 9166 62082 198651 24725 29067 197902 42418 29097 196645 56110 23535 29869 62756 26484 25926 15189 20401 24679 25128 39756 32400 9412 9667 51294 23052 28099 45693 17881 17669 17918 50926 201634 22611 54641 61102 132071 10207 45577 132070 262603 29883 24558 38279 199997 50465 14120 11235 50655 17019 31240 199481 16862 47161 56784 59791 59677 202023 199990 50872 54839 58936 11230 62310 38894 47172 262287 46260 14442 133143 197648 39451 58922 27589 42400 133393 201597 28997 60800 33322 38001 199129 197372 57752 201670 14244 22152 34541 196678 43198 47625 42331 62049 35295 42311 53589 59705 36791 14160 34432 41062 59135 201630 25260 23108 40281 31590 10532 22720 27357 33070 45187 7595 26481 29713 13926 54203 62651 63128 62838 30849 
14987 47577 54334 63916 50915 21217 59816 23273 59632 29452 59795 60739 15919 49313 57879 56617 62088 45179 27597 201702 32740 58667 12617 199847 25642 14567 35278 197914 41079 1442 43620 197439 198313 42705 44398 13909 34745 24958 17971 47143 59854 57682 3722 13647 205544 4670 4766 133481 14361 23470 30823 12552 3352 37963 174 15830 203953 26496 51747 136897 25697 136787 23688 4134 45609 7713 36947 4837 23969 18403 24560 395662 14618 6939 42708 57523 201814 30860 212317 19318 62240 60068 45102 63949 15557 62240 212238 132203 135377 16276})
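The ASN list above contains a few repeated entries (for example 4134, 45609 and 62240), which is harmless but wastes characters. Below is a small sketch for deduplicating a list and rebuilding the expression. `build_asn_rule` is a hypothetical helper. As an alternative to a separate allow rule, it can fold the known-bots exception into the same expression with `not cf.client.bot`. That is my substitution rather than what the post does, and it assumes `cf.client.bot` is available on your zone.

```python
def build_asn_rule(asns, allow_known_bots=True):
    """Build an ASN match expression from a list of AS numbers, dropping repeats."""
    unique = list(dict.fromkeys(asns))  # removes duplicates, keeps first-seen order
    expression = "ip.geoip.asnum in {" + " ".join(str(asn) for asn in unique) + "}"
    if allow_known_bots:
        # Assumption: known good bots are exempted in the same expression,
        # instead of through a separate allow rule placed earlier.
        return f"({expression} and not cf.client.bot)"
    return f"({expression})"


# A short excerpt of the list above, including some of its repeated entries.
asns = [7224, 15169, 4134, 45609, 62240, 4134, 45609, 62240]
print(build_asn_rule(asns))
```

With the allow rule already in place as described above, call it with `allow_known_bots=False` to get the same shape as the original expression.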