User Agent Blocking in 2024. Still valid?

Hello all,

I would like to ask whether it is still valid in 2024 to create a custom User Agent Blocking rule in the WAF in order to block specific agents. I have seen similar topics here dating back to 2018-2022 that suggest adding the following user agents to the block list:

(lower(http.user_agent) contains "appinsights") or (lower(http.user_agent) contains "semrushbot") or (lower(http.user_agent) contains "ahrefsbot") or (lower(http.user_agent) contains "dotbot") or (lower(http.user_agent) contains "whatcms") or (lower(http.user_agent) contains "rogerbot") or (lower(http.user_agent) contains "trendictionbot") or (lower(http.user_agent) contains "blexbot") or (lower(http.user_agent) contains "linkfluence") or (lower(http.user_agent) contains "magpie-crawler") or (lower(http.user_agent) contains "mj12bot") or (lower(http.user_agent) contains "mediatoolkitbot") or (lower(http.user_agent) contains "aspiegelbot") or (lower(http.user_agent) contains "domainstatsbot") or (lower(http.user_agent) contains "cincraw") or (lower(http.user_agent) contains "nimbostratus") or (lower(http.user_agent) contains "httrack") or (lower(http.user_agent) contains "serpstatbot") or (lower(http.user_agent) contains "omgili") or (lower(http.user_agent) contains "grapeshotcrawler") or (lower(http.user_agent) contains "megaindex") or (lower(http.user_agent) contains "petalbot") or (lower(http.user_agent) contains "semanticbot") or (lower(http.user_agent) contains "cocolyzebot") or (lower(http.user_agent) contains "domcopbot") or (lower(http.user_agent) contains "traackr") or (lower(http.user_agent) contains "bomborabot") or (lower(http.user_agent) contains "linguee") or (lower(http.user_agent) contains "webtechbot") or (lower(http.user_agent) contains "clickagy") or (lower(http.user_agent) contains "sqlmap") or (lower(http.user_agent) contains "internet-structure-research-project-bot") or (lower(http.user_agent) contains "seekport") or (lower(http.user_agent) contains "awariosmartbot") or (lower(http.user_agent) contains "onalyticabot") or (lower(http.user_agent) contains "buck") or (lower(http.user_agent) contains "riddler") or (lower(http.user_agent) contains "sbl-bot") or (lower(http.user_agent) contains "df bot 1.0") or (lower(http.user_agent) contains "pubmatic crawler bot") or (lower(http.user_agent) contains "bvbot") or (lower(http.user_agent) contains "sogou") or (lower(http.user_agent) contains "barkrowler") or (lower(http.user_agent) contains "admantx") or (lower(http.user_agent) contains "adbeat") or (lower(http.user_agent) contains "embed.ly") or (lower(http.user_agent) contains "semantic-visions") or (lower(http.user_agent) contains "voluumdsp") or (lower(http.user_agent) contains "wc-test-dev-bot") or (lower(http.user_agent) contains "gulperbot")

My question is: is this still the go-to method in 2024, or does Cloudflare block these agents by default? And if it is still valid, where can we get an updated list?
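
For reference, the same checks can also be collapsed into a single regex using the matches operator (which needs a plan that supports regular expressions). A minimal sketch covering only a handful of the agents above:

(http.user_agent matches "(?i)(semrushbot|ahrefsbot|mj12bot|dotbot|petalbot|blexbot|sqlmap)")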

Probably still valid, but I would also add:

Empty user agent "" (see the sketch after this list for how to match it)
Amazonbot
TheWebInternetSearcher
paloaltonetworks.com
"${${lower:j}ndi:${lower:l}${lower:d}a${lower:p}://${hostName}.useragent."
2ip bot
8LEGS
Adobe Application Manager
ALittle Client
AndroidDownloadManager
anthropic-ai
Apache/2.4.34 (Ubuntu) OpenSSL/1.1.1 (internal dummy connection)
Apache-HttpClient
Avalon API Client
axios
B2B Bot
Baiduspider (+http://www.baidu.com/search/spider.htm)
"be aware this a vulnerable scanner from me if you see this maybe your host was vuln or hacked !!! if this UA help you buy me a coffe with btc to bc1qxfxmv06dwse3u5k5ugmg3ljh6246880x60ftlq"
Bloglines
BSbot
CheckMarkNetwork
colly - https://github.com/gocolly/colly/v2
Cpanel-HTTP-Client
cpp-httplib
CSSCheck
curl
DnBCrawler-Analytics
Download Demon
Drupal
ELinks
EmailWolf
Embarcadero URI Client
everyfeed-spider
ExperianCrawlUK (andrew dot swanton at phgroup dot com)
Facebot
fasthttp
FAST-WebCrawler (http://www.alltheweb.com/help/webmaster/crawler)
FeedFetcher-Google (Google Feedfetcher | Google for Developers)
Fuzz Faster U Fool
Gaisbot
Go-http-client
golang-nic
grub-client
Gulper Web Bot
GuzzleHttp
hg-http-client
hGo-http-client
HTMLParser
httpx - open-source project (GitHub: projectdiscovery/httpx)
IDBTE4M CODE87
IDG/EU
InetURL
IonCrawl
Java
Jetty
Jigsaw
libwww-perl
Malicious Scanner
Mon User-Agent personnalisé
Mozilliqa
Offline Explorer
okhttp
P3P Validator
page-preview-tool
Pandalytics
panscient.com
pantest
PHP
Python
quic-go-HTTP
RepoLookoutBot
Report Runner
Roku/DVP
Ruby
Scrapy
Screaming Frog SEO Spider
SearchExpress
Secragon Offensive Agent
SEMrushBot
SEOlizer
serpstatbot
Settings/1250.2.1 CFNetwork/1485 Darwin/23.1.0
Site24x7 Tools
Sogou web spider
Spider_Bot
StrategyBridge Crawler
test
TheSafeInternetSearch
TurnitinBot
Twitterbot
undici
url
Uzbl
VLC
W3C_Validator
Wappalyzer
WDG_Validator
Web Downloader
WebCopier
webprosbot
wp_is_mobile
Xenu Link Sleuth
xx032_bo9vs83_2a
ZaldamoSearchBot
ZoominfoBot
webtech
WebZIP
Wget
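
For the empty user agent, a contains match won't work; it needs an equality check against the empty string. A minimal sketch combining it with a couple of the strings above (the same pattern extends to the rest of the list):

(http.user_agent eq "") or (lower(http.user_agent) contains "amazonbot") or (lower(http.user_agent) contains "curl") or (lower(http.user_agent) contains "python")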

We find it easier to just block everyone we aren't interested in: we keep a short list of countries and ASNs that we accept. We are probably an edge case, though, as we don't want our content available to everyone.

Certain ASNs and countries will almost always be 100% script kiddies regardless of user agent, so we think it's better to block on that basis before blocking on user agent.
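
As a rough sketch of that approach in the same expression language (the country codes and ASNs below are placeholders only, and whether the ASN list extends or restricts the country list depends on your setup):

not (ip.geoip.country in {"US" "DE"} or ip.geoip.asnum in {64496 64511})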

This topic was automatically closed 15 days after the last reply. New replies are no longer allowed.