Blocking bots: I think I found a good way

I had a lot of bot traffic on a site and searched for ways to block the bad or unnecessary bots. Most tips say to do it in robots.txt, but that doesn't enforce anything, because bots can simply ignore it. I hope this will be of use to more people; I got a lot less ■■■■ traffic.

I found a good approach on a site I can't find my way back to: a firewall rule that blocks bots.
I added a bit to it as well, letting rytebot and a few others through. What do you think?


(http.user_agent contains "Yandex") or (http.user_agent contains "muckrack") or (http.user_agent contains "Qwantify") or (http.user_agent contains "Sogou") or (http.user_agent contains "BUbiNG") or (http.user_agent contains "knowledge") or (http.user_agent contains "CFNetwork") or (http.user_agent contains "Scrapy") or (http.user_agent contains "SemrushBot") or (http.user_agent contains "AhrefsBot") or (http.user_agent contains "Baiduspider") or (http.user_agent contains "python-requests") or (http.user_agent contains "crawl" and not cf.client.bot) or (http.user_agent contains "Crawl" and not cf.client.bot) or (http.user_agent contains "bot" and not http.user_agent contains "bingbot" and not http.user_agent contains "Google" and not http.user_agent contains "Twitter" and not cf.client.bot) or (http.user_agent contains "Bot" and not http.user_agent contains "Google" and not cf.client.bot) or (http.user_agent contains "Spider" and not cf.client.bot) or (http.user_agent contains "spider" and not cf.client.bot)
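To check which user agents the rule above would catch before deploying it, the logic can be mirrored locally. This is a minimal Python sketch, not Cloudflare's evaluator. `would_block` and its `verified_bot` parameter are hypothetical names, with `verified_bot` standing in for `cf.client.bot`. It assumes `not` binds tighter than `and`, which binds tighter than `or`, as in Cloudflare's documented operator precedence. Like `contains`, every check is case-sensitive.

```python
def would_block(user_agent: str, verified_bot: bool) -> bool:
    """Approximate the firewall expression above.

    user_agent   -- the request's User-Agent header (http.user_agent)
    verified_bot -- hypothetical stand-in for cf.client.bot
    """
    ua = user_agent
    # Substrings blocked unconditionally, even for verified bots.
    always_blocked = [
        "Yandex", "muckrack", "Qwantify", "Sogou", "BUbiNG", "knowledge",
        "CFNetwork", "Scrapy", "SemrushBot", "AhrefsBot", "Baiduspider",
        "python-requests",
    ]
    if any(s in ua for s in always_blocked):
        return True
    # Generic crawler keywords only block requests that are NOT verified bots.
    if not verified_bot:
        if "crawl" in ua or "Crawl" in ua:
            return True
        if ("bot" in ua and "bingbot" not in ua
                and "Google" not in ua and "Twitter" not in ua):
            return True
        if "Bot" in ua and "Google" not in ua:
            return True
        if "Spider" in ua or "spider" in ua:
            return True
    return False
```

For example, an MJ12bot user agent is blocked when `verified_bot=False` but passes when `verified_bot=True`. The sketch also makes one property of the rule visible: a user agent matching Google's usual Googlebot string is not blocked even when `cf.client.bot` is false, because the `"Google"` exclusion applies regardless of verification.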

That’s similar to this post:


Thanks, I completely missed that one. However, it did not work for me; I just got an error when trying to add the list.

Filter parsing error (1:34): (lower(http.user_agent) contains “appinsights”) or (lower(http.user_agent) contains “semrushbot”) or (lower(http.user_agent) contains “ahrefsbot”) or (lower(http.user_agent) contains “dotbot”) or (lower(http.user_agent) contains “whatcms”) or (lower(http.user_agent) contains “rogerbot”) or (lower(http.user_agent) contains “trendictionbot”) or (lower(http.user_agent) contains “blexbot”) or (lower(http.user_agent) contains “linkfluence”) or (lower(http.user_agent) contains “magpie-crawler”) or (lower(http.user_agent) contains “mj12bot”) or (lower(http.user_agent) contains “mediatoolkitbot”) or (lower(http.user_agent) contains “aspiegelbot”) or (lower(http.user_agent) contains “domainstatsbot”) or (lower(http.user_agent) contains “cincraw”) or (lower(http.user_agent) contains “nimbostratus”) or (lower(http.user_agent) contains “httrack”) or (lower(http.user_agent) contains “serpstatbot”) or (lower(http.user_agent) contains “omgili”) or (lower(http.user_agent) contains “grapeshotcrawler”) or (lower(http.user_agent) contains “megaindex”) or (lower(http.user_agent) contains “petalbot”) or (lower(http.user_agent) contains “semanticbot”) or (lower(http.user_agent) contains “cocolyzebot”) or (lower(http.user_agent) contains “domcopbot”) or (lower(http.user_agent) contains “traackr”) or (lower(http.user_agent) contains “bomborabot”) or (lower(http.user_agent) contains “linguee”) or (lower(http.user_agent) contains “webtechbot”) or (lower(http.user_agent) contains “domainstatsbot”) or (lower(http.user_agent) contains “clickagy”) or (lower(http.user_agent) contains “sqlmap”) or (lower(http.user_agent) contains “internet-structure-research-project-bot”) or (lower(http.user_agent) contains “seekport”) or (lower(http.user_agent) contains “awariosmartbot”) or (lower(http.user_agent) contains “onalyticabot”) or (lower(http.user_agent) contains “buck”) or (lower(http.user_agent) contains “riddler”) or (lower(http.user_agent) contains “sbl-bot”) or 
(lower(http.user_agent) contains “df bot 1.0”) or (lower(http.user_agent) contains “pubmatic crawler bot”) or (lower(http.user_agent) contains “bvbot”) or (lower(http.user_agent) contains “sogou”) or (lower(http.user_agent) contains “barkrowler”) or (lower(http.user_agent) contains “admantx”) or (lower(http.user_agent) contains “adbeat”) or (lower(http.user_agent) contains “embed.ly”) or (lower(http.user_agent) contains “semantic-visions”) or (lower(http.user_agent) contains “voluumdsp”) or (lower(http.user_agent) contains “wc-test-dev-bot”) or (lower(http.user_agent) contains “gulperbot”)
^^^^ invalid digit found in string while parsing with radix 16
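Hand-assembling a list this long makes it easy to pick up curly quotes or duplicates ("domainstatsbot" appears twice in the list in the error message). One way around that, assuming the bot names are kept as a plain list, is to generate the expression. This is a minimal Python sketch; `build_expression` is a hypothetical helper that lowercases and de-duplicates the names and always emits straight ASCII double quotes, using the same `lower(http.user_agent) contains "..."` clause pattern as the list above.

```python
def build_expression(names: list[str]) -> str:
    """Join bot names into a `lower(http.user_agent) contains "..."` chain.

    Names are stripped, lowercased and de-duplicated (first occurrence wins),
    and every value is wrapped in straight ASCII double quotes.
    """
    seen: set[str] = set()
    clauses: list[str] = []
    for name in names:
        key = name.strip().lower()
        if not key or key in seen:
            continue  # skip blanks and duplicates
        # Kept simple on purpose: refuse characters that would need escaping.
        if '"' in key or "\\" in key:
            raise ValueError(f"unsupported character in {name!r}")
        seen.add(key)
        clauses.append(f'(lower(http.user_agent) contains "{key}")')
    return " or ".join(clauses)


if __name__ == "__main__":
    print(build_expression(["SemrushBot", "dotbot", "domainstatsbot", "DomainStatsBot"]))
```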

I’m not sure where that “radix 16” error would come from, but it wouldn’t surprise me if it had an issue with the smart quotes.
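The smart-quote theory can be checked, and fixed, before pasting. This is a minimal Python sketch with hypothetical helper names: `first_smart_quote` reports the 1-based line and column of the first curly quote, and `straighten_quotes` swaps curly quotes for plain ASCII ones. Presumably, with no recognized opening quote, the parser tries to read the value as a hex byte sequence, which would explain the "radix 16" wording; that is an inference, not something confirmed in this thread.

```python
from typing import Optional, Tuple

# Typographic ("smart") quotes that forum software often substitutes.
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def first_smart_quote(expr: str) -> Optional[Tuple[int, int]]:
    """Return the 1-based (line, column) of the first smart quote, or None."""
    for line_no, line in enumerate(expr.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SMART_QUOTES:
                return line_no, col
    return None


def straighten_quotes(expr: str) -> str:
    """Replace every smart quote with its plain ASCII equivalent."""
    return expr.translate(str.maketrans(SMART_QUOTES))
```

Run on the start of the list from the error message, `(lower(http.user_agent) contains “appinsights”)`, `first_smart_quote` reports line 1, column 34, which lines up with the `(1:34)` in the error if columns are counted from 1.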

A good thing about the one I posted, compared with the other approach, is that it uses cf.client.bot (Cloudflare's flag for known, verified bots). That makes it a bit more dynamic: there's no need to list every bot out there.
