I had a lot of bot traffic on a site and went looking for ways to block the bad or unnecessary bots. Most tips say to do it in robots.txt, but that does not enforce anything; bots are free to ignore it. I got a lot less ■■■■ traffic, so I hope this will be of use to more people.
I found a good starting point on a site I can't find my way back to: a firewall rule to block bots.
I added a bit to it as well, to let rytebot through plus a few other changes. What do you think?
(http.user_agent contains "Yandex") or (http.user_agent contains "muckrack") or (http.user_agent contains "Qwantify") or (http.user_agent contains "Sogou") or (http.user_agent contains "BUbiNG") or (http.user_agent contains "knowledge") or (http.user_agent contains "CFNetwork") or (http.user_agent contains "Scrapy") or (http.user_agent contains "SemrushBot") or (http.user_agent contains "AhrefsBot") or (http.user_agent contains "Baiduspider") or (http.user_agent contains "python-requests") or (http.user_agent contains "crawl" and not cf.client.bot) or (http.user_agent contains "Crawl" and not cf.client.bot) or (http.user_agent contains "bot" and not http.user_agent contains "bingbot" and not http.user_agent contains "Google" and not http.user_agent contains "Twitter" and not cf.client.bot) or (http.user_agent contains "Bot" and not http.user_agent contains "Google" and not cf.client.bot) or (http.user_agent contains "Spider" and not cf.client.bot) or (http.user_agent contains "spider" and not cf.client.bot)
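To get a feel for what the rule above would catch before turning it on, here is a rough Python sketch that mirrors its logic. It is only a model, not how Cloudflare evaluates rules: `would_block` and `ALWAYS_BLOCK` are names made up for this example, `known_bot` stands in for `cf.client.bot`, and the matching is case-sensitive like `contains` in the rule.

```python
# Substrings the rule blocks unconditionally (case-sensitive, as written in the rule).
ALWAYS_BLOCK = [
    "Yandex", "muckrack", "Qwantify", "Sogou", "BUbiNG", "knowledge",
    "CFNetwork", "Scrapy", "SemrushBot", "AhrefsBot", "Baiduspider",
    "python-requests",
]


def would_block(user_agent: str, known_bot: bool = False) -> bool:
    """Approximate the rule; known_bot is an assumed stand-in for cf.client.bot."""
    ua = user_agent
    if any(s in ua for s in ALWAYS_BLOCK):
        return True
    # Every remaining clause in the rule carries "and not cf.client.bot".
    if known_bot:
        return False
    if any(s in ua for s in ("crawl", "Crawl", "Spider", "spider")):
        return True
    if "bot" in ua and not any(s in ua for s in ("bingbot", "Google", "Twitter")):
        return True
    if "Bot" in ua and "Google" not in ua:
        return True
    return False


# Sample strings below are illustrative, not taken from real logs.
print(would_block("Mozilla/5.0 (compatible; AhrefsBot/7.0)"))
print(would_block("MyApp/1.0 CFNetwork/1410.0.3 Darwin/22.6.0"))
print(would_block("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0 Safari/537.36"))
```

One thing the sketch makes visible: the `CFNetwork` clause has no `cf.client.bot` exception, and that string shows up in the default user agent of many iOS and macOS apps, so it may be worth checking whether any of your real visitors would be caught by it.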
Thanks, I completely missed that one. However, it did not work for me; I just got an error when trying to add the list:
Filter parsing error (1:34): (lower(http.user_agent) contains “appinsights”) or (lower(http.user_agent) contains “semrushbot”) or (lower(http.user_agent) contains “ahrefsbot”) or (lower(http.user_agent) contains “dotbot”) or (lower(http.user_agent) contains “whatcms”) or (lower(http.user_agent) contains “rogerbot”) or (lower(http.user_agent) contains “trendictionbot”) or (lower(http.user_agent) contains “blexbot”) or (lower(http.user_agent) contains “linkfluence”) or (lower(http.user_agent) contains “magpie-crawler”) or (lower(http.user_agent) contains “mj12bot”) or (lower(http.user_agent) contains “mediatoolkitbot”) or (lower(http.user_agent) contains “aspiegelbot”) or (lower(http.user_agent) contains “domainstatsbot”) or (lower(http.user_agent) contains “cincraw”) or (lower(http.user_agent) contains “nimbostratus”) or (lower(http.user_agent) contains “httrack”) or (lower(http.user_agent) contains “serpstatbot”) or (lower(http.user_agent) contains “omgili”) or (lower(http.user_agent) contains “grapeshotcrawler”) or (lower(http.user_agent) contains “megaindex”) or (lower(http.user_agent) contains “petalbot”) or (lower(http.user_agent) contains “semanticbot”) or (lower(http.user_agent) contains “cocolyzebot”) or (lower(http.user_agent) contains “domcopbot”) or (lower(http.user_agent) contains “traackr”) or (lower(http.user_agent) contains “bomborabot”) or (lower(http.user_agent) contains “linguee”) or (lower(http.user_agent) contains “webtechbot”) or (lower(http.user_agent) contains “domainstatsbot”) or (lower(http.user_agent) contains “clickagy”) or (lower(http.user_agent) contains “sqlmap”) or (lower(http.user_agent) contains “internet-structure-research-project-bot”) or (lower(http.user_agent) contains “seekport”) or (lower(http.user_agent) contains “awariosmartbot”) or (lower(http.user_agent) contains “onalyticabot”) or (lower(http.user_agent) contains “buck”) or (lower(http.user_agent) contains “riddler”) or (lower(http.user_agent) contains “sbl-bot”) or 
(lower(http.user_agent) contains “df bot 1.0”) or (lower(http.user_agent) contains “pubmatic crawler bot”) or (lower(http.user_agent) contains “bvbot”) or (lower(http.user_agent) contains “sogou”) or (lower(http.user_agent) contains “barkrowler”) or (lower(http.user_agent) contains “admantx”) or (lower(http.user_agent) contains “adbeat”) or (lower(http.user_agent) contains “embed.ly”) or (lower(http.user_agent) contains “semantic-visions”) or (lower(http.user_agent) contains “voluumdsp”) or (lower(http.user_agent) contains “wc-test-dev-bot”) or (lower(http.user_agent) contains “gulperbot”) ^^^^ invalid digit found in string while parsing with radix 16
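A guess at the cause, in case it helps: the position in the error (1:34) seems to land right after the first `contains`, which is where the opening quote sits. If the pasted text really had typographic quotes (“ ”) instead of plain ASCII `"`, the parser would not see a string there, and the "radix 16" wording suggests it then tried to read the value as hex bytes. It could also just be the forum restyling the quotes, so treat this as an assumption. A minimal Python sketch for rebuilding the expression with plain quotes follows; `build_expression` and `straighten_quotes` are hypothetical helper names, and the sketch assumes the bot names contain no quotes or backslashes.

```python
def straighten_quotes(expression: str) -> str:
    """Replace typographic double quotes with plain ASCII double quotes."""
    return expression.replace("\u201c", '"').replace("\u201d", '"')


def build_expression(names) -> str:
    """Join bot names into one expression, lowercased and de-duplicated in order."""
    unique = list(dict.fromkeys(n.lower() for n in names))
    return " or ".join(
        f'(lower(http.user_agent) contains "{n}")' for n in unique
    )


# Short illustrative list; extend it with the names from the rule you want.
bots = ["semrushbot", "ahrefsbot", "domainstatsbot", "domainstatsbot", "sogou"]
print(build_expression(bots))
```

Building the string from a list also drops repeated entries (`domainstatsbot` appears twice in the list above) and avoids a dangling `or` at the end.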