How to block indexing by .pw spam sites

Hi, first of all, sorry for my bad English.
It is a big problem that some .pw sites often index our sites and add them to their lists.
I want to block this. I have a new site, so I am working on Cloudflare Firewall Rules!

I want to block these bots, but I don’t know which ones I should block, and I don’t know much about rules. Here are some rules I found on other sites.

https://blog.runcloud.io/cloudflare-firewall-rules/#example-3-block-bad-bot-traffic

(http.user_agent contains "Yandex") or (http.user_agent contains "muckrack") or (http.user_agent contains "Qwantify") or (http.user_agent contains "Sogou") or (http.user_agent contains "BUbiNG") or (http.user_agent contains "knowledge") or (http.user_agent contains "CFNetwork") or (http.user_agent contains "Scrapy") or (http.user_agent contains "SemrushBot") or (http.user_agent contains "AhrefsBot") or (http.user_agent contains "Baiduspider") or (http.user_agent contains "python-requests") or ((http.user_agent contains "crawl") or (http.user_agent contains "Crawl") or (http.user_agent contains "bot" and not http.user_agent contains "bingbot" and not http.user_agent contains "Google" and not http.user_agent contains "Twitter")or (http.user_agent contains "Bot" and not http.user_agent contains "Google") or (http.user_agent contains "Spider") or (http.user_agent contains "spider") and not cf.client.bot)

And

(http.user_agent contains "?%00") or
(http.user_agent contains "/bin/") or
(lower(http.user_agent) contains "curl") or
(http.user_agent contains "echo ") or
(http.user_agent contains "eval(") or
(http.user_agent contains "wget ") or
(http.user_agent contains "AhrefsBot") or
(http.user_agent contains "ALittle") or
(http.user_agent contains "baidu") or
(http.user_agent contains "coccocbot") or
(http.user_agent contains "DavClnt") or
(http.user_agent contains "DnyzBot") or
(http.user_agent contains "DotBot") or
(http.user_agent contains "GRequest") or
(http.user_agent contains "Hello") or
(http.user_agent contains "http-client") or
(http.user_agent contains "nowledge") or
(http.user_agent contains "Lua") or
(http.user_agent contains "mail.ru") or
(http.user_agent contains "My User Agent") or
(http.user_agent contains "NetSystemsResearch") or
(http.user_agent contains "Nikto") or
(http.user_agent contains "Nimbostratus") or
(http.user_agent contains "PetalBot") or
(lower(http.user_agent) contains "python") or
(http.user_agent contains "ReactorNetty") or
(http.user_agent contains "RestSharp") or
(http.user_agent contains "Scrapy") or
(http.user_agent contains "SeznamBot") or
(http.user_agent contains "Sogou") or
(http.user_agent contains "spbot") or
(http.user_agent contains "Uptimebot") or
(http.user_agent contains "WebDAV-MiniRedir") or
(http.user_agent contains "WinHttp.WinHttpRequest") or
(http.user_agent contains "YandexBot") or
(http.user_agent contains "ZmEu")

And

(lower(http.user_agent) contains "appinsights") or
(lower(http.user_agent) contains "semrushbot") or
(lower(http.user_agent) contains "ahrefsbot") or
(lower(http.user_agent) contains "dotbot") or
(lower(http.user_agent) contains "whatcms") or
(lower(http.user_agent) contains "rogerbot") or
(lower(http.user_agent) contains "trendictionbot") or
(lower(http.user_agent) contains "blexbot") or
(lower(http.user_agent) contains "linkfluence") or
(lower(http.user_agent) contains "magpie-crawler") or
(lower(http.user_agent) contains "mj12bot") or
(lower(http.user_agent) contains "mediatoolkitbot") or
(lower(http.user_agent) contains "aspiegelbot") or
(lower(http.user_agent) contains "domainstatsbot") or
(lower(http.user_agent) contains "cincraw") or
(lower(http.user_agent) contains "nimbostratus") or
(lower(http.user_agent) contains "httrack") or
(lower(http.user_agent) contains "serpstatbot") or
(lower(http.user_agent) contains "omgili") or
(lower(http.user_agent) contains "grapeshotcrawler") or
(lower(http.user_agent) contains "megaindex") or
(lower(http.user_agent) contains "petalbot") or
(lower(http.user_agent) contains "semanticbot") or
(lower(http.user_agent) contains "cocolyzebot") or
(lower(http.user_agent) contains "domcopbot") or
(lower(http.user_agent) contains "traackr") or
(lower(http.user_agent) contains "bomborabot") or
(lower(http.user_agent) contains "linguee") or
(lower(http.user_agent) contains "webtechbot") or
(lower(http.user_agent) contains "domainstatsbot") or
(lower(http.user_agent) contains "clickagy") or
(lower(http.user_agent) contains "sqlmap") or
(lower(http.user_agent) contains "internet-structure-research-project-bot") or
(lower(http.user_agent) contains "seekport") or
(lower(http.user_agent) contains "awariosmartbot") or
(lower(http.user_agent) contains "onalyticabot") or
(lower(http.user_agent) contains "buck") or
(lower(http.user_agent) contains "riddler") or
(lower(http.user_agent) contains "sbl-bot") or
(lower(http.user_agent) contains "df bot 1.0") or
(lower(http.user_agent) contains "pubmatic crawler bot") or
(lower(http.user_agent) contains "bvbot") or
(lower(http.user_agent) contains "sogou") or
(lower(http.user_agent) contains "barkrowler") or
(lower(http.user_agent) contains "admantx") or
(lower(http.user_agent) contains "adbeat") or
(lower(http.user_agent) contains "embed.ly") or
(lower(http.user_agent) contains "semantic-visions") or
(lower(http.user_agent) contains "voluumdsp") or
(lower(http.user_agent) contains "wc-test-dev-bot") or
(lower(http.user_agent) contains "gulperbot")
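
For comparison, a much shorter variant of the same idea might look like the sketch below. It assumes the goal is only to stop a handful of SEO crawlers. The three names are picked from the lists above purely as an illustration, `lower()` makes the match case-insensitive, and the `cf.client.bot` guard (which also appears in the first expression) is meant to leave bots that Cloudflare has verified untouched.

```
(not cf.client.bot) and (
(lower(http.user_agent) contains "semrushbot") or
(lower(http.user_agent) contains "ahrefsbot") or
(lower(http.user_agent) contains "mj12bot")
)
```

Very broad tokens such as "bot", "python" or "Hello" can also match legitimate clients, so it is probably safer to start with a short list and a "Managed Challenge" or "Log" action before switching to "Block".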

@freitasm; @jnperamo; @michael; @sdayman; @fritex;

Thanks

Greetings,

Thank you for asking.

In future, please note that there is no need to mention everyone.

Try to use:
(http.referer contains ".pw")

Then set the action to “Block”.
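
Note that `contains ".pw"` matches the string anywhere in the Referer, so a URL such as `https://example.com/list.pwd` would also be blocked. A slightly stricter variant, as a sketch only: it assumes the Referer is a full URL whose hostname ends in `.pw`, and it relies on the `ends_with()` function of the Rules language. If the expression editor rejects that function on your plan, the first half can be used on its own.

```
(lower(http.referer) contains ".pw/") or ends_with(lower(http.referer), ".pw")
```

This only affects requests that arrive with a .pw Referer, typically visitors clicking a link on such a site. A crawler that sends no Referer header will not be matched by it.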

Try to determine their IP addresses, or block a few AS numbers, and then re-check.
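
Such a rule could look roughly like the sketch below. The addresses and AS numbers are placeholders from the ranges reserved for documentation, not real offenders. Replace them with the values you find in your own firewall events or server logs.

```
(ip.src in {192.0.2.10 198.51.100.0/24}) or (ip.geoip.asnum in {64496 64511})
```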

Disable the /rss and /feed endpoints, or block access to them for everyone.
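
As a rough sketch, assuming the feeds are served at exactly `/rss` and `/feed` or below those paths (adjust to whatever your CMS actually uses), the expression could be:

```
(http.request.uri.path eq "/rss") or
(http.request.uri.path eq "/feed") or
(http.request.uri.path contains "/rss/") or
(http.request.uri.path contains "/feed/")
```

Using `eq` and the trailing slash avoids catching unrelated paths such as `/feedback`, which a plain `contains "/feed"` would also match.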

Alternatively, block access to the sitemap.xml and robots.txt files for everyone except Google's AS number (AS15169), using a Firewall Rule like the one below:

(http.request.uri.path contains "sitemap_index.xml" and ip.geoip.asnum ne 15169) or
(http.request.uri.path contains "sitemap.xml" and ip.geoip.asnum ne 15169) or
(http.request.uri.path contains "sitemap" and ip.geoip.asnum ne 15169) or
(http.request.uri.path contains "robots.txt" and ip.geoip.asnum ne 15169 and not (
(http.user_agent contains "Googlebot") or
(http.user_agent contains "APIs-Google") or
(http.user_agent contains "Mediapartners-Google") or
(http.user_agent contains "AdsBot-Google-Mobile") or
(http.user_agent contains "AdsBot-Google") or
(http.user_agent contains "Googlebot-Image") or
(http.user_agent contains "Googlebot-News") or
(http.user_agent contains "Googlebot-Video") or
(http.user_agent contains "AdsBot-Google-Mobile-Apps") or
(http.user_agent contains "FeedFetcher-Google") or
(http.user_agent contains "Google-Read-Aloud") or
(http.user_agent contains "DuplexWeb-Google") or
(http.user_agent contains "Google Favicon") or
(http.user_agent contains "Storebot-Google")
))
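
If other search engines should keep access as well, a shorter alternative might rely on Cloudflare's verified bot flag instead of a single AS number. This is only a sketch, under the assumption that every crawler you care about is on Cloudflare's verified bots list:

```
(http.request.uri.path contains "sitemap" or http.request.uri.path contains "robots.txt") and not cf.client.bot
```

In newer versions of the Rules language the AS number field is documented as `ip.src.asnum`, so if the editor flags `ip.geoip.asnum` as deprecated, the same comparison should work with the newer field name.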
