Firewall Operator

ExpertTip

#21

Bear in mind that some bots take a few days to start following new robots.txt directives.

It’s also a good idea to purge the cache for robots.txt.


#22

I'm afraid I would generally not rely too much on robots.txt. It is a useful signal for compliant crawlers, but the non-compliant ones are definitely the majority, and keeping them out is often the main objective.

Also, even seemingly compliant crawlers often forget about what they were told. Baidu is an extreme example but even Google ignores it occasionally.
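
For anyone wondering what "compliant" means in practice, here is a minimal sketch of the check a well-behaved crawler is expected to run before fetching a URL, using Python's standard `urllib.robotparser`. The robots.txt content and the `ExampleBot` name are made up for illustration. Nothing forces a crawler to run this check, which is exactly why robots.txt alone cannot keep bad bots out.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""

def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A compliant crawler calls something like `is_allowed("ExampleBot", "https://example.com/")` and backs off when the answer is `False`. A non-compliant one simply skips the call.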


#23

I totally agree.

It would be great if Cloudflare let us edit the list of known bots. The current list lacks Facebook, Twitter and GTMetrix, to name a few good ones (if you use them). It also includes many that I have no interest in letting in, such as some search engines from Russia and China, as well as some SEO crawlers that add nothing to a site if you're not actively using their services.


#24

Finally managed to post it: Enhancement for crawler/bot matching


#25

Yes. But I allowed the GTmetrix IPs in Firewall Rules, not under Tools, and it did not work. I will try it under Tools now.
Thanks.

Instead of blocking all countries, can “JS Challenge” or “Challenge (Captcha)” stop these bots?

Thanks


#26

Running JavaScript is fairly cheap nowadays, so don’t be surprised if some bots now execute JS to get past these challenges. I’d recommend the captcha if you think your website is high enough value to warrant a captcha instead of the JS challenge.


#27

Hello Floripare,
Could you be so kind as to share a screenshot of the IP rule that got GTmetrix working for you?

Many thanks


#28

Here’s what’s working for me. Notice the two GTMetrix IPs in São Paulo wouldn’t be affected, so as an example I added Hong Kong. Without the whitelist, GTMetrix is blocked when I select its HK server.

In Firewall > Firewall Rules

In Firewall > Tools > IP Access Rule:


#29

robots.txt


#30

I tried your recommendation and it worked for GTmetrix. I hope this will help others in a case like mine.
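
The behaviour reported in this thread, where an allow entry under Tools > IP Access Rules wins over a country block in Firewall Rules, can be modelled roughly as below. This is only a sketch of the observed outcome, not Cloudflare's actual implementation. The address `203.0.113.10` is a documentation-range placeholder, not a real GTmetrix IP, and the blocked-country set is an assumption.

```python
import ipaddress

# Placeholder values for illustration only; not GTmetrix's real addresses.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.10/32")]
BLOCKED_COUNTRIES = {"HK"}

def decide(client_ip: str, country: str) -> str:
    """Return 'allow' or 'block' for a request, checking the IP allowlist first."""
    ip = ipaddress.ip_address(client_ip)
    # An allowlisted IP is let through before any country rule is considered.
    if any(ip in net for net in ALLOWED_NETWORKS):
        return "allow"
    # Otherwise the country block applies.
    if country in BLOCKED_COUNTRIES:
        return "block"
    return "allow"
```

With these placeholder values, `decide("203.0.113.10", "HK")` is allowed even though Hong Kong is blocked, while any other Hong Kong address is blocked.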


#31

Hello all,
What is the Cloudflare Firewall syntax to allow hostnames like:

  1. smsh-703194-juc1ugur1qwqqqo4.stackpathdns.com
  2. 703194.smushcdn.com

Thanks


#32

If they don’t change, something like this should work, but allowing third parties to bypass the firewall is generally a bad idea.

(http.host eq "smsh-703194-juc1ugur1qwqqqo4.stackpathdns.com") or (http.host eq "703194.smushcdn.com")
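
To make the matching behaviour of that expression concrete: `eq` is an exact string comparison, so each hostname has to be listed in full, and a subdomain or parent domain will not match. Here is a rough Python model of the same check. The function name `host_matches` is made up for illustration, and this only mimics the logic of the expression, not how Cloudflare evaluates it.

```python
# The two hostnames from the question above.
ALLOWED_HOSTS = {
    "smsh-703194-juc1ugur1qwqqqo4.stackpathdns.com",
    "703194.smushcdn.com",
}

def host_matches(http_host: str) -> bool:
    """Mimic (http.host eq A) or (http.host eq B): exact match against either name."""
    return http_host in ALLOWED_HOSTS
```

So `host_matches("703194.smushcdn.com")` is true, but `host_matches("smushcdn.com")` is not. If the hostnames ever change, the rule has to be updated to match.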