Firewall using URI to all bots

Hi, I want to allow bots to access some particular pages/posts.
I am using the following firewall rules:

Is it alright if I use this? Or should I make any changes?

May I ask what kind of bots?
How would you recognize or separate them, based on the requests?

Usually, they have a user-agent name that contains something like Googlebot, Bingbot, Yandexbot, etc.

Are there any particular ones you need?

Keep in mind that there are “good” and “bad” bots; an article with more information can be found at the link below:

Regarding your Firewall Rules screenshot, there is also an option that includes the bots listed here:

Furthermore, you can also turn on the Bot Fight Mode option:

Or do you want to allow access to all bots (is that a good approach)?

I was thinking of allowing all bots to access the feeds and robots.txt.
I use a lot of firewall rules, so I don’t want a useful bot to get caught in the crossfire.

Hm, that makes me wonder: aren’t they already accessing the needed URLs, as long as they aren’t being blocked by some security option enabled in the Cloudflare dashboard, such as “Bot Fight Mode”?

I mean, for example, when we block them using Firewall Rules, we can actually count and see the percentage of the traffic to our Website(s) made by bots (either SEO bots, spider bots, really bad bots, crawlers, etc.).

Even if we allow or block them in the robots.txt file, a number of them may not actually read or respect what is stated there. A Python crawler, for example, would just go over all the links available to it, regardless of what is stated in robots.txt (the sitemap.xml entry or any Disallow rule, which it easily bypasses).

An example Firewall Rule expression, just to make sure we allow good bots and some others through to the needed URLs, would be something like:

( or
(http.user_agent contains "duckduckgo") or
(http.user_agent contains "facebookexternalhit") or
(http.user_agent contains "Feedfetcher-Google") or
(http.user_agent contains "LinkedInBot") or
(http.user_agent contains "Mediapartners-Google") or
(http.user_agent contains "msnbot") or
(http.user_agent contains "Slackbot") or
(http.user_agent contains "Twitterbot") or
(http.user_agent contains "ia_archive") or
(http.user_agent contains "yahoo")

And then add the URL part for which we actually want to allow them, in case they would otherwise get blocked.
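
As a rough sketch of how the user-agent part and the URL part could be combined in one expression for a rule with the Allow action: the paths below are assumptions (a feed under /feed and robots.txt at the site root), and the two user agents are only a shortened sample of the list above, so adjust both to your site.

```
(
  (http.user_agent contains "Slackbot") or
  (http.user_agent contains "LinkedInBot")
) and (
  (http.request.uri.path eq "/robots.txt") or
  (http.request.uri.path contains "/feed")
)
```

Note that `contains` is case-sensitive, so the strings need to match the casing the bot actually sends. Assuming the rules are evaluated in list order, this rule would most likely need to sit above the JS challenge rules to have any effect.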

There could also be a way to use cf.bot_management.verified_bot just to make sure.
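
A minimal sketch of that variant, assuming the `cf.bot_management.verified_bot` field is available on your plan (it depends on Bot Management being enabled, which may not be the case) and using the same assumed paths as before:

```
(cf.bot_management.verified_bot) and (
  (http.request.uri.path eq "/robots.txt") or
  (http.request.uri.path contains "/feed")
)
```

This relies on Cloudflare's own verification instead of the user-agent string, which any client can set to whatever it likes.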

I think you have misunderstood. As I said, I have many firewall rules (basically JS challenges), and I have ended up in a situation where I have to whitelist the IPs of the services I use. It’s a hassle and sometimes doesn’t work properly, so I want to give access to those pages, but I guess giving access to the feed will be sufficient.

Thanks for your help.
