Using a WAF to block robots.txt from humans

Hi all,

Cloudflare is doing a perfect job at blocking bots from crawling and scanning my robots.txt file. This includes crawls by bots with empty user agent strings, etc. Thanks Cloudflare.

At this point I would like to ask the community, although this isn't directly about the functionality of the Cloudflare WAF. Would it be an issue if I were to create a WAF rule that only allows Google, Bing, and my own IP address to read my robots.txt file?

This question of mine popped up after I was reviewing my Cloudflare logs and saw quite a few attempts by bots with empty user agents hitting my robots.txt. Of course, as things stand, real humans are also allowed to view it.

So, from an overall security standpoint, would it be recommended to block my robots.txt from "All" and only allow Google, Bing, and my own IP? Are there any possible issues that may arise from me doing this?

Of course, the other way would be to only allow "Verified" bots to access the robots.txt. However, to go one step further, I would like to block everything, including all other humans, from viewing the file (specifically allowing only my IP, Google, and Bing).

Any foreseen issues with me doing this?

Thanks in advance.

I use this approach, works fine, no issue.

However, remember that bots might sniff around, find your sitemap.xml file, and grab and crawl things from it instead.

You might have to consider allowing only Google, Bing, and your own IP address to access the .xml files too.

Can’t say in general, but I follow this practice on news site portals, since I’ve caught fake Googlebots, even from my own country and others, trying to grab and steal the content.

No, just make sure to write the Firewall Rule correctly to not block real Googlebot from accessing your robots.txt and sitemap related files, which would result in some issues with indexing, crawling and SEO.
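For reference, a rule along these lines can be sketched in Cloudflare's rule expression language. This is only a sketch, not a definitive implementation: `203.0.113.10` is a placeholder for your own IP address, and note that `cf.client.bot` matches all of Cloudflare's verified bots, not just Googlebot and Bingbot, so you may want to narrow it further. Deployed as a custom rule with the Block action:

```
(http.request.uri.path in {"/robots.txt" "/sitemap.xml"})
and not cf.client.bot
and ip.src ne 203.0.113.10
```

Matching on `cf.client.bot` rather than the user agent string is the important part, since a fake Googlebot can spoof its user agent but will not pass Cloudflare's verified-bot check.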


This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.