"Manage bot traffic with robots.txt" feature bug

What is the name of the domain?

What is the issue you’re encountering

“Manage bot traffic with robots.txt” does not block bots on inner pages. The feature creates its own robots.txt only for the homepage, NOT for inner pages.

Was the site working with SSL prior to adding it to Cloudflare?

Yes

What is the current SSL/TLS setting?

Flexible

Any plans to fix it?

Could you clarify what you are referring to by inner pages? An example of it would be helpful.

Inner pages are the ones that come after domain.com/, for example domain.com/some-page/

Example (warning - adult content site): https://www.anabel054.net/robots.txt - it works.
https://www.anabel054.net/about/robots.txt - it does not work.

The robots.txt file lives at the root of the (sub)domain. You define the rules for the sections of the website in that file.
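
To make the "root of the (sub)domain" point concrete, here is a minimal sketch of how a well-behaved crawler could work out where to look for robots.txt, whatever page it lands on. The helper name `robots_url` is made up for illustration; it is not taken from any particular crawler.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location for the host that serves page_url.

    Hypothetical helper: the path, query, and fragment of the page are
    discarded, and only the scheme and host (plus port, if any) are kept.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# An inner page and the homepage resolve to the same file.
print(robots_url("https://domain.com/some-page/"))
print(robots_url("https://domain.com/"))
```

Both calls print https://domain.com/robots.txt, so a crawler that arrives at an inner page directly from search still ends up requesting the file at the root.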

For the root, everything is fine. But when bots visit inner pages and do not find any robots.txt, what rules will they respect?
I assume they will not look at the domain’s root if they visit inner pages directly from search.

Bots either follow robots.txt or they don’t. If they follow it, they check for the existence of the file and follow its directives when they visit a website. Cloudflare’s own robots.txt as an example: https://www.cloudflare.com/robots.txt

https://www.cloudflare.com/learning/bots/what-is-robots-txt/

What happens if a bot visits, for example, this URL - Cloudflare Registrar | Register & Renew Domain Names - and checks for robots.txt?
https://www.cloudflare.com/products/registrar/robots.txt

Then the bot is broken and doesn’t respect robots.txt.

So the problem (bug) I reported is that the new feature “Manage bot traffic with robots.txt” does not block bots on inner pages, because it does not create a robots.txt there at all.

What are the contents of this file?

The robots.txt that Cloudflare created for the root?

That blocks them for the entire site as written. There is no bug. There is only one robots.txt for a site, and it is located at the root.
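
As a rough illustration of why a single root file covers inner pages, the sketch below feeds a small rule set to Python's standard `urllib.robotparser` and asks about both the homepage and an inner page. The rules and the bot names (`ExampleBot`, `OtherBot`) are invented for this example; they are not the contents of the file Cloudflare generated.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, as they might be served from https://domain.com/robots.txt
rules = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "Disallow: /" is a path prefix, so it matches every path on the site.
home_blocked = not parser.can_fetch("ExampleBot", "https://domain.com/")
inner_blocked = not parser.can_fetch("ExampleBot", "https://domain.com/some-page/")

# A bot not named in the first group falls through to the "*" group.
other_allowed = parser.can_fetch("OtherBot", "https://domain.com/some-page/")

print(home_blocked, inner_blocked, other_allowed)
```

All three values come out as True: the blocked bot is refused on the homepage and on the inner page alike, without any robots.txt existing under /some-page/.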

I have found a real example, with robots.txt rules for all pages, not only the homepage:
https://chaturbate.com/robots.txt
https://chaturbate.com/princess_kristy/robots.txt

No, you haven’t. You have found an example where robots.txt is returned unnecessarily beyond the site root.

I am sorry you misunderstood how robots.txt works. I have tried to explain it and verified your robots.txt conforms to the standard.

If you want a robots.txt for every page/file/directory you can certainly create them yourself. But the fact Cloudflare isn’t doing that for you is not a bug.

Rather than downvoting me because I’ve indicated it isn’t a bug and the file Cloudflare produced for you is formatted correctly and protects the entire site, let’s try this:

Here is the RFC that defines how robots.txt works.

Section 2.3 is quite clear as to where crawlers are to look for and obtain the file.
