Robots can't access my website when I'm using Cloudflare

Hi,
When I use Cloudflare (Cloudflare enabled on the site), bots such as Google, Bing, Pingdom, Sitechecker, etc. can’t access my website.
I haven’t enabled the Bot Fight Mode option.
When I pause Cloudflare on my website, the bots can access it.
How can I solve this problem?
Thanks

How did you test your theory? Do you have any custom firewall rules?
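One way to make such a test repeatable is to request the same URL with different User-Agent headers and compare the status codes, once with Cloudflare enabled and once with it paused. The following is only a minimal sketch: the function names and the browser User-Agent string are illustrative assumptions, and a request that merely claims to be Googlebot from a non-Google IP may be treated differently from a genuine crawler, so a block seen here is a hint rather than proof.

```python
import urllib.error
import urllib.request

# User-Agent strings to compare. The crawler strings follow the publicly
# documented Googlebot/bingbot formats; the browser string is illustrative.
USER_AGENTS = {
    "browser": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                 "+http://www.google.com/bot.html)",
    "bingbot": "Mozilla/5.0 (compatible; bingbot/2.0; "
               "+http://www.bing.com/bingbot.htm)",
}


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
    except OSError:
        return None  # DNS failure, refused connection, timeout, TLS error


def status_by_user_agent(url, fetch=fetch_status):
    """Map each User-Agent label to the status code returned for url."""
    return {label: fetch(url, agent) for label, agent in USER_AGENTS.items()}
```

Calling `status_by_user_agent("https://www.iranimovies.tk/robots.txt")` in both states and comparing the two results would show whether the responses differ per User-Agent or fail for everyone.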

Hi,
As I said, when I pause Cloudflare on my site, all bots can access it, and when I enable Cloudflare, the bots say they can’t reach the site.
No, I don’t have any firewall rules.

May I ask what your domain name is?

Is the “I’m Under Attack!” mode perhaps enabled in the Cloudflare dashboard?

Does the robots.txt file physically exist on your origin host / server, or is it generated as a “virtual” one, as some WordPress plugins such as Yoast SEO do?

  • If you are using WordPress and Yoast SEO, maybe the option to generate one is not enabled, or robots are blocked because the WordPress option that discourages search engines from indexing the site is checked. If you are not using WordPress, we could troubleshoot further …

Can you paste in the content of your robots.txt here?

Are you using an .htaccess file, maybe?

Could there be some issue with the HTTP(S) connection?

When you disable the Cloudflare proxy, and before moving to Cloudflare, was your website working over HTTPS?

May I ask which SSL option you have selected under the SSL/TLS tab in the Cloudflare dashboard for your domain (Flexible, Full, Full Strict …)?

You could also try purging the cache for the robots.txt file from the Cloudflare dashboard (Caching → Configuration → Purge Everything, just in case).

Maybe you need to allow the Cloudflare network to connect to your origin host / server, i.e. allowlist the Cloudflare IP ranges there (for example, if a server firewall or security tool blocks Cloudflare because many connections arrive from its IPs)?
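To see whether an address that the origin firewall has blocked belongs to Cloudflare, it can be checked against Cloudflare's published ranges. This is only a rough sketch: the three IPv4 ranges below are a small sample that may be out of date, the authoritative list is the one Cloudflare publishes at https://www.cloudflare.com/ips/, and the function name is a hypothetical helper.

```python
import ipaddress

# Sample only: a few of Cloudflare's IPv4 ranges. Treat this list as an
# assumption and load the current, complete list before relying on it.
SAMPLE_CLOUDFLARE_RANGES = [
    "173.245.48.0/20",
    "104.16.0.0/13",
    "172.64.0.0/13",
]


def is_in_ranges(ip, ranges=SAMPLE_CLOUDFLARE_RANGES):
    """Return True if ip falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in ranges)
```

An address taken from the server's firewall or access log can be passed to `is_in_ranges`; if blocked addresses turn out to be Cloudflare's, the origin is rejecting the proxy itself rather than the bots.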

Nevertheless, Cloudflare does not create or edit the robots.txt file, so any problem with it should be fixed at the origin host / server, at least as far as I understand.

Hi,
First of all, thank you for your reply.
My domain is https://iranimovies.tk/.
No, Under Attack mode is off in my dashboard.

This is my robots.txt url: https://www.iranimovies.tk/robots.txt

My website was working properly when Cloudflare was turned off and before moving to Cloudflare.

Yes, I use .htaccess, but this isn’t the reason, because my website works when Cloudflare is off.

Yes, when Cloudflare is off my website works over HTTPS. (Cloudflare is off now, so you can test it.)

I use Full Strict mode.

Finally, Cloudflare is off now, so you can test my website.

Thank you for helping me to solve this problem.

The robots.txt response has this header:
cf-edge-cache: cache,platform=wordpress

The meta tag looks OK:
<meta name="robots" content="follow, index"/>
or
<meta name="robots" content="follow, index, max-snippet:-1, max-video-preview:-1, max-image-preview:large"/>

The sitemap looks OK:
https://www.iranimovies.tk/sitemap_index.xml

Are you using APO for WordPress?

x-powered-by: PHP/8.0.12

And WP Rocket as well?

I have to admit, I haven’t yet used WP Rocket.

I would suggest looking at the articles below on how to set up WP Rocket properly with Cloudflare (if you haven’t already):

I see one issue here, which I remember from another customer/user recently:
WordPress (a plugin, I assume?) is blocking or redirecting some requests in a strange way:

https://www.iranimovies.tk/undefined/ -> 301 or 404
x-redirect-by: WordPress

If you choose to try Cloudflare APO for WordPress in combination with WP Rocket, here is more information about it:

Can you now turn it on?

And purge the APO WordPress cache, and also the cache at Cloudflare Dashboard → Caching → Configuration → Purge Everything.

One thing I noticed is that your website takes quite a long time to load, which looks like an issue with TTFB?

Or perhaps it’s due to the web hosting.

That is, if a request for robots.txt or sitemap_index.xml takes too long, or sitemap_index.xml has to be regenerated on every request, I am afraid Google may fail to “read” or “index” your virtual sitemap (generated by a WordPress plugin), since creating it needs processing power and some time (maybe it isn’t cached, which would explain it).

  • Google is not the only crawler out there; there are tons of bad crawlers and bots too, which could put load on your web hosting / server …
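A slow response like this can be measured rather than guessed at. The sketch below times how long it takes until the first byte of the response body arrives; it is only an approximation of TTFB, since the figure also includes DNS lookup, the TCP connection and the TLS handshake. The function name and the `opener` parameter are assumptions made for illustration.

```python
import time
import urllib.request


def time_to_first_byte(url, opener=urllib.request.urlopen, timeout=30):
    """Return the seconds elapsed until the first body byte of url arrives."""
    start = time.perf_counter()
    with opener(url, timeout=timeout) as response:
        response.read(1)  # wait for the first byte of the body (if any)
    return time.perf_counter() - start
```

For example, `time_to_first_byte("https://www.iranimovies.tk/sitemap_index.xml")` could be run a few times with Cloudflare paused and again with it enabled. Consistently high values in both cases would point at the origin or the sitemap generation rather than at Cloudflare.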

Could you test with some other hosting provider, maybe?

Furthermore, regarding good bots while using Cloudflare, may I suggest creating a Firewall Rule with the expression cf.client_bot and the action “Allow”, so that the “good bots” Cloudflare detects pass the security checks and can access your website?

Here is a list of currently detected bots: