We are having a strange issue with Yandex. We want them to crawl our site, and we recently created a Yandex Webmaster account. To ensure Yandex doesn’t get accidentally blocked by any of our other Firewall rules, we created a Firewall Rule and placed it in first position to allow the following User Agent to access our site: Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
Now here is the strange part. We can see from our Firewall allow rule that they accessed our site approximately 1,700 times in the last 24 hours, which is great, but we also checked our site logs and they do not show that Yandex has visited our site even once. So there is a problem somewhere. We also checked our .htaccess file to make sure the Yandex bot isn’t getting blocked there, and as far as we can see it is not.
So we are wondering what else might be preventing the Yandex bot from getting through to our site and registering in our site visitor logs. Any ideas where we can check to see if we have the bot blocked accidentally?
Thank you so much.
For starters, Yandex is one of the search engines currently recognised by Cloudflare.
So there’s no need to explicitly whitelist them. On top of that, firewall rules would be the wrong place unless you have a previous rule blocking anything.
If the firewall log shows permitted requests and the requests were not cached, they should have reached your server and should be in the logs. Are you sure we are not talking about cached resources?
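One way to cross-check this on the server side is to scan the raw web server access log for the crawler's user agent. The following is only a rough sketch: it assumes the log is in the common "combined" format, the function name count_bot_hits is made up for illustration, and the two sample lines are invented placeholders rather than real log entries.

```python
import re
from collections import Counter

# Pulls the request path and the user agent out of a combined-format log line.
# Assumption: the server writes the "combined" log format.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_bot_hits(lines, token="YandexBot"):
    """Count requested paths for lines whose user agent contains `token`."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and token.lower() in match.group("agent").lower():
            hits[match.group("path")] += 1
    return hits


# Invented sample lines (documentation IP ranges), for illustration only.
sample = [
    '203.0.113.5 - - [01/Jan/2020:10:00:00 +0000] "GET /viewtopic.php?t=1 HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"',
    '198.51.100.7 - - [01/Jan/2020:10:00:01 +0000] "GET /index.php HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
]

yandex_hits = count_bot_hits(sample)
print(sum(yandex_hits.values()), "matching request(s)")
```

To run it against a real log, pass an open file object instead of the sample list. If the count there is non-zero while the forum's own bot list stays empty, the requests are reaching the server and the question shifts to how the forum software recognises the bot.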
Thank you for your reply. I am very sorry, but I am not sure I understand your question about cached resources. What I can tell you though is that the site we want Yandex to index is a phpBB forum. Its admin area automatically logs all bots based upon their user agent, and it shows that the Yandex bot has never visited the site.
We can remove the firewall rule to allow it access, but the only reason we created that is because we do have Russia as a country that is served a JS Challenge. So we didn’t want them getting slowed down by that.
If resources are cached on the Cloudflare proxies, they will be served by them and those requests will never reach your server.
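Whether a given response came from the cache can usually be read from the CF-Cache-Status response header that Cloudflare adds. Here is a minimal sketch of that check. The function name is hypothetical and the sample headers are invented. It treats only HIT as "answered from cache", which is a simplification, since other status values exist.

```python
def served_from_cloudflare_cache(headers):
    """Return True when CF-Cache-Status reports a cache HIT.

    `headers` is a plain dict of response header names to values.
    Simplification: only HIT counts as "served from cache" here.
    """
    status = ""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "cf-cache-status":
            status = value.strip().upper()
    return status == "HIT"


# Invented example headers, for illustration only.
cached_image = {"Content-Type": "image/png", "CF-Cache-Status": "HIT"}
forum_page = {"Content-Type": "text/html", "cf-cache-status": "DYNAMIC"}

print(served_from_cloudflare_cache(cached_image))
print(served_from_cloudflare_cache(forum_page))
```

A request answered with HIT would not show up in the server's own logs, while one marked DYNAMIC or MISS should have been passed on to the server.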
Can you post a sample screenshot of a firewall log entry for a request which did not reach your server?
Sorry if my last reply was still a bit confusing. The firewall log from Cloudflare shows that the requests from Yandex were all allowed to reach our site, so no problem there. Our site logs the last time a spider or crawler bot visited, and it shows us that Yandex has never visited our site. I also monitor who is online in real time quite often on our site and I never see Yandex actually on the site, but I quite often see all the other main bots like Bing, Google, Facebook, etc.