Hello -
I have a few sites hosted on SiteGround that are also proxied through Cloudflare. I recently discovered that if I use the aw-snap.info file viewer to crawl one of my websites with a Googlebot user agent, Cloudflare returns a 520 Origin Server Error.
SiteGround has alluded to the following:
We have an antispoof AI that checks the commonly used user agents and if they resolve to the legitimate IP addresses related to them. In this case, when you select the Googlebot user agent, our AI blocks the request since it comes from an IP that does not belong to that user agent.
Removing these checks and blockages will create a security hole and this user agent can be used with malicious intent, thus please run the tests with browser user agents.
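For context, the kind of check SiteGround describes is usually done with a reverse DNS lookup on the requesting IP, a check that the hostname belongs to Google, and then a forward lookup to confirm that hostname resolves back to the same IP. Below is a minimal Python sketch of that idea. It assumes the Googlebot hostname suffixes `.googlebot.com` and `.google.com` and only handles IPv4. It is not SiteGround's or Cloudflare's actual implementation, and the function names are hypothetical.

```python
import socket

# Assumed hostname suffixes for genuine Googlebot reverse-DNS records.
GOOGLEBOT_DOMAINS = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=socket.gethostbyaddr,
                          forward_lookup=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check (IPv4 only in this sketch).

    The lookup functions are parameters so the check can be tested
    without touching the network; by default they use real DNS.
    """
    try:
        # Step 1: reverse DNS, IP -> hostname.
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False

    # Step 2: the hostname must belong to one of the expected domains.
    if not hostname.endswith(GOOGLEBOT_DOMAINS):
        return False

    try:
        # Step 3: forward DNS, hostname -> IPs; must include the original IP.
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        return False
    return ip in addresses


def is_spoofed_googlebot(user_agent, ip, reverse_lookup=socket.gethostbyaddr,
                         forward_lookup=socket.gethostbyname_ex):
    """Hypothetical helper: True when the UA claims Googlebot but the IP fails the check."""
    claims_googlebot = "googlebot" in user_agent.lower()
    return claims_googlebot and not is_verified_googlebot(ip, reverse_lookup, forward_lookup)
```

A request from a third-party tool like aw-snap.info that sends a Googlebot user agent fails step 2 or step 3, because the tool's own IP does not reverse-resolve to a Google hostname. That is why a check of this kind rejects it.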
However, if I run the same test against Medium (which is also behind Cloudflare), Cloudflare returns a 403 Forbidden and a message saying that I am being blocked for being a “faked bot”:
https://aw-snap.info/file-viewer/?protocol=secure&tgt=medium.com&ref_sel=GSP2&ua_sel=gbot2&fs=1
Why wouldn’t Cloudflare produce the same “blocked” message when I run the test on my site using the same parameters (Google / Googlebot)?
I thought the whole point of having Cloudflare was to “catch” stuff before it even hits the origin server. In my website’s case, however, the request gets past Cloudflare and my host is the one catching it.
Any insight would be appreciated.