Gidday, good users of Cloudflare and Cloudflare advisors.
I run a directory.
My users are submitting Cloudflare-enabled sites.
My directory checks each submitted URL with something like a curl request: "Is the status 200?"
It simply helps me eliminate broken links in my directory and make sure my users are submitting good, current sites. I hope you can understand the importance of this.
Cloudflare sites naturally fail with a 403, and a page comes back saying something like "Checking if the site is Secure".
I don't want to scrape anything, I don't need to attack anything, and I don't need to get into all that garbage. I just need to know, in a whitehat way, that my user is giving me a good, unbroken link. Otherwise I want to delete it.
I don't want to do all this impersonating browsers and black magic to make it work.
Curl seems completely broken for Cloudflare (unless you want to get a little evil, and then it seems there is no obstacle, so this is rubbish; I don't want to do that).
What is my solution?
Has anybody got something in simple PHP that will tell me "yes, this site is behind Cloudflare and it is OK"?
Thanks. Garry. Australia.
I think this is Bot Fight Mode, but I am unsure. I think cURL is not blocked by default.
However, most site owners manually block user-agents such as cURL, wget, python, and the list goes on. Sadly these tools are abused. And yes, you can fake a user-agent, but if you are scraping enterprise-protected CF sites you will not get far unless you have some good skills or access to closed sources to spoof it properly.
Personally, I challenge access from all user-agents except current ones. For example, I allow Firefox 102, 105, 106, 107, 109, from LTS to alpha, and rotate the list as old versions expire and new alphas arrive. Many people who fake a user-agent never update it. Heaps of users are doing this, so it can also affect you in the long term if you use fake user-agents, but it's not as widely used right now.
Basically, thanks to idiots on the net, this is going to get harder for you, even though you may be doing everything above board and non-maliciously.
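For the link check itself, one above-board option is to stop treating every non-200 as dead and look at the response headers instead. Cloudflare marks its challenge pages with a "cf-mitigated: challenge" response header, so a 403 carrying that header means "a challenge answered", not "the link is broken". Below is a minimal sketch of that idea, written in Python for brevity; the same status and header check should port to PHP's curl functions. The names classify_response and check_url, the "needs-review" label, and the User-Agent string are made up for illustration. Note that a challenge only shows the hostname is live behind Cloudflare; it does not prove the page behind it is fine, so those links are flagged for review rather than approved.

```python
import urllib.error
import urllib.request


def classify_response(status, headers):
    """Classify a link check from the HTTP status and response headers.

    Returns "ok", "needs-review", or "broken" (labels are illustrative).
    """
    # Header names are case-insensitive, so normalise them first.
    h = {k.lower(): str(v).lower() for k, v in headers.items()}
    if 200 <= status < 300:
        return "ok"
    # Cloudflare sets "cf-mitigated: challenge" on its challenge pages.
    if h.get("cf-mitigated") == "challenge":
        return "needs-review"
    # Assumption: a 403/503 served by Cloudflare is ambiguous (it may be a
    # firewall rule rather than a dead page), so do not auto-delete it.
    if status in (403, 503) and h.get("server") == "cloudflare":
        return "needs-review"
    return "broken"


def check_url(url, timeout=10):
    """Fetch a URL once with an honest User-Agent and classify the result."""
    # Hypothetical User-Agent: identify the checker, do not pose as a browser.
    req = urllib.request.Request(
        url, headers={"User-Agent": "ExampleDirectoryLinkChecker/1.0"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return classify_response(resp.status, dict(resp.headers.items()))
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive here; the headers are still available.
        return classify_response(err.code, dict(err.headers.items()))
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return "broken"
```

Calling check_url on a submitted link then gives one of the three labels, and only "broken" links would be candidates for deletion; "needs-review" ones can be checked by hand or retried later.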