My server is blocked accessing Cloudflare sites: how do I unblock that?

I have an application on an Amazon EC2 server, which reads RSS feeds from a variety of websites, using a variety of PHP scripts.

Attempting to access RSS feeds, or other portions, of Cloudflare-protected websites reliably returns a header of

HTTP/1.1 403 Forbidden

…regardless of whether I set a user-agent or not.

How can I get my IP address reviewed, so that calls from my application server are not blocked by Cloudflare?

Website owners can do whatever they want with their websites. That is beyond Cloudflare’s control, even when the website owner chooses to use Cloudflare.

Contact the website owner and, at the bare minimum, include the Ray ID you see on the error page. They will then be able to dig deeper into the situation with you.

Thanks. It’s multiple Cloudflare websites, so I doubt that individual websites are specifically blocking my application server. It appears to be that my application server is identified as a threat by Cloudflare’s systems.

There is no BODY returned in the request, so no “error page”; just a flat 403 Forbidden response.

Hopefully helpful headers are:

    [cf-mitigated] => challenge
    [Report-To] => {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=(snip)"}],"group":"cf-nel","max_age":604800}
    [NEL] => {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
    [Server] => cloudflare
    [CF-RAY] => 8485b5773(snip)
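
For anyone wanting to tell a challenge apart from an outright block in code, here is a minimal sketch in Python (not the PHP used by the original application). It assumes the only signals are the ones shown above: a 403 status, `Server: cloudflare`, and `cf-mitigated: challenge`. The function name is made up for illustration.

```python
def is_cloudflare_challenge(status, headers):
    """Return True when a response looks like a Cloudflare challenge."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    served_by_cloudflare = lowered.get("server", "").lower() == "cloudflare"
    mitigated = lowered.get("cf-mitigated", "").lower()
    # Assumption: a challenge arrives as 403 plus "cf-mitigated: challenge",
    # which is what the headers quoted in this thread show.
    return status == 403 and served_by_cloudflare and mitigated == "challenge"


# Headers copied from the post above (Ray ID left truncated as posted).
response_headers = {
    "cf-mitigated": "challenge",
    "Server": "cloudflare",
    "CF-RAY": "8485b5773(snip)",
}
print(is_cloudflare_challenge(403, response_headers))
```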

It’s likely not specifically you being blocked, and it’s actually a challenge, not a block.

Many people, as I do, will challenge all requests from big ASNs (AWS, Azure, Google Cloud, DigitalOcean, OVH, etc.) as these are home to lots of bots scanning for WordPress installs and other common paths and vulnerabilities. It stops a lot of junk or threat requests reaching the origin.

Thank you. Two questions, then:

  1. How do I respond to that challenge, when it’s a PHP script connecting to the website?

  2. Is there a page explaining to website owners how to allow specific useragents and/or relax these for specific URLs?

As one example, an RSS feed is designed to be pulled down by a server, but no RSS reader is going to know how to respond to a challenge.

You can’t(*), it’s designed to sort automated requests from human requests, so it’s doing exactly that for the users that want that protection.

(*) Well, you can if you can receive the JavaScript, execute it and return the correct response. But then you’d make more money selling that service to the “bad guys” who want to work around the protections Cloudflare users rely on. Such services are available.

You could just ask the site owners if they would kindly allowlist your IP address. That would be up to them to decide and implement.

It’s easy to do, you just exclude the specific IP address, URL or other parameter from the rule that challenges the ASN. For example, we have a rule with a list of over 100 ASNs we challenge, then add “AND ip not eq {our Ionos IP}”.
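
As a rough illustration of what such a rule expression could look like, here is a small Python sketch that assembles one as a string. The field names `ip.src.asnum` and `ip.src` are from my understanding of the Cloudflare Rules language (older material uses `ip.geoip.asnum`), so check them against the current documentation. The helper name and the exempt IP address (a documentation-range placeholder) are assumptions, and the ASNs are just the ones mentioned in this thread.

```python
def asn_challenge_expression(asns, exempt_ips):
    """Build a rule expression: match the listed ASNs, except the exempt IPs."""
    # Rules-language sets are space-separated values inside braces.
    asn_set = " ".join(str(asn) for asn in sorted(asns))
    expression = f"(ip.src.asnum in {{{asn_set}}})"
    # Each exempt address is carved out with an extra "and not" clause.
    for ip in exempt_ips:
        expression += f" and not (ip.src eq {ip})"
    return expression


# 203.0.113.10 is a placeholder, not anyone's real server.
print(asn_challenge_expression({14618, 14061, 24940}, ["203.0.113.10"]))
```

This prints `(ip.src.asnum in {14061 14618 24940}) and not (ip.src eq 203.0.113.10)`, which the site owner would attach to a rule whose action is a challenge.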

The RSS reader of a real person will request from the ASN of a consumer or corporate ISP, not that of a big Cloudy provider, so won’t be in the list to challenge.

[add] Try running your PHP from a server at home. If the challenge is due to the ASN, it should work from there (depending if there are any further challenges based on user agent, country or other parameters).

If you have your own Cloudflare site you can try and confirm that by triggering it in your own configuration and checking the Security > Events view to determine the cause. It could be something like Security Level or commonly enabled features like Bot Fight Mode, Browser Integrity Check, etc. This would effectively be a guess, but may turn up some useful data.

You can also get a Ray ID from one of these responses (bottom of the html or cf-ray header) and ask if one of the site owners would be so kind as to check their own Security Events and determine the cause for you.

But if you can’t do that or it doesn’t work, you’re out of luck and will need to follow the other advice in this thread and work with site owners to come to an agreement.

Thanks, all.

> The RSS reader of a real person will request from the ASN of a consumer or corporate ISP, not that of a big Cloudy provider, so won’t be in the list to challenge.

Not true in the case of many larger cloud-based RSS readers. The IP address will absolutely be in a cloud provider. Similarly for many podcast platforms, who scrape RSS feeds (with permission!!) very regularly.

> You could just ask the site owners if they would kindly allowlist your IP address.

That’s not scalable, especially for someone checking RSS feeds from more than four million podcasts! (I check RSS feeds once every month, on-demand, but Cloudflare still appears to block the request.)

It seems that for someone wanting to have an API on their website, or an RSS feed, then Cloudflare is the wrong choice.

I asked: > Is there a page explaining to website owners how to allow specific useragents and/or relax these for specific URLs?

The response:

> It’s easy to do, you just exclude the specific IP address, URL or other parameter from the rule that challenges the ASN.

I don’t use Cloudflare to host my website. Is there a page explaining to website owners how to do this?

That’s up to the user. If it were the wrong choice and didn’t work for them, they wouldn’t be using Cloudflare to protect that service. It is perfectly possible to run APIs and RSS feeds from a Cloudflare-protected site. Just because you can’t get in doesn’t mean it doesn’t work; otherwise those feeds you are trying to reach wouldn’t be protected by Cloudflare in the first place.

If the website owner found they were over-blocking access to their API or RSS feed, they would set their rules to allow what they needed to or not use Cloudflare at all.

Maybe they don’t use Cloudflare. Maybe they do and they have allowed those large readers access because it is in their interest to do so.

At the end of the day, the point is that Cloudflare is not blocking you. Cloudflare users are blocking you. It may be deliberate, or their rules are too restrictive and overblocking, but it’s obviously not significant enough to cause them concern or they would notice and fix it or not use Cloudflare.

The only way to get access is to contact them and ask to be let in. Cloudflare wouldn’t be of any use if you can just force your way in against the wishes of the site owner.

It’s easy to do, they just exclude the specific IP address, URL or other parameter from the rule that challenges the ASN [or skip the bot protection or whatever is blocking you]. Send them the Ray ID and they can look and see the reason for the block/challenge and if they want to let you in, they can make an exception for your IP, user agent or other identifier.

Sorry to ask for a third time, but is there a web-page that explains this for the Cloudflare user which I can link to?

They should know if they are using it, but otherwise…

When you are blocked by a Cloudflare product, you are given a CF-Ray-ID in the response.

The Website owner can check the Ray-ID on https://dash.cloudflare.com/?to=/:account/:zone/security/events to see exactly why you were blocked.

How to allow you access then depends on what feature blocked you in the first place and on what Cloudflare plan the website is, and how much access the website owner wants to grant you.

Cloudflare security is fairly complex and will require the website operators to read through the documentation that @sjr linked and experiment some if they want to fine-tune it.

Really helpful, thank you.

The plan is to highlight on podcast pages - https://podnews.net/podcast/i4mys - where Cloudflare has blocked access (which I can do programmatically), and give podcast producers the documentation to help them fix security for the RSS feed.
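
A minimal sketch of how that programmatic check might look, in Python rather than the PHP the application actually uses. It assumes a challenge shows up as a 403 with `cf-mitigated: challenge`, as in the headers earlier in this thread. The function name, the user-agent string and the shape of the returned dictionary are all made up for illustration.

```python
import urllib.error
import urllib.request


def probe_feed(url, opener=urllib.request.urlopen):
    """Fetch a feed URL and report whether Cloudflare answered with a challenge."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-feed-checker/0.1"}  # placeholder UA
    )
    try:
        with opener(request, timeout=10) as response:
            status, headers = response.status, response.headers
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx, but the error still carries status and headers.
        status, headers = error.code, error.headers
    return {
        "url": url,
        "status": status,
        "challenged": status == 403 and headers.get("cf-mitigated", "") == "challenge",
        # The Ray ID is what the site owner needs to look up the event.
        "ray_id": headers.get("CF-RAY"),
    }
```

Calling `probe_feed("https://example.com/rss")` (a placeholder URL) returns a dictionary whose `ray_id` can be passed on to the podcast producer alongside the documentation. The `opener` parameter exists only so the function can be exercised without network access.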

It also highlights to me why some in podcasting have suggested that Cloudflare’s default settings are a bad choice for hosting podcast websites - or, indeed, any form of website that has an RSS feed or any APIs. I didn’t fully understand it in the past, but I certainly do now.

It seems I somehow failed, when editing my post earlier, to also mention the challenge you may face when connecting to websites that you do not own and control from IP space belonging to various hosting / cloud providers.

Thanks to @sjr for adding this, which is indeed a well-known practice among website owners!

Is this the exact URL you’re seeing issues on? Because that one is not on Cloudflare.

The RSS link at the top right, https://www.thisamericanlife.org/podcast/rss.xml, is at the moment behind Cloudflare though.

If you move on with:

https://www.thisamericanlife.org/podcast/rss.xml

Through a machine at OVH, a well-known hosting provider that is also known to be blocked (or challenged) here and there, just like Amazon and the others that @sjr mentioned above, my result is an HTTP/2 200 OK status:

$ curl -s -D - -o /dev/null https://www.thisamericanlife.org/podcast/rss.xml
HTTP/2 200
[...]

So assuming that is the exact URL you’re having problems with, it seems like they have been a bit selective, and have blocked (at least) Amazon, perhaps due to the illegitimate traffic they have seen from them.

OVH doesn’t seem to be blocked, from that URL, at least not at this time.

Again, and as mentioned above, Cloudflare did NOT block you.

The website owner did.

If the default settings do not match the website owner’s wishes, and the website owner doesn’t adjust them to fit their alleged wishes, that part would also be something you need to complain to the website owner about.

Thanks. Really not looking for an argument here.

No, This American Life doesn’t block EC2 instances from accessing its RSS feed. That’s an example of the service I have, where we are accessing a number of different APIs and RSS feeds. There are over four million podcasts out there.

I’m increasingly discovering some website owners who have turned on a specific blocking pattern which has, in turn, blocked my server and many others from reading APIs and RSS feeds. It’s clear that this blocking pattern is incompatible with hosting a podcast.

I’d like to help podcast publishers understand any problems with their podcast - and blocking the RSS feed from view by anyone hosted on, say, EC2 would seem to be an obvious mistake.

Even if my wording, or my at times quite direct tone, gave the impression that I was looking for an argument, I can assure you that it was NOT my goal either.

A URL where you see the problem would make it possible for someone else to test whether there seems to be a consistent pattern among different hosting providers, but not residential/business providers, and so on.

That may be quite relative, and may depend on the individual view of each individual person, each podcast (or RSS feed) publishers, and so on though.

I don’t disagree that there may be situations pointing in either direction of it (e.g. both being perfectly fine, but also being an obvious mistake, to use your words).

In Cloudflare, I can set a “Security Level”, which, IIRC, is set to “Medium” by default and applies more or less globally to my zone:

https://dash.cloudflare.com/?to=/:account/:zone/security/settings

Setting your website to e.g. “I’m Under Attack” will literally give everyone a challenge page when they visit the website.

The other levels may give a challenge page, depending on what kind of behaviour has been seen from the traffic source; “High”, for example, mentions “within the last 14 days”.

Below that one, I can also enable (or disable) a “Browser Integrity Check”:

My personal suggestions to the website owner of something like that would be:

  1. Set “Security Level” to “Essentially Off”.

  2. Disable “Browser Integrity Check”.

  3. Deal individually with bad traffic, such as e.g. giving Amazon (or other individual, but abusive traffic sources) a challenge or block page when necessary.

This one definitely sounds like you’re seeing such a challenge page.

So under the condition that the website owner actually feels fine with your traffic, and feels fine with the fact that it comes from Amazon/EC2, the likely options are:

  1. Website owner is using a too high “Security Level” setting, according to their (alleged) wishes, and needs to adjust that.

  2. Your IP address, or maybe one of its closest neighbours, has shown threatening behaviour that it didn’t show before and, because of that, is now incompatible with the website owner’s “Security Level”.

  3. Website owner has “Browser Integrity Check” enabled, and your requests are flagged, as a result of this.

  4. The website is purposefully challenging various hosting providers, perhaps due to (past) negative behaviour from them, as mentioned above, which unfortunately also hits you, i.e. you are collateral damage in the situation.

The CF-RAY ID, that you mentioned above, can be used by the website owner, to search for the actual event that caused the issue.

As a test / demonstration for another thread, I added a rule to block empty User-Agents on one of my rarely used domains, and never cleared it out. Roughly 22 hours ago it blocked someone from Google’s Cloud Platform. Digging up the details based on the Ray ID looks like this:

However, roughly half an hour before, someone (or something) on Amazon did actually trigger the “Browser Integrity Check”, which on the “Firewall Events” looks like this:

Attempting to access /app/.git/config on this specific domain would be quite strange, as it has never been related to “git” in any direction.

The kind of path being requested also suggests to me that someone (or something, … likely automated) is checking whether I have misconfigured something on my end and left configuration files publicly available.

My personal view? I wouldn’t say that this specific event that I see from my end, should be the sole reason to say that the entire Amazon EC2 (or whatever product) is a hostile place, which must always be blocked, or something like that.

On the other hand, if I received 100M of these queries a day, from more than 250K different IP addresses, all hiding behind AS14618, but no otherwise clear signals on who they are from, or what they actually are?

People say that time is money, so … I guess the majority would simply block or challenge the whole ASN.

That would simply be the least time (and money) consuming way to mitigate the problem with bad traffic originating e.g. from Amazon’s AS14618, once and for all.

As mentioned above, there are a couple of ways with the website owner’s decisions (or, perhaps even lack of same), that could cause this.

I hope that the above explanations contribute to a fuller understanding of Cloudflare and its capabilities.

If you feel like it, you’re also welcome to link to this post.

This is great, thank you. I’d like to write an article explaining how to do all of this. It’s less about me, and more about “Hosting your podcast’s website on Cloudflare? Here’s how to make sure that your RSS feed is visible to everyone.”

Most people won’t want to turn off the Cloudflare protection for their entire site just because there’s an RSS feed in it. (I certainly don’t, with Cloudfront!) So, I assume that it’s possible with Cloudflare to create a configuration rule for a specific URL to be “Essentially Off”.

So, for my website, if I were a Cloudflare user, I could have proper protection for https://podnews.net/ but “Essentially Off” for https://podnews.net/rss.

Indeed, the top screenshot mentions this, for Page Rules. Although, sigh, the Cloudflare documentation at https://developers.cloudflare.com/rules/page-rules/ says “don’t use these, they’re going away” (isn’t this always the way with web documentation!) But, following the links, it looks like if you wanted to set /rss/ to be “Essentially Off” and to disable “Browser Integrity Check”, then
https://developers.cloudflare.com/rules/configuration-rules/ is the way to go?
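
To make that concrete, here is a hedged Python sketch that builds what I believe a Configuration Rule looks like when created through the Cloudflare API. The action name `set_config` and the parameters `security_level` and `bic` (Browser Integrity Check) are from my reading of the Configuration Rules documentation and should be verified there before use. The `/rss` path, the description and the function name are placeholders.

```python
import json


def rss_relaxation_rule(path="/rss"):
    """Build a Configuration Rule body that relaxes security for one URL path."""
    return {
        "description": "Relax security for the RSS feed",  # free-text label
        # Assumption: match on the exact request path of the feed.
        "expression": f'(http.request.uri.path eq "{path}")',
        # Assumption: Configuration Rules use the "set_config" action.
        "action": "set_config",
        "action_parameters": {
            "security_level": "essentially_off",  # assumed API value for "Essentially Off"
            "bic": False,  # assumed key for disabling Browser Integrity Check
        },
    }


print(json.dumps(rss_relaxation_rule(), indent=2))
```

The rest of the site would keep its zone-wide settings; only requests matching the expression get the relaxed configuration. In the dashboard the same thing is set up by choosing the matching criteria and the settings to override, without writing any JSON.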

This would be the “right” thing to do - to my understanding: retain the proper security on a website, but relax the security specifically for the RSS feed, which is built to be repeatedly requested by a number of bots. Indeed, if you’re interested, https://podnews.net/about/rss-stats shows how often it gets hit and by whom.

(I’m surprised that there isn’t a guide already on Cloudflare helping you relax the security on an RSS feed; but if there is, I can’t find it!)

Disabling it all or running completely without wasn’t my suggestion either.

Just by using Cloudflare in front of your site, even with “Essentially Off”, you would already be protected in many ways.

“Essentially Off” would still be challenging (or blocking) the most grievous offenders, as the explanation tells.

With “Essentially Off”, as mentioned above, the goal would be that you would have an always reachable website, API, RSS feed, … you name it.

BUT

You found out that there is some silly person (or system) out there, which is bombarding your RSS feed with queries.

You simply have to do something to keep your RSS feed alive, and the source(s) of the nasty traffic you see are actually limited to two cloud / hosting providers: DigitalOcean (AS14061) and Hetzner (AS24940).

As mentioned above:

Simply add a WAF rule on Cloudflare to block access from these two AS numbers.

It would be a sort of “generally allow everything, but selectively block nasty traffic” approach.

True, you can also do it in the reverse direction, by having a strict “Security Level”, but relaxing it selectively using the “Configuration Rules”.

“Configuration Rules” will indeed be the way to go if you, for example, want to keep the current (or default) security level but selectively apply something else, such as for one or more individual host names.

With example.com, you could also create rss.example.com, so that all RSS feeds are somewhere on https://rss.example.com, and that way exclude the (default/zone wide) security policy for e.g. rss.example.com.

The “right” thing, or way, can have a lot of different answers. :slight_smile:

To go that way, true.

Depending on what your actual goal is, things can go both ways.

In other words, the specific use case may well determine whether it is best to be inclusive (e.g. having the least amount of security policies needed) or exclusive (e.g. having the maximum amount of security policies, even if they might not be needed).

I guess we’re down to the one from above:

If I have a problem, and that specific problem magically vanished, by putting Cloudflare in front of my website (even with Cloudflare’s default configuration)…

When I see that, why would I ever be spending more time, money, resources, … whatever, on relaxing the security policy, … maybe for you alone?

Unless I’m losing hefty amounts of money, like millions to billions of dollars, if I’m not doing anything about it, … then why should I bother at all?

I do (unfortunately(!)) see that kind of attitude a lot, and it goes all the way from personal / hobby projects, and up to the very largest kind of deployments you see out there.

My personal impression is that this attitude is most likely the reason you’re not finding any “relaxing the security” kind of guides.

Thanks for all this.

I’d still be keen to understand whether the configuration rules can be set for a specific URL or URL pattern on the existing website. I’m assuming they can, but the detail is quite scant without having a Cloudflare account.

> why would I ever be spending more time, money, resources, … whatever, on relaxing the security policy, … maybe for you alone?

I can answer that: if you are a podcast publisher, you actively WANT as many people reading your RSS feed as possible, and a denied call for an RSS feed directly leads to fewer listens and less money. At no point would any podcast publisher be doing it “for me alone”.

Perhaps in order to help others with this, I need to sign up somehow and get a Cloudflare account.

They can, this is a bit more obvious in the actual UI but the relevant documentation would be