DDoS'd yesterday - site temporarily down for some users this morning - possible 429?

Hi Hive Mind - hoping you can help shed some light.

Yesterday our site was hit with a DDoS attack (HTTP flood) that Cloudflare mitigated in its usual fashion.
None of our users reported any issues, and our own logs suggest that for the ~5 minutes that some of the attack traffic got through, site performance was only slightly impacted (avg. response time went from 0.5s to 4.5s).

However, this morning I’ve had two separate reports of the site being down (either completely unavailable or ajax calls reporting that they were unable to retrieve data).

We use the site ourselves and noticed no outage and there are no obvious alerts or errors in our logs.

One of these reports was from a London customer, the other location is unknown.

Looking at the Cloudflare traffic analysis, I can see two spikes of 429 Edge status codes around the time the outage was reported, with only two ASNs affected - both of which are London-based.

I’m on a Pro plan, and the volume of 429 responses in each spike was sub-1k.
We have no rate limiting in place.

The total number of requests across the entire site for the 1-minute period of the larger spike was under 5k; the second, smaller spike was around 3.5k.
The site is an educational resource site on a school day, with 10k different users a day accessing it globally, so I’m not convinced that all that traffic came from a single user (on the off chance it triggered Cloudflare’s built-in rate limit - whatever that may be).

So three questions:

  1. Am I correct in thinking that “Edge status codes” originate from Cloudflare rather than our origin servers? No 429 responses from Origin have been logged.

  2. Does it seem reasonable to assume that the 429 spikes were the cause of an outage that seems to be specific to localised users?

  3. What is the likely cause? Could this be down to increased sensitivity following yesterday’s DDoS, just normal Cloudflare behaviour (slightly worried if this is the case), another DDoS that isn’t triggering the DDoS alert, or something more sinister?

  1. “Edge status codes” originate from Cloudflare. However, you also need to check the Origin status codes on your Analytics tab. A 4xx Client Error means the client sent something the origin was unable to process. Cloudflare does not generate any 4xx error code, so this would indicate that something is not configured correctly with your hosting provider, or that your client is sending something incorrect.

  2. You can check your Analytics tab to see whether it happens across various countries or only some specific countries.

  3. 429 means Too Many Requests.
    The client has sent too many requests in the amount of time allowed by the server - often known as “rate limiting”. The server may respond with information allowing the requester to retry after a specific period of time.
    This could be a rate-limiting response from your origin to the attacks.
    You can read more at 4xx Client Error · Cloudflare Support docs.

Hi - thank you for your response.
If you check my original post though, you will see that I already checked the origin responses and there were no 429s coming from Origin.
I also stated that there were only two ASNs involved and both were London based with all involved traffic coming from the UK (I can also verify that one of the customers who reported the outage was from London).

Looking at the Cloudflare documentation you linked to, it suggests several scenarios where Cloudflare does generate a 429 itself - and given this was an isolated incident affecting only a few customers so far as I can tell, I’m unconvinced it’s an origin configuration issue, as I would expect that to have a wider impact.

I could accept that it may be a hangover at the Origin (Azure) from the DDoS (although it was around 24 hours after the attack finished) if it weren’t for the fact that no 429s from the Origin were recorded in the Cloudflare logs.

In the docs you linked to under the 429 section it states that:

“The global rate limit for the Cloudflare API is 1200 requests per five minutes per user, and applies cumulatively regardless of whether the request is made via the dashboard, API key, or API token.”

What type of traffic does this cover? I’m assuming it doesn’t affect proxied requests to the origin but does it include actions that may occur before a proxied request is allowed through?

Further investigation shows that all the blocked requests were to /cdn-cgi/rum.
Given that this appears to be Cloudflare’s RUM (Real User Monitoring) analytics endpoint, it feels odd that these are being rate-limited internally in the first place (I have no rate limiting set), but I now feel it may not be directly related to the reported outage - although it is possibly a symptom of it.
I am assuming that if requests to /cdn-cgi/rum fail, it will not cause clients’ proxied requests to fail?
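In case it helps anyone else triaging something similar, here’s a rough sketch of how to separate the /cdn-cgi/rum 429s from ordinary page traffic in a log export. The log lines and field layout below are invented for illustration - adapt the matching to whatever format your export actually uses:

```python
# Hypothetical combined-log-style lines; the format is an assumption,
# not what Cloudflare actually exports.
sample_logs = [
    '203.0.113.7 - - [12/Mar/2024:09:01:03 +0000] "POST /cdn-cgi/rum HTTP/2" 429 0',
    '203.0.113.7 - - [12/Mar/2024:09:01:04 +0000] "GET /lessons/42 HTTP/2" 200 5120',
    '198.51.100.9 - - [12/Mar/2024:09:01:05 +0000] "POST /cdn-cgi/rum HTTP/2" 429 0',
]

def count_rum_429s(lines):
    """Count requests to /cdn-cgi/rum that received a 429 response."""
    hits = 0
    for line in lines:
        if " /cdn-cgi/rum " in line and " 429 " in line:
            hits += 1
    return hits

print(count_rum_429s(sample_logs))  # → 2
```

If every 429 in the spike matches the beacon path and none match page or AJAX routes, that supports the idea that the rate limiting hit the analytics beacon rather than the proxied site traffic itself.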

This topic was automatically closed 15 days after the last reply. New replies are no longer allowed.