Hi all, I’ve been monitoring the Cloudflare status page since the 8th of December 2019 (137 days ago as of now), and I’ve collected some interesting data regarding the uptime of colos.
The PoP with the most downtime is Brisbane, QLD, Australia, sitting at 55.2% uptime (seriously?!)
On a more positive note, here are some PoPs with outstanding uptimes:
I have a Python script that parses the Cloudflare status page every 10 seconds and takes note of any changes. I then parsed the logs and built a nice Excel document from them. More specifically, I count downtime as the total time a colo spends as “re-routed”; I do NOT count “Degraded performance” as an outage.
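To make the counting rule concrete, here is a minimal sketch of how an uptime figure can be derived from a log of status changes. It is not the actual script: the `(timestamp, colo, status)` tuple format, the function name `uptime_percent`, and the exact status strings are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def uptime_percent(events, colo, start, end):
    """Uptime of one colo over [start, end], counting only "re-routed" as downtime.

    `events` is an assumed log format: a time-sorted list of
    (timestamp, colo, status) tuples, one per observed status change.
    """
    down = timedelta(0)
    rerouted_since = None  # when the current re-routed period began, if any
    for ts, name, status in events:
        if name != colo:
            continue
        ts = min(max(ts, start), end)  # clamp to the reporting window
        if status.strip().lower() == "re-routed":
            if rerouted_since is None:
                rerouted_since = ts
        elif rerouted_since is not None:
            # Any other status (including degraded) ends the outage.
            down += ts - rerouted_since
            rerouted_since = None
    if rerouted_since is not None:
        down += end - rerouted_since  # still re-routed at the end of the window
    return 100.0 * (1 - down / (end - start))

# Made-up example data, not taken from the real logs.
t0 = datetime(2019, 12, 9)
day = timedelta(days=1)
example_events = [
    (t0 + 1 * day, "BNE", "Re-routed"),
    (t0 + 2 * day, "BNE", "Operational"),
    (t0 + 3 * day, "BNE", "Degraded performance"),
    (t0 + 4 * day, "BNE", "Operational"),
    (t0 + 5 * day, "SYD", "Re-routed"),
]
```

With this example data, `uptime_percent(example_events, "BNE", t0, t0 + 10 * day)` counts one day of downtime out of ten, and the degraded period is ignored.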
This leads me to my main question: why are there 28 colos with <90% uptime? What is happening at these locations? Also, what the ■■■■ is happening in Brisbane? 61 days spent offline?
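As a quick sanity check, the Brisbane figure follows directly from the uptime percentage and the length of the monitoring window. The helper name `days_offline` is just for illustration.

```python
def days_offline(uptime_pct, window_days):
    """Days of downtime implied by an uptime percentage over a window."""
    return window_days * (1 - uptime_pct / 100)

# 55.2% uptime over the 137-day window mentioned above.
print(round(days_offline(55.2, 137), 1))  # 61.4
```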
(Here are my logs; I began parsing at 2019-12-09 due to a timestamp change.)
Please let me know if you find any errors or inconsistencies in my data.
But this is not measuring the actual uptime of a particular datacentre. It just aggregates the publicly available information on outages, right?
It is difficult to say how accurate that is then. For example, today there was a brief apparent outage of a PoP in my vicinity, which is not listed on the status page at all.
It will be listed on the status page; as far as I can tell it is completely automated,
specifically this part:
My script scrapes this page once every 10 seconds
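For anyone curious, the polling side could look roughly like the sketch below. This is not my actual script: `fetch_statuses` is a hypothetical callable that returns a `{colo: status}` dict (how it gets the data from the page is left out), and the other names are made up for the example.

```python
import time

def diff_statuses(previous, current):
    """Return (colo, old_status, new_status) for every colo whose status changed."""
    changes = []
    for colo, status in current.items():
        old = previous.get(colo)  # None if the colo was not seen before
        if old != status:
            changes.append((colo, old, status))
    return changes

def poll(fetch_statuses, on_change, interval=10.0, iterations=None, sleep=time.sleep):
    """Call fetch_statuses() every `interval` seconds and report changes.

    `iterations=None` runs forever; a number limits the loop (handy for testing).
    """
    previous = {}
    count = 0
    while iterations is None or count < iterations:
        current = fetch_statuses()
        for change in diff_statuses(previous, current):
            on_change(change)
        previous = current
        count += 1
        if iterations is None or count < iterations:
            sleep(interval)
```

Note that the first snapshot reports every colo as a change from `None`, which gives the log a starting state for each location.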
It aggregates publicly available information on the status we report of our POPs as defined at the airport code level.
a. Cloudflare reports more transparently than any other company I’ve ever worked for.
b. A given city in many instances includes multiple datacenters.
c. Cloudflare is the most interconnected company on the planet (over 8,000 peering connections… more than AWS, Google or Netflix) and the internet is fragile… with 8k interconnects a broken link in a datacenter could lead to a degraded status.
d. Cloudflare runs an anycast network. If a datacenter is truly unavailable, traffic is routed to another colo. So a colo’s uptime is not the same as service uptime.
e. Our criteria for degraded or rerouted are our own. If an ENT customer in OZ were paying us to deliver Aussie rules football streaming, and we decided to stop delivering traffic for a portion of pay-as-you-go customers out of the Sydney PoP during a popular match to ensure we had capacity for the football game, we might report that DC as degraded (hypothetical… I don’t make the decisions, but it seems perfectly reasonable to me that we might choose to do that).
… and I guess, not all infrastructure across the world is equally reliable/available. Power outages, network failures and ‘other’ challenges can exist in greater numbers in some areas. Not deploying a colo there could pad an appearance of uptime/stability, but the value of Cloudflare increases for its customers the more we expand our network and the closer we can locate colos to the end users who consume the service(s).
Precisely, it can only be as accurate as the status page is.
Which is pretty damn accurate: when a PoP shows as re-routed, no traffic whatsoever will reach the PoP. I know personally because my local PoP is one of the ones that is always offline.
Don’t really want to comment on that. I can only refer to the outage I experienced myself today, and that wasn’t listed.
Their PoP statuses are automated
The question is the threshold