Argo Sudden QoS degradation


#1

A sudden QoS degradation: Argo now performs worse than going direct.

I have an Argo Tunnel in place and several subdomains. Over the last month the normal Argo response time was 65-74ms; now it is 240ms.

To minimise the impact, I have had to selectively switch Argo on and off on a per-subdomain basis.

Also, will you still charge for lousy performance?

This terrible performance appears to be related to the LHR datacenter.


#2

Yeah… so counting is hard and displaying meaningful data is sometimes harder.

So 64.7% of your traffic in this graphic above was routed through Argo primarily because we believed we had a faster route. A small percentage of traffic is routed through our network to try and determine if there is, in fact, a faster route and sometimes that traffic will be slower than it might otherwise have been. But we only sample a small percentage.

I’m not sure which of your domains this is, but I see similar stats on one .co domain you have, so I’ll speak to it generally… In certain colos like Prague you see a large circle, meaning we sent a lot of traffic there through Argo (and in that case the traffic saw an average improvement of 18%). In Brussels there was probably less traffic, so we routed less, but that traffic saw a 29% improvement.

In one colo, as you note, the small percentage of traffic routed through Argo to test for a faster route actually behaved abysmally. In the chart I’m looking at it was fewer than 600 requests, but they performed over 3000% worse.

When you add up all the requests and percentage improvement or loss you get back an odd number because statistics and percentages are hard to represent well… and you wind up with an average improvement of -105%. But really that is hugely biased by 600 or so requests which performed really poorly.

The bar chart does a somewhat better job of representing the percentage of requests where Argo improved performance vs. going direct to origin.

The stats themselves aren’t wrong, but are wildly influenced by a small number of outliers with exceptionally poor performance vs. a much larger number of requests which saw more modest performance gains.
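To make the outlier effect concrete, here is a rough sketch of the arithmetic. Only the 18% and 29% improvements, the roughly 600 requests, and the roughly 3000% slowdown come from the numbers above; the other request counts and the colo labels are made up for illustration, so the result will not match the -105% on the dashboard exactly. It also assumes the headline figure is a request-weighted mean, which is a guess on my part.

```python
# Hypothetical per-colo figures: (request count, average improvement in percent).
# The 12000 and 4000 request counts are invented for illustration only.
colos = {
    "Prague": (12000, 18.0),
    "Brussels": (4000, 29.0),
    "Outlier colo": (600, -3000.0),  # ~600 requests that performed ~3000% worse
}

def weighted_mean_improvement(stats):
    """Request-weighted average improvement across colos, in percent."""
    total = sum(n for n, _ in stats.values())
    return sum(n * pct for n, pct in stats.values()) / total

def share_improved(stats):
    """Fraction of requests in colos whose average improvement is positive.

    Simplification: treats every request in a colo as having that colo's sign.
    """
    total = sum(n for n, _ in stats.values())
    improved = sum(n for n, pct in stats.values() if pct > 0)
    return improved / total

overall = weighted_mean_improvement(colos)
without_outlier = weighted_mean_improvement(
    {name: v for name, v in colos.items() if name != "Outlier colo"}
)
improved_share = share_improved(colos)

print(f"overall average improvement: {overall:.1f}%")
print(f"average without the outlier colo: {without_outlier:.2f}%")
print(f"share of requests that improved: {improved_share:.1%}")
```

With these made-up counts the overall average comes out strongly negative even though the large majority of requests improved, which is the same shape as the gap between the headline number and the bar chart.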


#3

Thanks for your quick reply. There is a huge disparity in performance between datacenters. My servers that use Argo are in Frankfurt, and latency between our servers in London and Frankfurt averages 17.7ms.

I am also seeing a high number of HTTP/1.1 502 Bad Gateway responses at different test locations.


#4

502 errors could be origin related or Cloudflare related. Information on troubleshooting is available here, and the issue is best handled in a support ticket. There is no good route from Brazil or Australia to the EU, so while those numbers are a little high, there are a LOT of hops between the POPs in those regions and the origin server.


#5

QoS is now up to standard. :+1: