Need Help Understanding Argo traffic routing patterns

Hi Team,
We enabled Argo on our account yesterday to improve our routing with Cloudflare. We sent more than 150k requests through Argo to measure the latency improvements. Argo performed well enough for us once we had sent enough requests through it: it optimised the COLO selection for our traffic and the latency between POPs and our origin. However, after going through the latency patterns, we have some questions that would help us better understand its traffic routing.

Below is our situational context:

  • Our origin is in Mumbai, India.
  • Our customers are located around the world, but mostly in India and African countries.
  • Testing was done from machines in India using both broadband and mobile traffic.

Below are our questions:

  • Right after Argo was enabled, we did not see immediate traffic optimisation; it took a while for Argo to start working. What is the minimum amount of traffic required for Argo to start smartly routing traffic? In other words, is there a warm-up period? If yes, how long is it?
  • Argo selected the correct COLO for us in most cases, but there were instances where it picked far-away COLOs, and in some cases it also sent traffic outside of India only to route it back. While running these tests we were on the Business plan, so we expect CF to pick the best COLO in either case. This in part increased our P95 and P99 latencies. How does Argo pick the COLO, and in which cases will it pick a farther COLO?
  • When running tests from different locations in India, we saw that whenever Argo picked a new COLO, traffic was initially slow and gradually improved as more traffic went through it. We are assuming that as more and more traffic for the domain goes through the same COLO, Argo will eventually figure out the improved path to the origin, so we need to warm it up to start seeing improvements. Is this understanding correct?
  • On the dashboard, Argo showed that 54% of our traffic was smartly routed. What does this mean? Does it mean that only 54% of traffic saw better latencies and the rest did not? Is there any selection mechanism that Argo uses to smartly route traffic?
  • In the Geography tab, Argo showed that for locations, it improved traffic by over 80%. What does this mean? Again, does it mean that enough traffic went through that COLO after enabling Argo and that it improved the origin latencies by 80%?
  • If we use Argo with the Pro plan, will it affect the COLO selection and the latencies between COLO and origin compared to using Argo with the Business plan?

It depends.

Traffic can’t be optimized until a percentage of it is routed through various colos and the results are compared. Over time, changes in performance will result in additional changes to traffic patterns to determine whether more optimal patterns exist. How much traffic that takes depends on algorithms you don’t have access to and on factors (such as the amount of traffic to that subnet from other zones) which are outside your control.

The closest colo is not always the fastest.

Argo can’t optimize without data. Data can’t be gathered without experimentation. Experimentation means determining the best route through the accumulation of data, and some of the routes tried along the way will not have been ‘the best’.

If you are running so little traffic that Argo’s experimentation has a meaningful impact on P95 and P99 latencies over time (not just in a short test with dummy data), then either (a) there isn’t enough traffic in that location to matter, or (b) the network is extremely wonky and continued adjustments to improve performance are required.
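To make the point about tail latencies concrete, here is a minimal sketch of how a handful of requests sent over a slower route can dominate P95/P99 in a small test but disappear in a larger sample. The latency values and the `percentile` helper are made up for illustration; this is not how Cloudflare computes its own metrics.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, to avoid floating-point rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in milliseconds.
# Small test: 94 requests on a settled route, 6 on a slower route being tried.
small_test = [40] * 94 + [180] * 6
# Larger sample: the same 6 slow requests among 10,000 in total.
large_sample = [40] * 9994 + [180] * 6

for name, data in (("small", small_test), ("large", large_sample)):
    print(name, percentile(data, 50), percentile(data, 95), percentile(data, 99))
```

In the small test the six slow requests push both P95 and P99 to the slow value, while in the larger sample the same six requests no longer reach either percentile.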

Insufficient data. Argo has all kinds of algorithms and logic gates configured for it. Data was routed. Over time Argo gathered sufficient data to make decisions around optimal routing. Argo continued to collect data and continues to update its routing decisions.

Argo evaluates routes looking for faster ones. If there isn’t a faster route, there’s nothing to smart route. Traffic that is not smart routed uses the default route (generally directly over the internet connection(s) of the client’s colo).
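As a toy illustration of why the smart-routed share can sit well below 100%, the sketch below picks an alternate route only when it measures faster than the default, and otherwise falls back to the default. The route names, latencies, and the `choose_route` function are all hypothetical; Argo’s actual selection logic is not public and is certainly more involved than this.

```python
def choose_route(default_ms, alternates_ms):
    """Return (route_name, smart_routed) for one request.

    default_ms: measured latency of the default route, in milliseconds.
    alternates_ms: dict mapping alternate route name -> measured latency.
    """
    if not alternates_ms:
        # Nothing has been measured yet, so there is nothing to compare.
        return "default", False
    best = min(alternates_ms, key=alternates_ms.get)
    if alternates_ms[best] < default_ms:
        return best, True
    return "default", False

# Hypothetical measurements for four requests.
requests = [
    (120, {"via-A": 90, "via-B": 110}),  # a faster alternate exists
    (60, {"via-A": 75}),                 # default is already the fastest
    (200, {"via-B": 150}),               # a faster alternate exists
    (45, {}),                            # no alternate measured yet
]

decisions = [choose_route(default, alts) for default, alts in requests]
smart_share = sum(1 for _, smart in decisions if smart) / len(decisions)
print(decisions)
print(f"{smart_share:.0%} smart routed")
```

Under this simplified model, requests that stay on the default route are not "unoptimized"; they are the ones for which no faster alternative was found.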

Geography: A map shows the improvement in response time at each Cloudflare data center.
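If the percentage on that map is read as a relative reduction in response time (an assumption on my part, not something confirmed in this thread), the arithmetic would look like the sketch below. The response times are invented for the example.

```python
def relative_improvement(baseline_ms, improved_ms):
    """Percentage reduction in response time relative to the baseline."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_ms - improved_ms) * 100 / baseline_ms

# Hypothetical: 500 ms without smart routing, 100 ms with it.
print(relative_improvement(500, 100))  # 80.0
```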

No. It may however change the colos available to advertise the zone.

@cscharff Thanks for the quick response on this.

After enabling Argo we sent more than 150K requests in 24 hours. However, most of these were from a single IP (my home machine) and a single ISP. Based on this response, is it safe to assume that if we were to enable this in prod, Argo would start optimising traffic much more quickly throughout India, since traffic would be flowing through most COLOs at a faster pace?

I had a question regarding data transfer, since that’s how Argo is charged. On the Traffic page, data transfer for the last 3 days shows as 89.41 MB, which could be true since for testing we are only hitting a URL that sends back a 200 OK response, and apart from this none of our domains are proxied. However, on the billable usage page, Argo traffic is showing up as 257.34 MB, and we enabled Argo for only 24 hours. Why is Argo usage so high compared to the metrics on the Traffic page, and which figure is correct and should be followed? Note that we have completely disabled caching and are not really looking to use it, even for our prod traffic.

On the DDoS front, the message shown when enabling Argo said that DDoS traffic will not be counted. How will that work? Let’s say I am under a DDoS attack: will I need to inform CF through some means, or will CF automatically detect that we are under attack and not charge for the extra data transfer?

I’ve only tracked the enablement performance of Argo with Cloudflare Enterprise customers. In those scenarios the customers and I had reviewed the existing performance of the domain(s) in question, their origin placement and visitor traffic to determine that Argo was worth consideration as a production feature. We typically saw that 6 to 8 hours of global traffic with Argo enabled was sufficient to gather a good baseline of performance improvements. Over the course of a few weeks or months, checks on performance were generally in line with that initial baseline. However, the vagaries (unique challenges) of routing over the internet would sometimes result in ‘interesting’ variances, where one or more colos might see a marked increase (or decrease) in smart routed traffic from the initial baseline. This happens as BGP routing table changes, internet outages and network performance issues occur, and as Argo continually evaluates a portion of traffic through alternate routes, looking for changes which might cause it to shift traffic based on performance.

Argo is billed based on Argo traffic metrics. You can monitor the usage. Additional information from folks who use this on a pay-go plan is in threads like this one, which might be helpful: Argo Billing Usage Notification "Notify when total bytes of traffic exceeds" - #6 by sdayman

DDoS traffic is typically determined to be non-blocked traffic. So requests blocked by rules configured in the dash, or by Cloudflare DDoS rules, would not be counted.

Without knowing the use case… :person_shrugging: Generally speaking, completely disabling caching is not conducive to maximizing the end user’s perception of performance. But if you believe caching is not necessary, optimizing the connectivity to the origin where possible does seem to be a potentially important feature.
