Where do I begin to troubleshoot cloudflared cpu usage?

I have been using Cloudflare Tunnels to serve a website from a Windows IIS site. The site has been up for a few months and is working well. However, beginning last week, my cloudflared process has been consistently using about 20% CPU on a 4 vCPU server. The site serves roughly 20GB of traffic per day. Cloudflared has historically used 5-10% CPU and now it is using 15-25%. I’m curious to know what affects the CPU usage of that process and where I should begin to troubleshoot what caused this increase over the last week. The increased CPU usage cannot be tied to an increase in traffic. It is quite the opposite: approximately a week ago, total traffic to my site decreased 10-20% at the same time my CPU usage increased 10-20%.

Thanks for any help anyone can provide!

I have an idea that I can check for on our end: can you share your Tunnel ID?


Hi Nuno,

One of my Tunnel ID’s is xxxxxxxxxxxxxxxxxxxx.

I actually have three servers in a Pool that are experiencing the same issue.

Thanks!

-Matt

I’ve confirmed my theory: your Tunnel was automatically enrolled into using protocol: quic (transparently) roughly 1 week ago.

It is not surprising that QUIC-based proxying uses a bit more CPU, since congestion control, reliable delivery, and so on all happen in user land, whereas plain old HTTP/1.1 and HTTP/2 run over TCP and delegate a lot of that work to the kernel's TCP network stack.


Interesting! I did notice a connection error last week, when I didn’t have the QUIC protocol opened outbound from my web servers. I opened the protocol and the tunnel started successfully.

For my deeper understanding, is there somewhere I can read about these differences?

Thanks!

-Matt

“Why We Love QUIC and HTTP/3 | Hacker News” seems to be a pretty good discussion of the differences.

No offense, but your and my understanding of a ‘pretty good discussion’ are very different. :wink:

This discussion thoroughly explains how expensive QUIC is in terms of CPU usage and packet exchange, all of which is above my head and probably above the heads of 99% of server/network admins.

I think I will look into configuring my tunnels to not use QUIC. Otherwise I have to justify to my boss why we are spending 50% more CPU on these servers.
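
For reference, pinning the tunnel protocol might look roughly like the sketch below. It assumes a locally managed tunnel driven by a `config.yml`; the tunnel ID and credentials path are placeholders, not values from this thread.

```yaml
# config.yml (sketch; the tunnel and credentials-file values are placeholders)
tunnel: <TUNNEL-ID>
credentials-file: <PATH-TO-CREDENTIALS>.json

# Pin the connection to Cloudflare's edge to HTTP/2 over TCP instead of
# letting cloudflared select QUIC automatically.
protocol: http2
```

The same setting can also be passed on the command line as `--protocol http2`. Restarting cloudflared and comparing CPU usage before and after should show whether QUIC accounts for the increase.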

https://www.fastly.com/blog/measuring-quic-vs-tcp-computational-efficiency might be a better resource. It goes into a lot more detail on the different optimizations and the ways more of the work can be offloaded onto the kernel. It also has the fancy diagrams that management love :^)

The general idea is that the usual HTTP over TCP we’re used to has had a long time to be optimized, hardware accelerated and deeply integrated into kernels whereas QUIC is a relatively new technology that hasn’t had as long to get there yet.
