Argo Tunnels Drop Connections more than Direct IP

Hi Everyone

I’m fully willing to admit that I may have misconfigured something, however I’ve been experimenting with the use of Argo tunnels and can’t say I’m impressed with the stability.

The setup is simply:
Cloudflare <-- Argo Tunnel --> Non-Exposed Host
On this host I’m running 3 Argo tunnels: one for SSH, one for a Kubernetes dashboard, and one pointing to an nginx ingress. Note that this is paired with Cloudflare Access, so it’s not like this stuff is exposed to the public internet. Behind the nginx ingress (running inside a single-node Kubernetes cluster via microk8s) is a web-based version of VS Code that relies on websockets.

The issue I’m experiencing is that Argo appears unable to maintain persistent connections for longer than about 15 minutes. With both SSH and the websockets used by VS Code, I encounter dropped connections, and on the occasions I have both open, they drop simultaneously. CPU load on the host is quite reasonable, and I’ve given cloudflared a higher priority than most daemons (particularly the Kubernetes-related ones).
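
For anyone wanting to check the "both drop simultaneously" observation against their own setup, here is a minimal sketch. It assumes you have noted the disconnect times of each session yourself (for example from client logs); the function names and the 5 second window are my own illustrative choices, not anything provided by cloudflared.

```python
from datetime import timedelta

def simultaneous_drops(ssh_drops, ws_drops, window=timedelta(seconds=5)):
    """Pair up drop timestamps from two sessions that fall within `window`.

    Both arguments are lists of datetime objects recorded by hand or
    parsed from client logs (hypothetical inputs, not a cloudflared format).
    """
    pairs = []
    for ssh_time in ssh_drops:
        for ws_time in ws_drops:
            if abs(ssh_time - ws_time) <= window:
                pairs.append((ssh_time, ws_time))
    return pairs

def drop_intervals(drops):
    """Time between consecutive drops of one session.

    Roughly constant intervals would hint at a fixed timeout somewhere;
    irregular ones point more towards the tunnel itself reconnecting.
    """
    ordered = sorted(drops)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]
```

If most drops pair up across the two sessions, that suggests the shared tunnel connection is being reset rather than individual idle connections being timed out.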

If I instead expose the host to the internet with firewall rules restricting connectivity to only Cloudflare IP ranges, there are absolutely no connection drops. To clarify, in this case connections are still being proxied via Cloudflare and secured via Access; only Argo is being bypassed. I originally assumed that these were keep-alive related issues where CF was dropping the idle connections, however the fact that non-Argo connectivity appears fine makes me suspect something else.
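
The allowlist logic behind those firewall rules boils down to a CIDR membership check. A minimal sketch follows, assuming a small sample of Cloudflare's published IPv4 ranges; the list here is deliberately incomplete and may be out of date, so take the current ranges from Cloudflare's published IP list rather than from this snippet. The names `SAMPLE_CLOUDFLARE_RANGES` and `is_allowed` are made up for illustration.

```python
import ipaddress

# ASSUMPTION: an illustrative, incomplete sample of Cloudflare IPv4 ranges.
# Replace with the current list published by Cloudflare before relying on it.
SAMPLE_CLOUDFLARE_RANGES = [
    "173.245.48.0/20",
    "103.21.244.0/22",
    "141.101.64.0/18",
    "108.162.192.0/18",
    "162.158.0.0/15",
    "198.41.128.0/17",
]

def is_allowed(source_ip, ranges=SAMPLE_CLOUDFLARE_RANGES):
    """Return True if source_ip falls inside any of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)
```

In practice the same check lives in the cloud firewall (allow these source ranges, drop everything else); the snippet is only handy for sanity-checking source addresses seen in access logs.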

For additional context, the host in question exists within Google Cloud running on the standard (cheaper) networking tier which drops traffic onto the public internet ASAP (instead of Google carrying it as far as possible).

If anyone has tips for debugging this further, or there is more information I can provide, I’m all ears.

Have opened support ticket #1782571 for any Cloudflare employees interested.

So as an update, the ticket has been open since November 12, and essentially every time I’ve sent a reply, I’ve received a response from a different person, largely asking me for the same basic information and explanations of the problem, plus log files that show the same story. I’m up to 6 different support engineers on this one ticket.

Having been in such a position myself, I appreciate that providing support for this sort of issue can be tricky, but at the very least it would be nice not to have to repeat myself.

That seems strange… @cloonan can you do something here? Ticket number is:

Got it. Looks like it’s parked with an engineer at the moment awaiting a TBD release. I’ve cc’d myself on the ticket to ensure I see the updates.


Oh I see, I can also feel for @dmf… Once a ticket is waiting on a release, updates tend to become an issue, since not everyone is informed and/or they forget to give an update. Maybe @otto can do something about it. It’s not the worst, but I hope it can be improved without too much trouble :)

Just dropping an update in here for anyone following along. This ticket is still in a holding pattern while an “infrastructure change” is worked on. At this point I have reverted from using Argo tunnels and am instead using firewall rules set to drop anything that isn’t a Cloudflare IP range while I wait for progress.

Had some progress in the form of a response:

“I just wanted to check in and let you know engineering has resolved the issue. I will go ahead and close this ticket…”

Gave it a shot, but whatever the issue is, it isn’t resolved for me and, if anything, it is worse. Back to using direct IP I go.

Happy 2020!

As an update for anyone following along, this ticket is still open with support. Progress has been minimal, other than learning that my problem is part of a larger issue and that “Due to the time needed to address the overarching issue, we won’t be able to provide further updates until later into the first quarter of 2020.”

My primary concern at this point is that they will resolve the undisclosed overarching problem, only for that fix not to resolve my issue.