Websockets disconnecting after 20s

What’s wrong

I have an application that has worked for many months over tunnels and, as of today, it is unusable when used over Zero Trust tunnels. When connecting directly, it works fine.

What I have

I have Python running a SocketIO server on one side, then Vue.js with socket.io on the other. This is a tried and tested combination that I’ve been using for many years.

Symptoms

Client becomes unavailable after 20-30 seconds. With debug logging enabled on the server-side ping-pong, packets go back and forth for 20s or so, then no response is received, so after a timeout the server closes the connection.
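To pin down exactly where the ping-pong stops, the debug timestamps can be checked mechanically. This is a minimal sketch, assuming the ping send times and pong receive times have already been pulled out of the server log as seconds; the function name, the sample numbers and the 5s timeout are made up for illustration and are not taken from socket.io.

```python
from typing import Optional, Sequence


def first_unanswered_ping(
    pings: Sequence[float], pongs: Sequence[float], timeout: float
) -> Optional[float]:
    """Return the send time of the first ping that got no pong within `timeout` seconds."""
    ordered_pongs = sorted(pongs)
    i = 0
    for sent in sorted(pings):
        # Ignore pongs that arrived before this ping was sent.
        while i < len(ordered_pongs) and ordered_pongs[i] < sent:
            i += 1
        if i < len(ordered_pongs) and ordered_pongs[i] <= sent + timeout:
            i += 1  # this pong answers this ping
            continue
        return sent
    return None


# Hypothetical log: pings every 5s, replies stop after the 20s mark.
pings = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0]
pongs = [0.1, 5.1, 10.1, 15.1, 20.1]
print(first_unanswered_ping(pings, pongs, timeout=5.0))
```

Running it against logs from both the direct and the tunnelled connection should show whether the cut-off lands at the same point every time.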

What I’ve done

I’ve spent all day experimenting with different ping-pong settings, timers, keepalives and tunnel settings; nothing seems to have any effect. This has broken both the code I’ve been developing and the unchanged code that was on test.

Help!

I’ve seen many other websocket issues listed over the years, but nothing I’ve seen seems relevant. I’m now at a loss. If anyone can help or has any ideas, it would be much appreciated.

Ok, another round of testing and I’m almost left speechless. Almost.

What I did

I’ve set up a virtual server running a tinc tunnel and NGINX and connected all my servers up to it in such a way that it mirrors what I had set up on CF ZT tunnels. I then closed down the CF tunnels and re-pointed the DNS entries, as C-NAME records, at the new servers. Fundamentally the software kicks off and, for the first 20s, looks good.

What I found …

  • When routing via CF with proxying “on”, I have the same problem with websocket connections dropping after 20 seconds
  • When I turn proxying off, so it “should” (?!) just be pointing the DNS, same problem (!)
  • If I point the DNS from my local hosts file, hey presto!, all works just fine.

Interestingly, despite setting up the virtual server with an IPv4 address and pointing the server A record to this address, then pointing the C-NAME to this A record, when I “ping” the C-NAME it comes back with an IPv6 address. Wondering if this websockets issue is IPv6 related ???
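A quick way to see which addresses a name actually resolves to, without relying on “ping”, is to ask the system resolver directly. This is a minimal sketch using only Python’s standard `socket` module; the function name is made up, and any hostname passed in is whatever you want to test.

```python
import socket
from typing import Dict, List


def resolved_addresses(host: str, port: int = 443) -> Dict[str, List[str]]:
    """Group the addresses the system resolver returns for `host` by IP family."""
    result: Dict[str, List[str]] = {"IPv4": [], "IPv6": []}
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        if family == socket.AF_INET:
            label = "IPv4"
        elif family == socket.AF_INET6:
            label = "IPv6"
        else:
            continue  # ignore any other address family
        if sockaddr[0] not in result[label]:
            result[label].append(sockaddr[0])
    return result


# An IP literal resolves to itself; swap in the real hostname to test.
print(resolved_addresses("127.0.0.1"))
```

Because this goes through the system resolver, hosts-file entries and local DNS caches affect the answer. If the addresses returned are not the virtual server’s own IPv4 address, the name is most likely still being answered with proxy addresses (which, as far as I understand, can include IPv6) or with a stale cached entry.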

The problem …

Just using a C-NAME (or, I’m guessing, an A record) with CF seems to kick off the problem, so whatever is causing the issue actually seems to be in the CF proxying layer rather than nearer the tunnels. Turning proxying off on the DNS entry not working is a bit of a worry with regard to the rest of my CF infrastructure.

So, as long as the traffic doesn’t go near CF it seems to work, but this is not a good solution for me (!) Any help or advice would be much appreciated; at this point it seems I can’t even use the CF DNS service without it being an issue for websocket connections… :frowning:

Note:

I have some “other” software that I recently switched over to a CF tunnel. It too uses websockets and since switching it’s been doing some “odd” things. I rather suspect that when I start digging I’m going to find it’s been doing strange things because it keeps connecting and disconnecting its websockets (!)

… why do I hyphenate C-NAME? Because Discourse in its wisdom thinks (for some reason) that without the hyphen it’s a link, and I can only put 4 links in a post … I’m now going to find a solid desk that won’t mind a few dents.

Ok, so using CF DNS with proxying turned off does work OK; the failure I saw earlier was down to my DNS cache not clearing when requested. It breaks as soon as I turn proxying on.

So …

I’ve been back over websocket-based applications I’ve had up for quite some time (1y+?) and have found that some (most) seem to be subject to the issue. The only difference I can see is that the ones that still work continuously put traffic over the link; for example, there is an application-level update every 5 seconds. Anything that relies on a socket.io ping-pong alone looks to be timing out after 20 seconds.
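If regular application traffic really is what keeps the working connections alive, a stop-gap is to send a small application-level message on a timer. This is a minimal sketch using only `asyncio`; the `keepalive` function, the "keepalive" event name and the `beats` parameter are made up for illustration, and `send` stands in for whatever coroutine the server uses to emit an event.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def keepalive(
    send: Callable[[str, dict], Awaitable[None]],
    interval: float = 5.0,
    beats: Optional[int] = None,
) -> None:
    """Send a small application-level message every `interval` seconds.

    `beats` limits the number of messages (handy for testing);
    None means run until the task is cancelled.
    """
    sent = 0
    while beats is None or sent < beats:
        await send("keepalive", {"seq": sent})
        sent += 1
        await asyncio.sleep(interval)


async def demo() -> None:
    # Stand-in for a real emit function: just print what would be sent.
    async def fake_send(event: str, data: dict) -> None:
        print(event, data)

    await keepalive(fake_send, interval=0.01, beats=3)


asyncio.run(demo())
```

With python-socketio’s `AsyncServer`, `sio.emit` is a coroutine that takes an event name and data, so it could probably be passed in as `send` and the whole thing started as a background task. This only masks the problem; it does not explain why the built-in ping-pong stops getting through.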

Most of the problematic stuff I’ve now moved is apparently running happily without any socket disconnection issues.

I’m at a bit of a loss to understand why it’s “just me”, I’m wondering if there is a subtle setting somewhere in the control panel that I’ve inadvertently tweaked without reading the fine print that says “beware, this could break all your websockets”, or whether something has changed in the Matrix … or whether I’m just missing something really obvious.

Any help or advice would be much appreciated.