Load Balancer / Argo Tunnel

Odd issue here with the CF Load Balancer and Argo Tunnels.

(All servers serve an HTTPS interface.)

I’ve got two servers, api-server-1 and api-server-2, which I can load balance fine; let’s call the load balancer LB1.

Direct access via their ‘A records’ works fine.
Access via the load balancer, allocated 50/50, works fine.

They have CF certs on them for HTTPS.

Now I’m experimenting with Argo Tunnels / cloudflared to route some Docker containers. The plan is to purchase more load balancer origins and use Argo Tunnels to cope with extra demand.

The two Docker containers are on the same physical server, serving HTTPS on different ports.
Both have valid CF SSL certs.
Both containers have a CNAME record (created via cloudflared); let’s call them argtun1 and argtun2.
I can access these containers via their CNAME records completely fine.
I can script a curl request to the CNAME records in a tight loop and they always respond fine, all over HTTPS, with no errors at all.

But when I try to put one of these servers into the LB pool, things start to break down.

If I put api-server-1 and argtun1 in the pool, I start to get strange errors which I cannot track down, although it also works some of the time.

If I refresh https://LB1 I can see it switching servers 50/50. I can also see the LB health check requests arriving at the origins with a 200 response.

Every so often I start to get a 403 (cf-ray IDs 67a0cf94cd972c5a-LHR, 67a0d222ca79f3df-LHR), or, even weirder, a 404 (67a0d285ae20f3df-LHR).

Now, if I disable api-server-1 in the pool, leaving just argtun1, it stops working completely, generally with a 404 (67a0d4fe7e61e600-LHR). It presumably doesn’t help that, for some unknown reason, it has switched to caching the error response, which now shows a cache HIT for the GET request to LB1, even though max-age is always 0 for URI /.

If I re-enable api-server-1, eventually I start getting the 403 errors again (67a0d9bd6e0053fe-LHR), plus some more back and forth with 404s.

BUT if I go into LB1 and turn argtun1 off, save, turn it back on, save, and wait for the first health check to succeed, it mostly goes back to the working state, still throwing in the occasional error (67a0dd313a6fbc06-LHR).

My LB1 health checks are against URI /, which always returns 200 (I can see this in the origin logs).

Accessing / via LB1 is returning errors, so surely the origin that served them should be taken out of the pool if that were truly the case?

I have no firewall events matching the ray ids.

What is going on? It makes no sense.

Thanks,

Jon

Cloudflared config.

ingress:
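(For context, a full two-hostname ingress for a setup like this would normally look something like the sketch below; the tunnel ID, hostnames, local ports, and credentials path are placeholders rather than the actual values used here.)

```yaml
# Hypothetical cloudflared config.yml - placeholder names and ports.
tunnel: <tunnel-UUID>
credentials-file: /root/.cloudflared/<tunnel-UUID>.json

ingress:
  # Requests whose Host header matches the tunnel CNAME are routed to the
  # local container listening on that port.
  - hostname: argtun1.example.com
    service: https://localhost:8443
  - hostname: argtun2.example.com
    service: https://localhost:8444
  # Catch-all: anything that matches no hostname above gets a 404.
  - service: http_status:404
```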

Hi

Cloudflared has rich logging; just enable it and you will see what is going on.

Regards,
Mirosław {redacted}

Great - didn’t see that option, thanks!

OK, I can see the health checks coming in fine, but if I refresh a few times I get the 403 error, and no log appears from cloudflared. Even while this is happening, I can still see the health checks coming in fine. I’ve now tried both the Linux client and the macOS client.

I’m running cloudflared with loglevel and transport-loglevel set to trace. There is no log output when I start getting the 403 errors.
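For reference, the relevant settings can be put in the cloudflared config file as well as passed on the command line; a minimal sketch (the logfile path is just a placeholder):

```yaml
# Verbose cloudflared logging - key names as mentioned above, path is a placeholder.
loglevel: trace
transport-loglevel: trace
logfile: /var/log/cloudflared.log
```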

Edit: The error page looks like an nginx error, but I’m not running nginx.

I realise support said to wait 72 hours, but given the random nature of this issue and the lack of user-side logs available, I can’t see how a community member can solve this without access to the ray IDs.

#prayingitisnotme

@MoreHelp

This is unusual - are you still seeing this error?

Hi Simon,

Yes, still having issues. I have a support ticket open, which is 2223978.

After trying different things, the issue appears to be that when an Argo Tunnel is in the load balancer as an origin, the ‘Per origin Host header override’ has no effect, which means the ingress rule in Cloudflared.conf never matches, resulting in a 404.

EDIT: The Monitor’s host header override does work, though.

I have checked and can confirm that for a ‘normal’ / non-Argo-tunneled CNAME as an origin, the host header override works as expected.

Accessing an Argo Tunnel directly via its CNAME (not via the load balancer) works as expected.
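To illustrate why the missing override matters, here is a sketch with placeholder hostnames and ports (not my actual config): a hostname-matched ingress rule only fires when the incoming Host header matches, so load-balanced traffic that keeps the LB’s hostname falls straight through to the catch-all.

```yaml
ingress:
  # Matches only when the request arrives with Host: argtun1.example.com.
  # Direct access via the tunnel CNAME satisfies this, so it works.
  - hostname: argtun1.example.com
    service: https://localhost:8443
  # Via LB1 the Host header stays as the load balancer's hostname (the
  # per-origin override is ignored), so requests fall through to this 404.
  - service: http_status:404
```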

I have no idea where the 403 errors came from, but I’ve not seen them since. At least this problem is consistent right now; previously it was unpredictable and worked some of the time (which makes no sense :frowning: )

I’m still thinking it must be something I’ve done, as I’m surely not the first person to put a tunnel behind a load balancer, but no one can point it out right now.

P.S. I bought Pro to get support, but now I’ve fallen in love with all the new features, mainly the dynamic image resizing via the URL :drooling_face:


Hi Jon - thanks for that information - I’m chasing internally to see if we can unravel this and we’ll get back to you on your ticket.

Glad you’re enjoying image resizing - it’s a really neat feature! I would also recommend enabling Polish with WebP to get automatic reduction of image sizes. If you visit our “Core Setup” walkthrough, it can show you some other neat best practices for your plan level.

Hi Jon

I am having the exact same issue as you are / were. Did you manage to resolve the issue and, if so, how?

I really appreciate your time. I am going around in circles on this and can confirm that I too tried everything you did.

Kind regards

Hi Simon,

I have exactly the same issue as Jon. Was this resolved? If so, can you let me know how? See the ticket I logged a few days ago (2238723).

Many thanks

Hi Ryan,

If your configuration is truly the same as mine, the only way to work around it is to have a single tunnel serving a single origin and ingress; that way you can use the catch-all rule and route everything to your service, ignoring the ‘host’ field.

My understanding is it is a bug in the load balancer / Argo Tunnel.

Edit: So you would require multiple instances of cloudflared running, with multiple configurations.

Hi Jon,

Thank you for the speedy reply. What a relief and pity that it appears to be a bug. I burned a ton of time on this thinking I was going crazy.

I am not sure I am clear on your solution, though. I do have a tunnel-per-origin setup. Would you mind posting an example of your ingress that shows the catch-all?

Really appreciate the help.

P.S.
I managed to get multiple cloudflared instances running by using Docker.

I think I have solved it - checking now. I literally only left the service line in the ingress, pointing straight at the origin.
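In other words, each container’s config collapses to something like this sketch (tunnel ID, credentials path, and port are placeholders, not my exact values):

```yaml
# Hypothetical per-container config.yml - one tunnel per origin.
tunnel: <tunnel-UUID>
credentials-file: /etc/cloudflared/<tunnel-UUID>.json

ingress:
  # Single catch-all rule: everything arriving over this tunnel goes straight
  # to the container, regardless of the incoming Host header.
  - service: https://localhost:8443
```

With no hostname rule to match, the Host header the load balancer sends no longer matters.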

Thanks for the help!

That sounds like it - I don’t think it is the end of the world; the process is quite lightweight, so if it works, shrug.

Agreed! :grinning: