Odd issue here with the CF Load Balancer and Argo Tunnels.
(All servers serve HTTPS.)
I’ve got two servers, api-server-1 and api-server-2, and I can load balance them fine. Let’s call the load balancer LB1.
Direct access via their ‘A record’ works fine.
Access via the load balancer, split 50/50, works fine.
They have CF certs on them for HTTPS.
Now I’m experimenting with Argo Tunnels / cloudflared to route traffic to some Docker containers. The plan is to purchase more load balancer origins and use Argo Tunnels to cope with extra demand.
The two Docker containers are on the same physical server, serving HTTPS on different ports.
Both have valid CF SSL certs.
Both containers have a CNAME record (created via cloudflared); let’s call them argtun1 and argtun2.
I can access these containers via their CNAME records completely fine.
I can script a curl request to either CNAME record in a tight loop and it always responds fine, all over HTTPS, with no errors at all.
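For anyone wanting to reproduce that kind of loop, here is a rough Python sketch of the same idea. It is not the exact script used; the function names are made up for illustration, and any hostname you pass in (for example https://argtun1.example.com/) is a placeholder for the real record.

```python
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url, timeout=10):
    """Return the HTTP status code of a GET, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is still what we want.
        return err.code


def probe(url, attempts, fetch=fetch_status):
    """Request `url` `attempts` times and count each status code seen.

    `fetch` is injectable so the loop can be exercised without a network.
    """
    counts = Counter()
    for _ in range(attempts):
        counts[fetch(url)] += 1
    return counts
```

Calling probe() with the CNAME hostname and, say, 100 attempts should come back with only 200s if the tunnel itself is healthy. Pointing the same loop at LB1 gives a quick ratio of 200s to 403s and 404s.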
But things start to break down when I put one of these containers into the LB pool.
If I put api-server-1 and argtun1 in the pool, I start to get strange errors that I cannot track down, although it also works some of the time.
If I refresh https://LB1 I can see it switching servers 50/50. I can also see at the origins the LB health check requests coming in with a 200 response.
Every so often I get a 403 (cf-ray IDs 67a0cf94cd972c5a-LHR, 67a0d222ca79f3df-LHR) or, even weirder, a 404 (67a0d285ae20f3df-LHR).
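To make the failing ray IDs easier to cross-reference, something like the sketch below could group them by status code. It assumes each response has already been captured as a (status, headers) pair with lower-case header names; the function name is hypothetical.

```python
from collections import defaultdict


def failures_by_status(responses):
    """Group cf-ray IDs by status code for every non-200 response.

    `responses` is an iterable of (status_code, headers) pairs, where
    `headers` is a mapping that uses lower-case header names.
    """
    grouped = defaultdict(list)
    for status, headers in responses:
        if status != 200:
            grouped[status].append(headers.get("cf-ray", "unknown"))
    return dict(grouped)
```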
Now, if I disable api-server-1, leaving just argtun1 in the pool, it stops working completely, generally with a 404 (67a0d4fe7e61e600-LHR). I assume it doesn’t help that, for some unknown reason, it has switched to caching the error response: the GET request to LB1 is now a cache HIT, even though max-age is always 0 for URI /.
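The cache behaviour can be checked from the response headers alone. A minimal sketch, assuming the response carries Cloudflare’s cf-cache-status and cf-ray headers alongside the origin’s Cache-Control; the function name and the returned keys are made up for illustration:

```python
import re


def cache_report(headers):
    """Summarise the cache-related headers of one response.

    `headers` is any mapping of header name to value; lookups are
    case-insensitive.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    match = re.search(r"max-age=(\d+)", lowered.get("cache-control", ""))
    max_age = int(match.group(1)) if match else None
    status = lowered.get("cf-cache-status", "").upper()
    return {
        "cf_ray": lowered.get("cf-ray"),
        "cf_cache_status": status or None,
        "max_age": max_age,
        # A HIT on a response that declares max-age=0 is the surprising case.
        "unexpected_hit": status == "HIT" and max_age == 0,
    }
```

If unexpected_hit comes back true for the 404 from LB1, that would match the symptom described above: an error response being served from cache despite max-age=0.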
If I re-enable api-server-1, I eventually start getting the 403 errors again (67a0d9bd6e0053fe-LHR), with some more back and forth with 404s.
BUT if I go into LB1, turn argtun1 off, save, turn it back on, save, and wait for the first health check to succeed, it mostly goes back to the working state, still throwing in some errors here and there (e.g. 67a0dd313a6fbc06-LHR).
My LB1 health checks go to URI /, which always returns 200 (I can see this in the origin logs).
Yet accessing / via LB1 returns errors. If the origin were really serving those errors, surely it should be taken out of the pool?
I have no firewall events matching the ray ids.
What is going on? It makes no sense.