TL;DR: How do we query origin server status via the API?
Recently a number of our origin servers showed up as unreachable (error 500), and the cloudflared logs showed only very basic information.
Nothing had changed, network-wise or config-wise. All we did to fix it was restart cloudflared, and it started working again. Only a couple of random origin servers were affected; most remained fine. We have no idea what went wrong beyond something in the cloudflared tunnel, and no way to troubleshoot further without better logs.
So how do we query the origin servers' availability status via the API? I could not find anything, and if we can't get the status via the API, we're sitting ducks waiting for the next time this happens.
cloudflared exposes a handful of local endpoints. I don't think there's any API (as in the dashboard API endpoints) for you to check, beyond setting up health checks on the public hostnames.
2022-05-05T10:54:11Z INF Starting metrics server on 127.0.0.1:43483/metrics
/metrics has Prometheus-formatted metrics
/healthcheck reports OK
/ready reports a status code (e.g. 200 for OK) and the number of connections to Cloudflare's edge
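For polling these endpoints from a script, a minimal Python sketch could look like the following. It assumes the metrics address from the log line above (the port is picked at startup unless you pin it, e.g. with `--metrics`) and assumes /ready returns a JSON body with a `readyConnections` field, which is what recent cloudflared versions appear to send; check your own output. Note that /ready only tells you about the tunnel's edge connections, not whether individual origins are reachable.

```python
import json
import urllib.error
import urllib.request

# Metrics address printed by cloudflared at startup ("Starting metrics server on ...").
# The port is chosen at startup unless pinned (e.g. with --metrics), so adjust this.
METRICS_ADDR = "127.0.0.1:43483"


def parse_ready(status_code: int, body: str) -> dict:
    """Turn a /ready response into a small summary.

    Assumes the body is JSON with a "readyConnections" field; other
    cloudflared versions may return something different.
    """
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {
        "ready": status_code == 200,
        "ready_connections": int(data.get("readyConnections", 0)),
    }


def check_ready(addr: str = METRICS_ADDR, timeout: float = 5.0) -> dict:
    """Query cloudflared's /ready endpoint; any non-200 status means not ready."""
    url = f"http://{addr}/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_ready(resp.status, resp.read().decode())
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; /ready is assumed to answer with a
        # non-200 code (such as 503) when it has no edge connections.
        return parse_ready(err.code, err.read().decode())
```

Calling `check_ready()` from cron or a monitoring agent and alerting when `ready` is false or `ready_connections` drops would catch a broken tunnel, though not the origin-side 500s described in this thread.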
Our tunnel was reachable, no worries there. It's just that we got error 500 and the logs showed the origin server was not reachable. Of course the origins were reachable, and a simple restart of the cloudflared instances “fixed” the problem... It was random which origins were OK and which were not. Not a great experience.
Outside of those, I assume the 500 you received was when reaching the tunnel's public hostname, in which case health checks or load balancer monitoring would work fine for alerting on it.
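Cloudflare's own Health Checks or Load Balancing monitors are the managed way to do this. As a self-hosted stand-in, a probe run on a schedule from outside your network could look like this sketch. `PUBLIC_URL` is a hypothetical hostname and path; treating any 5xx, or no response at all, as unhealthy is an assumption chosen to match the 500s described above.

```python
import urllib.error
import urllib.request

# Hypothetical public hostname routed through the tunnel; replace with your own.
PUBLIC_URL = "https://app.example.com/health"


def classify(status_code: int) -> str:
    """Map an HTTP status to a health state, like a load balancer monitor would."""
    # 5xx covers the "origin not reachable" 500s seen through the tunnel.
    return "unhealthy" if status_code >= 500 else "healthy"


def probe(url: str = PUBLIC_URL, timeout: float = 10.0) -> str:
    """Fetch the public hostname once and classify the response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx, but the status code is still available.
        return classify(err.code)
    except OSError:
        # DNS failure, refused connection or timeout: no answer counts as unhealthy.
        return "unhealthy"
```

Running `probe()` every minute and alerting after a few consecutive "unhealthy" results avoids paging on a single blip.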
I'm not sure how to use this to tell whether there is a current issue or whether it's historical data since the tunnel started, or how to identify which origin server(s) are affected. Think of HAProxy and how you can check on backend servers: that's roughly the information I'm looking for.
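On the "current vs. historical" question: Prometheus counters are cumulative since the process started, so a single scrape of /metrics can't say whether errors are happening now. One way around that is to scrape twice and look at the difference, as in this minimal sketch. The metric name `cloudflared_tunnel_request_errors` is an assumption; confirm it against your own /metrics output. The parser is deliberately minimal (it does not handle `}` inside label values). Whether this can single out a specific origin depends on which labels your cloudflared version attaches to its metrics; without a per-origin label you only learn that errors are rising.

```python
import re
import time
import urllib.request

# Assumed metric name; confirm it against your own /metrics output.
ERROR_METRIC = "cloudflared_tunnel_request_errors"

# One sample line of the Prometheus text format: name{labels} value [timestamp]
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_samples(text: str) -> dict:
    """Parse Prometheus text exposition into {series: value}, skipping comments."""
    samples = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m:
            name, labels, value = m.group(1), m.group(2) or "", m.group(3)
            samples[name + labels] = float(value)
    return samples


def counter_deltas(before: dict, after: dict, metric: str) -> dict:
    """Increase of each series of `metric` between two scrapes.

    Counters only grow while the process runs, so a drop means cloudflared
    restarted; in that case the new value itself is the increase.
    """
    deltas = {}
    for series, value in after.items():
        if series == metric or series.startswith(metric + "{"):
            prev = before.get(series, 0.0)
            deltas[series] = value - prev if value >= prev else value
    return deltas


def scrape(addr: str) -> dict:
    """Fetch and parse cloudflared's /metrics endpoint."""
    with urllib.request.urlopen(f"http://{addr}/metrics", timeout=5) as resp:
        return parse_samples(resp.read().decode())


def errors_in_window(addr: str, metric: str = ERROR_METRIC, seconds: float = 60) -> dict:
    """Scrape twice, `seconds` apart, and return the per-series error increase."""
    first = scrape(addr)
    time.sleep(seconds)
    return counter_deltas(first, scrape(addr), metric)
```

If you already run Prometheus, `increase()` or `rate()` over the same counter does this for you and can drive an alert rule directly.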