Cloudflared tunnel query origin server availability

TL;DR: how do we query origin server status via the API?

Recently a number of our origin servers showed up as unreachable (error 500), and the cloudflared logs showed only very basic information.
Nothing had changed network-wise, config-wise, etc. All we did to fix it was restart cloudflared, and it started working again. Only a couple of random origin servers were affected; most remained just fine. We have no idea what went wrong, except that it was something with the cloudflared tunnel, and we have no way to troubleshoot further without better logs.

Now, how do we query the origin servers’ availability status via the API? I could not find anything, so we’re sitting ducks waiting for the next time this happens if we can’t gather the status via the API.

Any ideas?
/Rune

cloudflared exposes a handful of endpoints, but I don’t imagine there’s any API (as in the dashboard endpoints) for you to check, beyond setting up health checks on the public hostnames.

2022-05-05T10:54:11Z INF Starting metrics server on 127.0.0.1:43483/metrics

/metrics has Prometheus-formatted metrics
/healthcheck reports OK
/ready reports a status code (i.e. 200 for OK) and the number of connections to Cloudflare’s edge
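
As a rough sketch of polling /ready from a script: the address comes from the log line above (the port can differ between runs unless you pin it with --metrics), and the JSON field names `status` and `readyConnections` are my assumption about the response body. Note that this only tells you about connections to Cloudflare's edge, not about the origins behind the tunnel.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// readyBody mirrors the JSON that /ready is assumed to return.
type readyBody struct {
	Status           int `json:"status"`
	ReadyConnections int `json:"readyConnections"`
}

// parseReady decodes a /ready response body; unknown fields are ignored.
func parseReady(raw []byte) (readyBody, error) {
	var b readyBody
	err := json.Unmarshal(raw, &b)
	return b, err
}

// edgeHealthy reports whether cloudflared has at least one edge connection.
func edgeHealthy(b readyBody) bool {
	return b.Status == http.StatusOK && b.ReadyConnections > 0
}

func main() {
	// Address taken from the "Starting metrics server" log line above.
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get("http://127.0.0.1:43483/ready")
	if err != nil {
		fmt.Println("cloudflared not reachable:", err)
		return
	}
	defer resp.Body.Close()

	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	b, err := parseReady(raw)
	if err != nil {
		fmt.Println("unexpected body:", string(raw))
		return
	}
	fmt.Printf("edge healthy=%v connections=%d\n", edgeHealthy(b), b.ReadyConnections)
}
```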

You can also rely on Cloudflare’s Load Balancer monitoring to actively check your tunnel’s reachability from Cloudflare’s global network: https://developers.cloudflare.com/load-balancing/understand-basics/monitors/
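
If you'd rather roll your own check, a minimal probe against each public hostname could look like the sketch below. The hostnames are hypothetical placeholders, and treating any 5xx or transport error as DOWN is just one possible rule, not how Cloudflare's monitors decide.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// classify turns a probe result into an HAProxy-style UP/DOWN label.
// Any transport error or 5xx status counts as DOWN.
func classify(status int, err error) string {
	if err != nil || status >= 500 {
		return "DOWN"
	}
	return "UP"
}

// probe issues a GET and returns the HTTP status code.
func probe(client *http.Client, url string) (int, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	// Hypothetical hostnames; replace with the public hostnames routed
	// through your tunnel.
	hosts := []string{"https://app1.example.com/", "https://app2.example.com/"}
	client := &http.Client{Timeout: 5 * time.Second}
	for _, h := range hosts {
		status, err := probe(client, h)
		fmt.Printf("%-30s %s (status=%d err=%v)\n", h, classify(status, err), status, err)
	}
}
```

Since each public hostname maps to one origin service, this gives a per-origin view from the outside, which the cloudflared endpoints don't.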


I just checked those URLs; none provide any information about the state of the origin server(s).

Any other suggestions?

Our tunnel was reachable, no worries. It’s just that we got error 500 and the logs showed the origin servers were not reachable. Of course they were reachable, and a simple restart of the cloudflared instances “fixed” the problem. It was random which origins were OK and which were not. Not a great experience.

There are two metrics that relate to this: a counter of errors reaching the origin, and counters of response codes returned by the origin.

responseByCode = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Namespace: connection.MetricsNamespace,
		Subsystem: connection.TunnelSubsystem,
		Name:      "response_by_code",
		Help:      "Count of responses by HTTP status code",
	},
	[]string{"status_code"},
)
requestErrors = prometheus.NewCounter(
	prometheus.CounterOpts{
		Namespace: connection.MetricsNamespace,
		Subsystem: connection.TunnelSubsystem,
		Name:      "request_errors",
		Help:      "Count of error proxying to origin",
	},
)
cloudflared_tunnel_request_errors 9

promhttp_metric_handler_requests_total{code="200"} 8
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0

Outside of those, I assume the 500 you received was when reaching the tunnel’s public hostname, in which case health checks or load balancer monitoring would work fine for alerting on that.
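
Keep in mind these are Prometheus counters, so they only ever increase while the process runs; to tell whether errors are happening now, compare two scrapes. Going by the definitions above, the only label is status_code, so there's no per-origin breakdown. Here's a rough sketch of the delta approach; the parser is simplified and assumes no timestamps on the sample lines, and the helper names are mine.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSamples reads Prometheus text exposition output and returns
// sample name (including labels) -> value, skipping comment lines.
func parseSamples(text string) map[string]float64 {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		i := strings.LastIndex(line, " ")
		if i < 0 {
			continue
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			continue
		}
		out[line[:i]] = v
	}
	return out
}

// delta returns how much each cloudflared_tunnel_* sample grew between
// two scrapes. A drop means the process restarted and the counter reset,
// so the current value is used as the increase.
func delta(prev, cur map[string]float64) map[string]float64 {
	d := make(map[string]float64)
	for name, v := range cur {
		if !strings.HasPrefix(name, "cloudflared_tunnel_") {
			continue
		}
		diff := v - prev[name]
		if diff < 0 {
			diff = v
		}
		if diff > 0 {
			d[name] = diff
		}
	}
	return d
}

func main() {
	// Two hard-coded scrapes for illustration; in practice fetch /metrics twice.
	prev := parseSamples("cloudflared_tunnel_request_errors 8\n")
	cur := parseSamples("cloudflared_tunnel_request_errors 9\n" +
		"cloudflared_tunnel_response_by_code{status_code=\"200\"} 34\n")
	for name, n := range delta(prev, cur) {
		fmt.Printf("%s +%g since last scrape\n", name, n)
	}
}
```

If you already run Prometheus, scraping /metrics and alerting on the rate of request_errors gets you the same thing without custom code.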

Ok, so I got this:

cloudflared_tunnel_request_errors 8
# HELP cloudflared_tunnel_response_by_code Count of responses by HTTP status code
# TYPE cloudflared_tunnel_response_by_code counter
cloudflared_tunnel_response_by_code{status_code="200"} 34
cloudflared_tunnel_response_by_code{status_code="404"} 7

Not sure how I can use this to tell whether there is a current issue or whether this is historical data since the tunnel started, or how I can identify which origin server(s) are affected. Think of HAProxy and how you can check on backend servers; that’s sort of the information I’m looking for.

Thx.