Yes, that’s what I thought as well. However, I’ve made two subjective observations and wanted to get more information on them.
When I reboot a server in my load-balancing setup and send a request, it can happen that a request to the load balancer runs for a long time, say 15 s, even though my server would normally handle it in 300–400 ms. The rebooting server hadn’t been removed from the pool yet, so my theory is that the request was sent to that server, went unanswered, and was then retried against a healthy server.
My servers handle requests that take quite a while to answer; it can be 20–30 s or more. When I hit a traffic spike and scale up by adding more servers, I have the feeling that the load per server doesn’t drop as much as it should.
My theory is that CF sends out a second request to a different server after a certain threshold/timeout, because the first server appears unresponsive (it’s actually working, it just takes some time to answer). Before setting up an experiment, I figured it would be easier to ask first. I haven’t found anything specific about this in the docs.
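For reference, this is roughly the experiment I had in mind (a sketch, not something I’ve run yet; the delay, port, and `/probe?id=...` scheme are my own choices): run a deliberately slow origin on each server behind the load balancer, tag each client request with a unique ID in the path, and log every arrival. If the same ID shows up more than once across the origin logs, something upstream retried the request.

```python
# Sketch of a retry-detection experiment. Assumptions (mine, not from CF docs):
# each origin runs this server, and the test client requests e.g. /probe?id=42
# with a unique id per request.
import time
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

DELAY_SECONDS = 25  # deliberately longer than the suspected ~15 s retry window


class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Log the arrival (with its unique id in the path) BEFORE sleeping,
        # so a retried request is visible even if this response never lands.
        print(f"{time.time():.3f} arrived {self.path}", flush=True)
        time.sleep(DELAY_SECONDS)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"done\n")


def duplicate_ids(logged_paths):
    """Given request paths collected from ALL origin logs, return the
    paths that arrived more than once, i.e. suspected retries."""
    counts = Counter(logged_paths)
    return sorted(path for path, n in counts.items() if n > 1)


if __name__ == "__main__":
    HTTPServer(("", 8080), SlowHandler).serve_forever()
```

After the test run, concatenating the logs from all origins and feeding the paths into `duplicate_ids` would show whether any request was delivered twice.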
The docs do contain some (probably marketing) text that made me think CF might be doing something like this:
Load balancing and failover: Deliver traffic evenly across healthy servers, automatically failing over when a server is unhealthy or not responsive.