Let's suppose there are 100 initial concurrent requests to the CDN while the cache is still empty.
Does Cloudflare buffer all the requests on its side and send only a single request to the origin server, first fetching and caching the content and then responding to all 100 requests together? Or does it forward several of those requests to the origin simultaneously until the cache is populated?
This behaviour is called request collapsing or request coalescing. I don’t have a definitive answer for Cloudflare, but your question has (I think) answered a question I always had about the default cache behaviour, and I have a rough idea of how the cache will behave.
The initial problem is that if Cloudflare gets 100 concurrent requests, how do they know at the point the requests are received whether or not the response from your web server will mean that the cache will be populated? The next problem is how you react if you decided to collapse requests, and that decision is determined to be incorrect.
I’ve wondered why the default Cache rules in Cloudflare were based on file extension, and not on the content-type response header. I think the answer is that you cannot collapse requests unless you know the response will be cacheable, so relying on the file extension to determine if a response will actually be cacheable makes more sense when trying to collapse requests.
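To illustrate why the file extension helps here: it is available from the request URL alone, so a decision can be made before the origin has been contacted, whereas the content-type header only exists once a response has arrived. A minimal sketch of that idea follows. The function name, the `cache_everything` flag, and the extension set are all hypothetical and chosen for illustration; this is not Cloudflare's actual list or logic.

```python
from urllib.parse import urlsplit
import posixpath

# Illustrative subset only (an assumption), not Cloudflare's real default list.
ASSUMED_CACHEABLE_EXTENSIONS = {"css", "js", "jpg", "png", "gif", "ico", "woff2"}


def expected_cacheable(url: str, cache_everything: bool = False) -> bool:
    """Guess cacheability from the request URL alone, before any origin response."""
    if cache_everything:
        # Hypothetical stand-in for a "Cache Everything" style zone setting.
        return True
    path = urlsplit(url).path  # drops the query string and fragment
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return ext in ASSUMED_CACHEABLE_EXTENSIONS
```

With a check like this, `expected_cacheable("https://example.com/static/app.JS?v=3")` is true and `expected_cacheable("https://example.com/api/users")` is false, and both answers are known at the moment the request is received.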
So, my best guess to your question is: based on your zone configuration (including Cache Level, Cache On Cookie, Cache Everything etc.), if the asset is expected to be cacheable then requests will be collapsed, otherwise they will not. If the collapsed request gets an uncacheable response the pending requests are released serially to the origin. An object is probably created in the cache for the first request to indicate that an origin request is in flight, so that subsequent requests know to wait on that response. (Again, this is all just a guess.)
@kmklapak might be able to give an authoritative answer.
1. Known cacheable content
Each colo coalesces the requests, then streams the content from the origin and distributes it concurrently to all waiting requests that arrived at that colo, while simultaneously updating its cache (concurrent streaming for colos without a global lock).
2. Known non-cacheable content
All requests are forwarded to the origin directly.
3. Cacheability not known beforehand
Coalesce the requests for the colo and wait for the first response/HEAD from the origin to detect cacheability. Then:
If the content is found to be cacheable:
Start streaming to all coalesced requests for the colo while also filling the cache (as in point 1 above).
If the content is found not to be cacheable:
Forward the coalesced/waiting requests to the origin (as in point 2 above).
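
The flow for case 3 can be sketched roughly as below. This is only a toy model of the idea under stated assumptions, not how Cloudflare implements it: the `Coalescer` class, its `in_flight` map, and the `demo` helper are hypothetical names, the origin is simulated by an async function returning `(body, cacheable)`, and whole bodies are passed around instead of being streamed. Waiting requests are also released to the origin concurrently here, which is a simplification.

```python
import asyncio


class Coalescer:
    """Toy per-colo cache that collapses concurrent misses for the same key."""

    def __init__(self, fetch_origin):
        self.fetch_origin = fetch_origin  # async callable: key -> (body, cacheable)
        self.cache = {}
        self.in_flight = {}  # key -> task for the origin fetch currently in progress

    async def get(self, key):
        if key in self.cache:
            return self.cache[key]
        task = self.in_flight.get(key)
        leader = task is None
        if leader:
            # First request for this key: mark it in flight and go to the origin.
            task = asyncio.create_task(self._fill(key))
            self.in_flight[key] = task
        body, cacheable = await task
        if leader or cacheable:
            return body
        # The response turned out to be uncacheable, so this waiter fetches its own copy.
        body, _ = await self.fetch_origin(key)
        return body

    async def _fill(self, key):
        try:
            body, cacheable = await self.fetch_origin(key)
            if cacheable:
                self.cache[key] = body
            return body, cacheable
        finally:
            self.in_flight.pop(key, None)


async def demo(cacheable):
    """Fire 100 concurrent requests and count how many reach the simulated origin."""
    calls = 0

    async def origin(key):
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulated origin latency
        return f"body of {key}", cacheable

    edge = Coalescer(origin)
    bodies = await asyncio.gather(*(edge.get("/logo.png") for _ in range(100)))
    return calls, len(set(bodies))
```

Running `asyncio.run(demo(True))` results in a single origin call for all 100 requests, while `asyncio.run(demo(False))` results in 100 origin calls: one from the first request and one from each of the 99 that had been waiting on it.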