Known bots bypass cache, pull from origin?

My Cloudflare cache settings are working as expected. I’m just curious why my origin server logs show all of the bot traffic; it isn’t being served from cache.

I’m guessing that’s the default behavior for known-bots? I’m sure there’s a good reason but I can’t find any articles about known-bots bypassing cache. I see lots of Googlebot, Bing, etc. requests in origin logs but don’t have a rule to exclude them from cache.

I’d like to know which bots are able to access origin by default.

If you mean which are allowed to bypass cache by default, the answer is “none.”

Googlebot, Bing, etc. are all reaching my origin server. If Cloudflare does not automatically allow them to bypass cache, and I have no rules set to give them a pass, are they reaching my domain by IP directly? Bypassing Cloudflare entirely? Sending special headers?

The reason I am getting known-bot requests in my origin logs is that my origin headers were being obeyed, oops. While versioned files and static assets like images are set to cache far into the future, HTML (i.e. most URLs) had a very short cache time set on origin.
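To make the split concrete, here is a minimal sketch of an origin-side policy that gives static assets a long `Cache-Control` lifetime and everything else (HTML) a short one. The function name `cache_control_for`, the extension list, and the TTL values are all hypothetical and only for illustration; they are not my actual settings.

```python
from pathlib import PurePosixPath

# Hypothetical policy: the extensions and TTL values are illustrative only.
LONG_LIVED = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".woff2"}
ONE_YEAR = 31536000  # seconds
ONE_HOUR = 3600      # seconds

def cache_control_for(path: str) -> str:
    """Pick a Cache-Control header value based on the URL path's extension."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in LONG_LIVED:
        # Versioned/static assets: safe to cache for a long time.
        return f"public, max-age={ONE_YEAR}, immutable"
    # Everything else (HTML pages): short lifetime, so caches revalidate often.
    return f"public, max-age={ONE_HOUR}"
```

With a policy like this, any cache that honors origin headers will come back to origin for HTML roughly every hour, which is the kind of traffic that shows up in origin logs.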

I set the Cloudflare cache time well into the future, but it respects shorter cache times set on origin, so Google, Bing, et al. were checking for new versions daily and not getting a cached version or a 304 response.
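A rough model of the behavior I saw, assuming the effective lifetime is simply the shorter of the edge setting and the origin's `Cache-Control` value. This is only a sketch of what I observed, not Cloudflare's documented algorithm, and the names `origin_ttl` and `effective_ttl` are made up for the example.

```python
from typing import Optional

def origin_ttl(cache_control: str) -> Optional[int]:
    """Return s-maxage if present, else max-age, else None."""
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value.strip('"')
    # Shared caches prefer s-maxage over max-age when both are given.
    for key in ("s-maxage", "max-age"):
        if directives.get(key, "").isdigit():
            return int(directives[key])
    return None

def effective_ttl(edge_ttl: int, cache_control: str) -> int:
    """Assumed rule: the shorter of the edge TTL and the origin TTL wins."""
    ttl = origin_ttl(cache_control)
    return edge_ttl if ttl is None else min(edge_ttl, ttl)
```

For example, with a 30-day edge setting (2592000 seconds) and an origin header of `public, max-age=300`, this model gives 300 seconds, so the long edge setting never gets a chance to matter for HTML.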

Fixed on origin = immediately fixed on CF. No need to clear the existing cache, as CF updated the cache time instantly. Nice.

