We have a Cache Everything page rule to greatly reduce the load on our origin servers. Since we might serve slightly different content for the same page based on country, we have also enabled the Custom Cache Key setting and activated the “geo” user feature to shard the cache by country.
We want some pages to be completely hidden from users in the US, so we’ve implemented 302 redirects on our PHP server whenever the CF-IPCountry header matches US.
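In essence, the server-side check works like this (sketched here in JavaScript for brevity rather than our actual PHP; the redirect target is a placeholder):

```javascript
// Decide whether a request should be redirected away, based on the
// CF-IPCountry header that Cloudflare adds to requests hitting the origin.
function shouldRedirect(headers) {
  return headers["cf-ipcountry"] === "US";
}

// Example: a US visitor would get a 302 to a placeholder page.
const headers = { "cf-ipcountry": "US" };
if (shouldRedirect(headers)) {
  // respond with: HTTP/1.1 302 Found, Location: /not-available
}
```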
The problem is that Googlebot mainly uses US-based IP addresses, which means that it gets the 302 redirects. A few days into this, our pages were removed from the Google Search index. We don’t want that.
How could we bypass the cache just for Googlebot and serve it the original content, rather than the 302 redirect?
Since Googlebot identifies itself with its own User-Agent string, we could further shard the cache by the User-Agent header. Then, on our server, we would check for the presence of Googlebot in that header and return the original content. My main concern with this approach is that the cache would become too fragmented and greatly increase the load on our origin, since users have a wide range of browsers and devices with different user agents. This is confirmed in the docs, which warn that such headers have “high cardinality and risk sharding the cache”.
Is there a way to have a custom cache key just for bot and non-bot traffic? Or is there any other approach to this?
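To illustrate what we’re picturing: the User-Agent would be collapsed into just two values before it ever enters the cache key, so the cache splits into only a “bot” and a “human” shard per country. A hypothetical JavaScript sketch (function names are ours, not a Cloudflare API):

```javascript
// Hypothetical: reduce the high-cardinality User-Agent header to a
// two-valued segment, so it cannot fragment the cache beyond two shards.
function botSegment(userAgent) {
  return /Googlebot/i.test(userAgent || "") ? "bot" : "human";
}

// The full custom cache key would then combine URL, country, and bot flag.
function cacheKey(url, country, userAgent) {
  return `${url}|${country}|${botSegment(userAgent)}`;
}
```

Every real browser, whatever its exact User-Agent string, maps to the same `human` shard, which is exactly the fragmentation guarantee we’re after.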