I’ve reported this possible bug a few times since APO’s launch. In this specific situation, Cloudflare doesn’t recognize /amp/ pages as “text/html” and serves them from origin, bypassing APO.
This has been happening since day 1:
- APO: Custom Page Rules, Origin Cache Control and stale-if-error
- APO bypassing cache for googlebot/crawlers?
- All of those pages do send a “text/html” Content-Type header (official AMP plugin).
- All returning status 200.
- All cookies are removed via nginx.
- In my browser, everything is delivered from cache as expected.
- I’ve also run several tests using curl, forcing “Pragma: no-cache” and “Cache-Control: no-cache” headers. It always returns a cache “HIT”.
- The traffic is all mobile and from the US, even though it’s a Brazilian site (in Portuguese).
- That’s not just regular traffic. I suspect it could be Googlebot (or some other bots). But that doesn’t explain missing the cache more than 5k times for the same URLs.
- These posts are not our most visited content. It’s just random URLs, some don’t even get real traffic.
- After enabling APO in October, the average response time in our Google Search Console report increased from 80 ms to over 230 ms. That could indicate that APO is missing the cache for Googlebot.
Still, that doesn’t explain missing the cache 5 thousand times per URL. Each page should be cached after the first request and revalidated on subsequent requests once it has expired.
11.8 million requests went straight to my server over the last week, 1.6 million of them in the last 24 hours. That’s far more traffic than these pages actually get.
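For anyone who wants to repeat the header check with different user agents, here is a minimal sketch in Python (standard library only). It is not the exact curl test described above. The user-agent strings and the `probe`, `summarize` and `is_html` helpers are illustrative assumptions, and a spoofed Googlebot user agent sent from a non-Google IP may not be treated the same way as the real crawler.

```python
import urllib.request

# Hypothetical user-agent strings for comparison; replace them with
# whatever actually shows up in the origin access logs.
USER_AGENTS = {
    "mobile-browser": (
        "Mozilla/5.0 (Linux; Android 10) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/86.0 Mobile Safari/537.36"
    ),
    "googlebot-mobile": (
        "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0 Mobile "
        "Safari/537.36 (compatible; Googlebot/2.1; "
        "+http://www.google.com/bot.html)"
    ),
}


def summarize(headers):
    """Pick out the response headers that matter for cache debugging."""
    lowered = {name.lower(): value for name, value in headers.items()}
    wanted = ("cf-cache-status", "cf-apo-via", "content-type", "age")
    return {name: lowered.get(name, "(absent)") for name in wanted}


def is_html(content_type):
    """True if the media type is text/html, ignoring charset parameters."""
    return content_type.split(";", 1)[0].strip().lower() == "text/html"


def probe(url, user_agent, timeout=10):
    """Fetch url with the given user agent; return (status, header summary).

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = urllib.request.Request(
        url, headers={"User-Agent": user_agent, "Accept": "text/html"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, summarize(dict(response.headers.items()))
```

Calling `probe(url, ua)` for each entry in `USER_AGENTS` against one of the affected /amp/ URLs and comparing the `cf-cache-status` values should show whether the result depends on the user agent. `is_html(summary["content-type"])` confirms that the response is still labeled as HTML.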
Here are some URLs from the list, if anyone wants to check:
- Como rastrear um celular Android [perdido ou roubado] | Aplicativos
- Como mSpy é usado para espionar celulares [livre-se dele] | Celular | Tecnoblog
- Os 10 melhores filmes de ficção científica da Netflix segundo a crítica | Cultura | Tecnoblog
Any ideas on why this is happening?