APO: Custom Page Rules, Origin Cache Control and stale-if-error

Hi there,

First of all, I’m very excited about the new APO feature! I just activated it on all of my sites.

Second: does anyone have more details on the cache management?

I have specific cache times configured for some URLs. Some are defined via Page Rules and others via Origin Cache Control.

For example, I have some wp-json URLs with custom expiration times (s-maxage), set via Origin Cache Control. The TTL varies from 10 minutes to a month.

Another example: a page rule sets the edge cache TTL to 7 days for old posts and 2 hours for new posts.

Also, all of these pages send a very long stale-if-error directive, in case our server crashes or something.
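
For readers unfamiliar with these directives, here is a minimal Python sketch of how an origin Cache-Control value carrying s-maxage and stale-if-error breaks down. The header value and the function name are hypothetical, chosen only for illustration; they are not taken from the poster's site.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control header value into a {directive: value} dict."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        name = name.strip().lower()
        value = value.strip().strip('"')
        # Directives with a numeric argument become ints; bare flags become True.
        directives[name] = int(value) if value.isdigit() else (value or True)
    return directives


# Hypothetical origin header: 10 minutes at the edge, 7 days of stale-if-error.
example = "public, max-age=60, s-maxage=600, stale-if-error=604800"
parsed = parse_cache_control(example)
print(parsed["s-maxage"], parsed["stale-if-error"])
```

Here s-maxage is the TTL meant for shared caches such as a CDN edge, and stale-if-error is how long a stale copy may be served while the origin is failing.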

To sum up:

  1. Do I still need those page rules with custom expiration for edge cache?
  2. Does Origin Cache Control override APO?
  3. Is the stale-if-error directive still needed with APO, or does it have this feature built in?

Thanks for trying out APO. You have asked great questions:

  1. Do I still need those page rules with custom expiration for edge cache?
    It depends. For wp-json URLs you should keep them: APO primarily deals with HTML content, so JSON bypasses its caching and transformation entirely.

Another example, a page rule sets the edge cache TTL for old posts to 7 days, and 2 hours for new posts.
You probably don’t need those rules. If you have the WP plugin installed, APO automatically caches content for 30 days and invalidates it within 30 seconds of a change.

  2. Does Origin Cache Control override APO?
    APO ignores Origin Cache Control when caching on the edge, though it still passes the origin’s Cache-Control header through to the client. During our testing we found that many WordPress installations have misconfigured Cache-Control headers, so we decided not to rely on them.

  3. Is the stale-if-error directive still needed with APO, or does it have this feature built in?
    APO has this feature built in.


Thanks for the reply!

  1. About custom Page Rules for cache

Here is what happens when I disable the “Cache Everything” rule:

Any ideas why?

If I access my site in incognito, the page headers show that the pages are cached. Which leads me to another question: does APO deliver cached pages to crawlers, too? I suspect that may be the problem here.

  2. No cache for query strings

Another reason for the high load could be that APO doesn’t cache HTML for URLs with query strings. All the social traffic and Google Discover/News traffic uses query strings.

Do you plan on releasing a way for us to override this rule?

  3. Expiring when there are small changes to the theme

You mentioned the 30-day TTL for the cache. What happens if I make some minor tweaks to my theme files (via FTP)? How long does it take for those changes to go live?

Note that I don’t mean a full theme switch, just some minor changes to the HTML/CSS/JS files.

Do I need to use the “purge all cache” button for every small tweak?


Hello @yevgen

I have set up APO. The Cloudflare dashboard shows “WordPress plugin successfully detected on insidebedroom.com”, and it’s also turned ON in the WordPress plugin settings.

But when I check the cache status of the posts and the homepage, it shows CF-Cache-Status: DYNAMIC.

How can I fix it? Please help.

Thanks,
Richard

Hi @richardmorse441, Chrome sends Cache-Control: no-cache when DevTools is open and the “Disable cache (while DevTools is open)” setting is checked.
Uncheck that setting and you will see cf-cache-status: HIT returned.


Thanks a lot, got it now.

Any ideas about my issue?

I’m not sending any of those headers, and still, if I disable the Cache Everything Page Rule, the server load increases from 0.6 to 20.

In my tests, it seems like APO bypasses crawlers. And my site is very big, so there’s a lot of indexing activity on it.

I ran tests with curl in the terminal, and the response always shows “bypass”:

But accessing the page in incognito mode returns a cache hit. So it really seems like it’s just a user agent bypass, or something similar.

More tests, from GTMetrix, KeyCDN and WebPageTest: all are being bypassed by APO (the cf-cache-status: HIT is there because the “Cache Everything” Page Rule is still enabled).


I even have this setup in my nginx configuration:

[Screenshot of the nginx configuration not preserved]

For the KeyCDN performance test at least, it’s because it uses an HTTP HEAD request instead of an HTTP GET request for testing. CF WordPress APO only caches HTTP GET requests for the text/html MIME type. See my discussion on this here and actual curl test here
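
To test the way a browser would (GET, asking for HTML) rather than with HEAD, something like the following Python sketch could be used. It relies only on the standard library; the User-Agent string and function names are made up for illustration, and the URL is whatever page you want to check.

```python
import urllib.request


def build_html_get(url: str) -> urllib.request.Request:
    """Build a GET request that asks for HTML, like a browser navigation."""
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Accept": "text/html",
            # Hypothetical UA string; replace with whatever you want to test.
            "User-Agent": "apo-check/0.1",
        },
    )


def cf_cache_status(url: str, timeout: float = 10.0) -> str:
    """Fetch the URL and return the cf-cache-status response header."""
    with urllib.request.urlopen(build_html_get(url), timeout=timeout) as resp:
        return resp.headers.get("cf-cache-status", "(header missing)")


# Usage (performs a real network request):
#   print(cf_cache_status("https://example.com/"))
```

Calling it twice in a row is more informative than a single call, since the first request from a given location may only populate the cache.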

Webpagetest and GTMetrix should show Wordpress APO cache working though from my tests at

Note that Cloudflare’s CDN cache is per datacenter, so the crawler could be coming from a geographic region where the CF datacenter’s cache hasn’t been populated yet.

it could be crawler is coming from a geographic region where CF datacenter CDN cache hasn’t been populated yet

I don’t think that’s the case. APO has been active on my account since launch, and still, if I disable the Cache Everything Page Rule, my server load increases to 20.

I don’t understand what could be bypassing the cache, since the headers are all fine. If it was a cookie, the header would show speedwp/origin,cookie. And none of the other headers that could invalidate the cache are present.

  • User in Incognito: HIT
  • User Logged in: cookie/dynamic
  • Others (crawlers, cron, testers, etc): Bypass

My DNS is using A records, and I’m also using Cloudflare’s WP plugin.

Thanks for bringing it to our attention. We are going to investigate.


The only thing I can think of is that the APO CF Worker cache doesn’t cache a request that lacks the Accept: text/html request header, as I confirmed at https://community.centminmod.com/threads/cloudflare-wordpress-plugin-automatic-platform-optimization.20486/page-2#post-86571. So I wonder whether the Google crawler sends Accept: text/html. I just added $http_accept to my nginx origin log format, so I will see later whether Googlebot does.
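
The conditions reported in this thread so far (GET only, Accept containing text/html, no query string) can be written down as a rough checklist. This Python sketch is only a model of those observations, not Cloudflare's actual logic, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def looks_apo_cacheable(method: str, url: str, accept: str) -> bool:
    """Rough model of the bypass conditions reported in this thread."""
    if method.upper() != "GET":
        return False  # HEAD (e.g. the KeyCDN tester) was observed to bypass
    if "text/html" not in accept.lower():
        return False  # requests without Accept: text/html were not cached
    if urlsplit(url).query:
        return False  # URLs with query strings were reported as not cached
    return True


print(looks_apo_cacheable("GET", "https://example.com/post/", "text/html"))
print(looks_apo_cacheable("GET", "https://example.com/post/", "*/*"))
```

The second call is worth noting: curl sends Accept: */* unless told otherwise, so a plain curl test would fail the Accept check in this model.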

Still, that doesn’t explain why your domain is sending cached versions to GTMetrix and WebPageTest, and mine isn’t… It’s the same request for both.

I ran a WPT test on your WordPress site and the cache hit looks fine. I always test WPT with the “preserve original browser user agent string” option instead of the WPT UA.

Your WPT request headers for the HTML index:

:path: /
accept-language: en-US,en;q=0.9
accept-encoding: gzip, deflate, br
sec-fetch-site: cross-site
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36
:scheme: https
upgrade-insecure-requests: 1
sec-fetch-mode: navigate
:method: GET
:authority: t***.net
sec-fetch-dest: document

Your WPT response headers for the HTML index:

status: 200
set-cookie: __cfduid=d5aa897698e815329b6e6f9d5f4f360251602181140; expires=Sat, 07-Nov-20 18:19:00 GMT; path=/; domain=.t***.net; HttpOnly; SameSite=Lax; Secure
cf-cache-status: HIT
vary: Accept-Encoding
cf-h2-pushed: </wp-includes/js/wp-emoji-release.min.js?ver=5.5.1>,</wp-content/cache/fvm/1602112008/out/header-3a7a3037d5d4def6cb834c93d54a34b506a6b0be.min.css>,</wp-content/cache/fvm/1602112008/out/header-4f4706d5f25571762ab9e59e03f16657e8f963ab.min.js>,</wp-content/cache/fvm/1602112008/out/footer-2f04ad7ee65312afff910b3e7bf6c3a5d93dba57.min.js>
last-modified: Thu, 08 Oct 2020 18:16:20 GMT
link: <https://t***.net/wp-json/>; rel="https://api.w.org/", </wp-includes/js/wp-emoji-release.min.js?ver=5.5.1>; rel=preload; as=script, </wp-content/cache/fvm/1602112008/out/header-3a7a3037d5d4def6cb834c93d54a34b506a6b0be.min.css>; rel=preload; as=style, </wp-content/cache/fvm/1602112008/out/header-4f4706d5f25571762ab9e59e03f16657e8f963ab.min.js>; rel=preload; as=script, </wp-content/cache/fvm/1602112008/out/footer-2f04ad7ee65312afff910b3e7bf6c3a5d93dba57.min.js>; rel=preload; as=script
date: Thu, 08 Oct 2020 18:19:00 GMT
cf-ray: 5df1de217a592a03-IAD
cf-edge-cache: cache,platform=wordpress
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
x-cache-status: BYPASS
x-content-type-options: nosniff
content-security-policy: upgrade-insecure-requests
content-encoding: br
age: 4
strict-transport-security: max-age=31536000; includeSubDomains; preload
cf-request-id: 05ab0928ea00002a03a712d200000001
server: cloudflare
alt-svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400
content-type: text/html; charset=UTF-8
:status: 200

You have one cf-cache-status header and one x-cache-status header, one HIT and one BYPASS, so it seems you have a conflicting layer of caching somewhere in WordPress. If you have other WordPress cache plugins, you might need to disable them and try again. Your WordPress server-side cache may be setting a bypass and inserting a header that causes the CF CDN cache to BYPASS on the first response, while the page rule caches subsequent responses. Basically, the Cache Everything page rule is caching the response that bypassed the CF APO cache.
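
When a response carries several cache-related headers like this, it helps to pull them out side by side. A minimal Python sketch, assuming the headers are pasted as plain "name: value" lines as in the dump above (the function name and the choice of headers to extract are mine):

```python
def cache_headers(raw: str) -> dict:
    """Return the cache-related headers from a raw 'name: value' dump."""
    wanted = {"cf-cache-status", "x-cache-status", "cf-edge-cache", "age"}
    found = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(": ")
        if sep and name.lower() in wanted:
            found[name.lower()] = value.strip()
    return found


# Excerpt of the response headers quoted above.
dump = """cf-cache-status: HIT
x-cache-status: BYPASS
cf-edge-cache: cache,platform=wordpress
age: 4
server: cloudflare"""
found = cache_headers(dump)
print(found)
```

cf-cache-status is the status reported by Cloudflare's cache. x-cache-status is not one of the cf-* headers, so it presumably comes from some other layer and should be read separately.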

Try WPT with the “preserve original browser user agent string” option.

But it could be your page rule too?


Hi @eva2000, thanks for the response,

Sorry, I believe I caused some confusion with my screenshots. x-cache-status is the cache status from nginx, which I use for my APIs. I added it to the headers for debugging during the tests and ended up confusing myself.

It was a mistake that has nothing to do with APO. Looking at these images now, it seems cf-edge-cache is indeed returning the correct (cached) headers for the speed tests.

But my problem persists: when I disable the Page Rule, the server load increases to 20, and my Chartbeat Server Load Time increases from 300 ms to 4,000 ms. The second figure shows that end users are being affected by the slowness.

So your Wordpress server side cache is setting a bypass and may insert a header which causes CF CDN cache to BYPASS

As for the wrong flag getting cached (and thus invalidating APO), I had already run some tests, suspecting this could be the case:

  1. APO was enabled on Friday at 2am. I purged all cache and disabled the cache rule for almost 15 hours. I had to re-enable the page rule on Saturday afternoon because the server was crashing.
  2. I’ve tested different parts of the site. For example, /meiobit/ is a separate WP installation with the official plugin. On Tuesday I disabled the page rule for this path and purged all the cache. It behaved as I described: cache hit for incognito, dynamic for cookies/logged-in users. Again, I had to re-enable the page rule due to high load.

I’m thinking it could be because APO doesn’t cache HEAD requests. Googlebot uses HEAD requests to check the Last-Modified header, and we have A LOT of Googlebot activity on our site.

But I can’t see many HEAD requests in our logs, so I don’t know if this is the case. Anyway, why not cache HEAD requests?

PS: I just disabled the Page Rule for 20 minutes and also purged all the cache. This is what happened:

[Image “burning” not preserved]

I then re-enabled the Page Rule, and after only 5 minutes:

[Screenshot not preserved]

Googlebot is not accessing the same URL over and over again, and it’s impossible to rebuild the entire site’s cache in five minutes. So it looks like the cache is being bypassed. I just can’t find out in which cases…

My logs show Googlebot making GET requests with an Accept: text/html header:

66.249.73.98 - - [08/Oct/2020:18:49:09 +0000] GET /category/wordpress/ HTTP/1.1 "200" 35596 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "66.249.73.98" "5.55" "42" "1" "0.522" 5df20a4434a5d26a-DFW 05ab24bea00000d26ab53cb000000001 TLSv1.3 TLS_AES_256_GCM_SHA384 "text/html,application/xhtml+xml,application/signed-exchange;v=b3,application/xml;q=0.9,*/*;q=0.8"
curl -s https://ipinfo.io/66.249.73.98
{
  "ip": "66.249.73.98",
  "hostname": "crawl-66-249-73-98.googlebot.com",
  "city": "Atlanta",
  "region": "Georgia",
  "country": "US",
  "loc": "33.7490,-84.3880",
  "org": "AS15169 Google LLC",
  "postal": "30302",
  "timezone": "America/New_York",
  "readme": "https://ipinfo.io/missingauth"
}
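
To check many log lines instead of one, a small parser can pull out the method, the user agent and the Accept value. This Python sketch assumes the custom log format shown above, with the user agent as the third quoted field and $http_accept logged as the last quoted field; other formats would need a different pattern.

```python
import re

# The Googlebot log line quoted above, split across literals for readability.
LINE = (
    '66.249.73.98 - - [08/Oct/2020:18:49:09 +0000] GET /category/wordpress/ HTTP/1.1 '
    '"200" 35596 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" '
    '"66.249.73.98" "5.55" "42" "1" "0.522" 5df20a4434a5d26a-DFW '
    '05ab24bea00000d26ab53cb000000001 TLSv1.3 TLS_AES_256_GCM_SHA384 '
    '"text/html,application/xhtml+xml,application/signed-exchange;v=b3,'
    'application/xml;q=0.9,*/*;q=0.8"'
)


def summarize(line: str) -> dict:
    """Extract client IP, method, path, user agent and Accept from one line."""
    head = re.match(r'(\S+) \S+ \S+ \[[^\]]+\] (\S+) (\S+)', line)
    quoted = re.findall(r'"([^"]*)"', line)
    if head is None or len(quoted) < 3:
        raise ValueError("line does not match the assumed log format")
    return {
        "ip": head.group(1),
        "method": head.group(2),
        "path": head.group(3),
        "user_agent": quoted[2],  # assumption: third quoted field
        "accept": quoted[-1],     # assumption: $http_accept is logged last
    }


info = summarize(LINE)
print(info["method"], "text/html" in info["accept"])
```

Counting lines where the method is not GET, or where the Accept value lacks text/html, would show how much crawler traffic falls outside what APO is reported to cache.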

The most likely explanation is that Googlebot is the first uncached visit to the CF datacenter serving the crawler’s request, so a cache miss is more likely.

So to alleviate the origin load from cache misses, you probably need a 2-tier full HTML page cache setup: the 1st level at the CF edge CDN cache via APO, and the 2nd level on the origin Nginx server, which you can do via various full HTML WordPress page caching plugins and native Nginx PHP-FPM fastcgi_cache full HTML page caching.

I usually set up WordPress installs with origin Nginx-level full HTML page caching in combination with CF CDN caching of full HTML pages. For example, I wrote up how I do it via Cache Enabler full HTML page caching at the Nginx cache level, bypassing PHP-FPM, for my Centmin Mod LEMP stack at https://servermanager.guide/203/wordpress-cache-enabler-advanced-full-page-caching-guide/

Unfortunately, that still may not help when Googlebot is the first uncached visit, so you would then need to optimise the backend origin Nginx/PHP-FPM to handle the load.

This was my previous configuration: a long cache in nginx and a shorter one (1 hour) in Cloudflare. Then I changed some settings and removed the nginx layer. The 2-layer setup adds too much complexity, and it’s unnecessary.

Until APO was launched, I was only using the “Cache Everything” rule with two different durations: a shorter one for new posts (2 hours) and a longer one for old posts (3 days).

This setup is holding the server up right now. But again, it doesn’t make sense: something is being bypassed in APO, and the Page Rule is acting as a second layer in this case.

But there shouldn’t be a second layer at all with a cache design like APO’s. Its goal is to always serve static HTML, with asynchronous rebuild and a 30-day TTL. It acts like stale-while-revalidate, which is great for huge sites like mine, and should deliver fast responses to Googlebot.

I should be able to remove the Page Rule…

I agree you should be able to. You probably need to submit a CF support ticket to get someone to look at it more closely.

Oh, technically, WP APO doesn’t cache query strings - see my tests at https://community.centminmod.com/threads/cloudflare-automatic-platform-optimization-for-wordpress-cache-effectiveness.20494/ - so you would still need that 2nd-level caching layer on the origin side, as all query-string-based requests will bypass the CF APO cache and hit your origin.

I agree. I did that on Saturday. I sent a very detailed message, but they gave me a REALLY generic answer, pointing me to the blog post. I replied asking for more personalised feedback and still have no answer.

I’ve tested that, too. At first I created a redirect. Then, a second-layer cache for the most used query strings. That didn’t solve it, either…
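
For anyone trying the redirect approach, the core of it is computing a canonical URL with the tracking parameters removed. A minimal Python sketch follows; the parameter names (utm_*, fbclid, gclid) are assumptions about what social and Google traffic typically appends, not something confirmed in this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; adjust to what actually shows up in your logs.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking(url: str) -> str:
    """Return the URL with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_NAMES
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking("https://example.com/post/?utm_source=fb&utm_medium=social"))
print(strip_tracking("https://example.com/?s=apo&fbclid=abc"))
```

If the stripped URL differs from the requested one, the origin or an edge rule could redirect to it, so the follow-up request has no query string. Whether the extra round trip is worth it depends on how much of the traffic carries such parameters.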