APO: Custom Page Rules, Origin Cache Control and stale-if-error

I agree. I did that on Saturday. I sent a very detailed message, but they gave me a REALLY generic answer pointing me to the blog post. I replied asking for more personalised feedback and still got no answer.

I’ve tested that, too. At first I created a redirect, then a second cache layer for the most-used query strings. That didn’t solve it either…

Interesting, so where is the additional CPU load coming from: nginx, MySQL and/or PHP-FPM? I assume PHP-FPM? Because if it’s coming from nginx, then it is probably nginx-level caching working as it should, just under load?

What I meant is that the high load is not caused by query strings bypassing the cache. If it were, a redirect or the local cache should have resolved the issue.

PS: I’ll add more strings to the redirect and see what happens.

I’d politely give the ticket a nudge again then :slight_smile:

What I mean is when under high cpu load which processes are showing as using that cpu ?

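If you have the sysstat package (which provides the pidstat command) installed, you can work out nginx CPU load averages with the command below. cpu_pc is the number of CPU threads times 100 (800 on an 8-thread server):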

cpu_pc=$(( $(nproc) * 100 ))
pidstat -ulh -C nginx 1 10 | grep -Ev 'UID|Linux' | awk -v cpumax="$cpu_pc" '{cpu += $7} END {print cpu"%",cpu/cpumax}'
510.92% 0.63865

The cumulative CPU load for nginx on an 8-thread server is 510.92%, which, divided across 8 CPU threads (510.92 / 800), gives 0.63865, i.e. 63.865% CPU utilisation.

And the PHP-FPM CPU load average:

pidstat -ulh -C php-fpm 1 10 | grep -Ev 'UID|Linux' | awk -v cpumax="$cpu_pc" '{cpu += $7} END {print cpu"%",cpu/cpumax}'
32.98% 0.041225

Run these commands while the server is under load.
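The averaging above can be reproduced in Python if you want to check the numbers outside the shell. This is a minimal sketch, assuming output in the `pidstat -u -h` layout; the sample lines are made up. Instead of hard-coding awk's `$7`, it finds %CPU from the header row, because some newer sysstat releases print an extra %wait column that shifts the positions. `samples=1` reproduces the one-liner's `total / (threads * 100)`. Since `-h` suppresses pidstat's average rows, that total covers every process across all 10 one-second intervals; passing `samples=10` gives a per-interval share instead.

```python
def pidstat_cpu_total(lines):
    """Sum the %CPU column of `pidstat -u -h` output, skipping banner and header lines."""
    cpu_idx = None  # position of %CPU in data rows, learned from the header row
    total = 0.0
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "Linux":
            continue  # blank line or the "Linux <kernel> ..." banner
        if fields[0].startswith("#"):
            # Header row, e.g. "# Time UID PID %usr %system %guest %wait %CPU CPU Command".
            # Drop the "#" marker so indexes line up with the data rows.
            header = fields[1:] if fields[0] == "#" else [fields[0][1:]] + fields[1:]
            cpu_idx = header.index("%CPU")
            continue
        if cpu_idx is not None:
            total += float(fields[cpu_idx])
    return total


def cpu_fraction(total_pct, threads, samples=1):
    """Turn a summed %CPU figure into a 0..1 share of the whole machine.

    samples=1 reproduces the awk one-liner: total / (threads * 100).
    """
    return total_pct / (threads * 100 * samples)


# Made-up sample in the pidstat -u -h layout (values are illustrative only).
sample = """Linux 5.4.0 (web1) 10/10/2020 _x86_64_ (8 CPU)

#      Time   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
 1602300000    33      1001   20.00    5.00    0.00    0.00   25.00     1  nginx: worker process
 1602300000    33      1002   10.00    2.50    0.00    0.00   12.50     3  nginx: worker process
"""

total = pidstat_cpu_total(sample.splitlines())
print(f"{total:.2f}%", cpu_fraction(total, threads=8))
```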

I’d expect PHP-FPM to be under stress if the cache is being bypassed at all levels. If it’s nginx under load, then it could be nginx-level caching just doing its job.

Oh, it’s definitely php-fpm!

My DB is on a separate server (DO managed). We also have a Redis layer for DB transients, and it’s running fine. The spike is in the php-fpm processes: they multiply like wet gremlins when I disable the Page Rule. :stuck_out_tongue:


A post was split to a new topic: Wordpress APO issues

So, I ran a few more tests and still didn’t find a solution.

Query strings

First of all, I increased the number of query strings to redirect:

[Screenshot: the expanded list of redirected query strings]

It helped stabilise the server a little, but didn’t solve the problem.
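The redirect itself lives in nginx, but the decision it makes is simple: if a request carries query strings that don't change the page, send a 301 to the clean permalink so every visitor shares one cache entry. This is a minimal Python sketch of that logic only. `STRIP_PARAMS` is a hypothetical list (the real list is only in the screenshot), and `redirect_target` is an illustrative helper, not part of any plugin.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of tracking parameters; substitute the strings you actually redirect.
STRIP_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}


def redirect_target(url):
    """Return the URL without the stripped parameters, or None if no redirect is needed."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in query if k not in STRIP_PARAMS]
    if len(kept) == len(query):
        return None  # nothing to strip: serve the request as-is
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(redirect_target("https://example.com/post/?utm_source=x&fbclid=y"))
print(redirect_target("https://example.com/post/?page=2&utm_medium=z"))
```

Parameters that do change the page (like `page` above) are kept, so only the tracking noise is folded into the canonical URL.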

fastcgi_hide_header Set-Cookie;

Then I tried moving the header-hiding rules to the beginning of the server block. Maybe they were being bypassed, I don’t know:

[Screenshot: nginx server block with the header-hiding rules moved to the top]

But nope, no effect.

Let the server burn

Then I decided to clear ALL caches and let the server burn for almost an hour, at 3 a.m. Maybe the cache needed to be built, right? Nope, no results.

But I did notice one thing using GoAccess.

These are the requests that were hitting my server with the cache purged, the Page Rule disabled, and APO enabled (with the WP plugin).

Take a look at the red number: that’s how many requests hit the same URLs. Even if people all around the world were trying to access the same page at the same time, it still should not hit my server 124 times (and growing rapidly).

I mean, the first hit in each CF datacenter would populate the cache, and all subsequent requests should be served from it, right?

So I opened all of those URLs in incognito mode and got a cache hit. I thought: OK, now I know for sure those URLs are cached in Brazil’s datacenter. The hits should stop, or at least slow down.

NOPE, it didn’t. The count kept growing and growing after I took this screenshot.

Also, there really weren’t many users online on that Google Translate page, and the logs don’t show any unusual user agents:

"GET /283220/como-funciona-o-google-tradutor/ HTTP/1.1" 200 22276 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36"
"GET /283220/como-funciona-o-google-tradutor/ HTTP/1.1" 200 22276 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36"
"GET /283220/como-funciona-o-google-tradutor/ HTTP/1.1" 200 22276 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36 Edg/85.0.564.70"
"GET /283220/como-funciona-o-google-tradutor/ HTTP/1.1" 200 22276 "-" "Mozilla/5.0 (Linux; Android 9; SM-J600GT) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.127 Mobile Safari/537.36"

So how is it possible for these requests to keep hitting my server when those pages are already cached? It makes no sense: even if Cloudflare had 200 different datacenters, what are the odds?
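GoAccess already does this counting, but the check behind "what are the odds" can be made explicit. While a page stays cached, each edge location should need roughly one origin fetch per URL, so origin hits far above your estimate of locations suggest the edge cache isn't being used for those requests. This is a minimal sketch assuming nginx's combined log format; the log lines are made up, and `max_expected_misses` is a threshold you choose (the 200-datacenter figure above is only a hypothetical). Counting by method also keeps HEAD requests separate from GETs.

```python
import re
from collections import Counter

# Matches the quoted request line and status, e.g. "GET /path/ HTTP/1.1" 200
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')


def origin_hits(lines):
    """Count requests per (method, path) in nginx access-log lines."""
    counts = Counter()
    for line in lines:
        # Copy/paste from a forum often turns straight quotes into curly ones.
        line = line.replace("\u201c", '"').replace("\u201d", '"')
        m = REQUEST_RE.search(line)
        if m:
            counts[(m["method"], m["path"])] += 1
    return counts


def more_than_expected(counts, max_expected_misses):
    """Return entries hit more often than one cache miss per edge location would explain."""
    return {key: n for key, n in counts.items() if n > max_expected_misses}


# Made-up log lines (documentation IP range) in the combined format.
log = [
    '203.0.113.5 - - [10/Oct/2020:03:00:01 +0000] "GET /283220/como-funciona-o-google-tradutor/ HTTP/1.1" 200 22276 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [10/Oct/2020:03:00:02 +0000] "GET /283220/como-funciona-o-google-tradutor/ HTTP/1.1" 200 22276 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2020:03:00:04 +0000] \u201cGET /283220/como-funciona-o-google-tradutor/ HTTP/1.1\u201d 200 22276 "-" "Mozilla/5.0"',
    '203.0.113.8 - - [10/Oct/2020:03:00:05 +0000] "HEAD /about/ HTTP/1.1" 200 0 "-" "Mozilla/5.0"',
]

hits = origin_hits(log)
print(more_than_expected(hits, max_expected_misses=2))
```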

Last but not least: HEAD + 304 == very important

I don’t think APO is responding to HEAD requests or sending status 304 based on the Last-Modified header, but it should. Take a look at this graph from GSC (Google Search Console):

Look how much additional data has been transferred since APO was enabled.

The "before" part of the graph reflects my OLD setup: a Page Rule with an Edge Cache TTL of only one hour. It was much more efficient, even with a shorter cache.
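What the post asks for is an ordinary conditional GET: a crawler sends If-Modified-Since with the Last-Modified date it saw last time, and if the page hasn't changed, the server answers 304 with no body. This is a minimal sketch of that server-side decision, assuming a timezone-aware `last_modified` datetime for the page. It illustrates the HTTP mechanism only and says nothing about how APO or Cloudflare implement it. A HEAD request gets the same status and headers, just without a body.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since):
    """Return 304 if the client's cached copy is still current, else 200.

    last_modified: timezone-aware datetime of the page's last change.
    if_modified_since: raw If-Modified-Since header value, or None.
    """
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header and send the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # HTTP dates are always GMT
    # HTTP dates have one-second resolution, so compare at that precision.
    return 304 if last_modified.replace(microsecond=0) <= since else 200


modified = datetime(2020, 10, 9, 12, 0, tzinfo=timezone.utc)
header = format_datetime(modified, usegmt=True)  # the Last-Modified value the origin sent
print(conditional_status(modified, header))                       # unchanged page -> 304
print(conditional_status(modified + timedelta(hours=1), header))  # edited page -> 200
```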

So that’s it, I’m about to give up on APO. In theory it’s supposed to be the perfect cache setup, but it still needs to mature a lot.

Ah, Google Translate! That’s why. From what folks have reported to me, Google Translate requests are made with a query string, so they end up bypassing a lot of the caching done by CF CDN cache or origin cache configurations. I don’t operate a non-English site, so I can’t say I have much experience with Google Translate.

But I did try Google Translate on my WordPress blog, and the request looks like this:

grep about cfssl-access.log | grep translate
- - [10/Oct/2020:11:53:29 +0000] GET /about/ HTTP/1.1 "200" 35540 "https://translate.googleusercontent.com/translate_p?sl=auto&tl=zh-TW&u=https://servermanager.guide/about/&depth=1&rurl=translate.google.com.au&nv=1&sp=nmt4&pto=aue&usg=ALkJrhgAAAAAX4Gh5HjNi6xSJZkA8ibZL5R018Hlm2zS" "Mozilla/5.0 (Linux; Android 10; Redmi Note 7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Mobile Safari/537.36,gzip(gfe)" "*,*,*" "5.46" "1455" "1" "0.200" 5e002426d2b2ef49-NRT 05b3f4ec430000ef49c3b05000000001 TLSv1.3 TLS_AES_256_GCM_SHA384 "*/*" "-" "-"

The Google Translate request doesn’t send a text/html Accept header; as you can see, the log shows "*/*".

@yevgen it does seem like a lot of requests will bypass CF WordPress APO, because it only caches GET requests with a text/html Accept header.

APO doesn’t cache HEAD requests, only GET requests. I actually covered this in my write-up at https://community.centminmod.com/threads/cloudflare-wordpress-plugin-automatic-platform-optimization.20486/. I first noticed it when I used the KeyCDN performance tool to check caching: the tool uses a HEAD request, not GET, so it kept reporting APO as not cached.
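The two observations above (GET only, and only with a text/html Accept header) can be written down as a predicate. This is useful when reading logs or picking a test tool. It is a model of the behaviour reported in this thread, an assumption from testing rather than Cloudflare's published rule, and `apo_may_cache` is a hypothetical helper name.

```python
def apo_may_cache(method, accept):
    """Model of the behaviour observed in this thread: only GET requests whose Accept
    header lists text/html are served from APO's cache. Assumption, not Cloudflare's spec."""
    if method.upper() != "GET":
        return False  # e.g. HEAD requests from KeyCDN's checker go to the origin
    # Split "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8" into bare media types.
    media_types = [part.split(";")[0].strip().lower() for part in (accept or "").split(",")]
    return "text/html" in media_types


browser = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
print(apo_may_cache("GET", browser))   # typical browser page load
print(apo_may_cache("GET", "*/*"))     # the Google Translate request above
print(apo_may_cache("HEAD", browser))  # HEAD-based cache checkers
```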


I meant that there weren’t many users online on that specific post about Google Translate, not that I’m receiving traffic from translated URLs.

This is the first URL in that screenshot, but the behaviour is the same for all of them: the URLs keep receiving lots of hits even after the cache has been built. And it isn’t because of different datacenters, since the numbers are too high and kept growing fast.

I don’t understand why. All the headers are fine and cookies were removed.

And I don’t think it’s bypassing the cache because of a query string. If that were the case, the strings would show up in the logs, right? But there are no query strings, only the permalink.

Yes! Smaller sites may not notice the difference, but the feature is unusable for bigger ones.

Yes, so all HEAD requests are passed through to my server. Also, GET requests never get a 304 response, which forces crawlers to download the entire page only to find out it wasn’t needed, for pages that didn’t change.

What’s the reason for bypassing HEAD requests and not sending 304 responses?

OK, I was looking at some graphs in CF Analytics and found out a few more things.

Here is what happens when I disable APO and keep the Page Rule on:

When APO is enabled, analytics shows that almost all HTML requests are being served by Cloudflare. In theory, this is how APO is supposed to behave, but that data simply doesn’t match reality.

It also shows that my server is receiving 10x more hits, which simply isn’t true (server load was around 1.3 at this point).

When I enable APO again and disable the Page Rule:

Again, the graph doesn’t match reality. It shows requests to the origin dropping quickly, but the reality is the exact opposite:

Uncached hits (supposedly) dropped 90%

After I enabled APO at launch, the stats showed that uncached hits to my server dropped by almost 90%. I think it’s been proven by now that this isn’t true (at least not when the Page Rule is disabled).

Final thoughts

It seems like:

  1. APO is buggy on my account. It looks like it is revalidating every single request against the origin server, causing this high load.
  2. CF Analytics is reporting the expected behaviour, not what is actually happening.

Plus: any explanation for this huge increase in requests?

Just out of curiosity (and maybe not related to the cache issue): CF Analytics shows that requests increased 3x after I enabled APO. This isn’t a REAL increase: I noticed nothing on my server or in Google Analytics. Is this data correct?

I notice you have a lot of 301 responses. What are they from?

How does the Cache Analytics tab look? You can break down cache analytics by HTTP method too, i.e. GET vs HEAD, etc.

Looks very similar: when the Page Rule is disabled, requests are being served by the origin.

Just redirects from old permalink and image paths, plus the query-string removal code. They’re all handled in nginx, so they don’t affect the server load.

I see. Even with my own custom CF Worker caching, 301 redirects aren’t cached.

You might need to contact CF via a support ticket to check.

FYI, on the Cache and Web Analytics charts, you can use your mouse to highlight and drill down into a specific time period, which will filter further down to a list of the URLs/paths involved.

Yes, I don’t think it’s necessary to cache 301 redirects; nginx handles them pretty well.

You’re right. I just contacted them again. I hope they can escalate this issue internally.

Here’s another screenshot, filtering only HTML + GET + 200.

Oh nice! Thanks for the tip.

I did that, and they’re just regular posts:

But this caught my attention:

Filtering by “Served by Origin”, 95% of the requests were served as dynamic; less than 5% were miss, expired or revalidated.
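That split matters because, according to Cloudflare's cache-status documentation, DYNAMIC means Cloudflare did not consider the response eligible for caching at all and fetched it from the origin. MISS, EXPIRED and REVALIDATED, by contrast, involve a cacheable response. So a large DYNAMIC share suggests an eligibility problem rather than cache churn. This is a minimal sketch for computing the shares from a list of observed `cf-cache-status` values; the example counts are illustrative, shaped like the 95/5 split described above, not real data.

```python
from collections import Counter


def status_shares(statuses):
    """Percentage of each cf-cache-status value in a list of observed statuses."""
    counts = Counter(s.upper() for s in statuses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: 100 * n / total for status, n in counts.items()}


# Illustrative numbers shaped like the split described above (not real data).
observed = ["dynamic"] * 95 + ["miss"] * 3 + ["expired", "revalidated"]
print(status_shares(observed))
```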

The only other thing I can think of is that your origin’s Cache-Control headers aren’t telling CF the pages are cacheable, and your Page Rule overrode that, masking the issue. When you disable the Page Rule, are you leaving the misconfigured origin Cache-Control headers to control how CF APO caches? You might want to inspect the Cache-Control headers your origin server sends, bypassing the CF cache, to see what CF APO is seeing.
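For that inspection, fetch the page with a GET rather than a HEAD. For example, `curl -s -D - -o /dev/null <url>` prints the response headers of a GET and discards the body. Then look for directives that commonly stop a shared cache from storing a page. This is a minimal sketch of that check; it encodes generic HTTP caching heuristics, not Cloudflare's exact eligibility rules, and `cacheability_issues` is a hypothetical helper name.

```python
def cacheability_issues(headers):
    """Flag origin response headers that commonly stop a shared cache from storing a page.
    Generic HTTP caching heuristics only, not Cloudflare's exact rules."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    issues = []
    if "set-cookie" in h:
        issues.append("Set-Cookie present")
    directives = [d.strip().lower() for d in h.get("cache-control", "").split(",") if d.strip()]
    for d in directives:
        if d in ("private", "no-store", "no-cache"):
            issues.append(f"Cache-Control: {d}")
        elif d.startswith(("max-age=", "s-maxage=")) and d.split("=", 1)[1] == "0":
            issues.append(f"Cache-Control: {d}")
    return issues


print(cacheability_issues({"Cache-Control": "private, max-age=0", "Set-Cookie": "a=b"}))
print(cacheability_issues({"cache-control": "public, max-age=3600"}))
```

An empty list doesn't prove the page will be cached; it only rules out the most common origin-side blockers.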

I’ve already done this test and couldn’t find anything wrong with the headers. Plus, I have “Respect existing headers” enabled.

Zooming into the graph, I got these top URLs (Page Rule off + APO on):

Since there have been a few hits already, chances are they’re all cached, right? I opened each one in Chrome. At first, cf-cache-status is DYNAMIC; then I get a HIT when I refresh.

It’s the same behaviour as in the previous tests: when I open the links in incognito, I can see cache hits, but requests keep hitting my server, as described here:

Hello @yevgen,

I checked the headers using this tool: https://tools.keycdn.com/curl

It is also showing the same cf-cache-status: DYNAMIC

Check the screenshot:

Also, as you suggested, I checked using Chrome Developer Tools (with the Disable cache option unchecked): some pages show HIT, but many show DYNAMIC.
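Instead of checking pages one by one in DevTools, you can request each URL a couple of times with GET (not HEAD, which APO doesn't cache) and tally the `cf-cache-status` values. Roughly, MISS followed by HIT is the normal warm-up, while DYNAMIC on every request means the response was never treated as cacheable. This is a minimal sketch using only the standard library. The `Accept: text/html` header follows the observation earlier in the thread, and the User-Agent string is a hypothetical placeholder. `fetch` is injectable so the tally logic can be tried without network access.

```python
import urllib.request
from collections import Counter


def cf_cache_status(url, timeout=10):
    """GET a URL (not HEAD) and return its cf-cache-status header, or "(none)"."""
    req = urllib.request.Request(url, method="GET", headers={
        "Accept": "text/html",               # APO reportedly only caches text/html requests
        "User-Agent": "cache-check-sketch",  # hypothetical placeholder UA
    })
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.headers.get("cf-cache-status", "(none)")  # header lookup is case-insensitive


def tally(urls, fetch=cf_cache_status, rounds=2):
    """Request each URL `rounds` times and count the statuses seen per URL."""
    return {url: Counter(fetch(url) for _ in range(rounds)) for url in urls}


# Offline usage with a stand-in fetcher (real use: tally(["https://example.com/post/"])).
fake = iter(["MISS", "HIT", "DYNAMIC", "DYNAMIC"])
print(tally(["/a", "/b"], fetch=lambda url: next(fake)))
```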