APO – Requests with fbclid query string hitting APO cache also hitting origin server

Hi,

I would say that at this point APO is nearly perfect, but one important issue remains: many requests with an fbclid query string that are served from the APO cache (HIT) are also hitting my origin server.

I can easily reproduce the issue by opening any URL (Ce photographe capture des roux dans le monde entier depuis 7 ans et voici ses 15 plus belles photos - ipnoze, for example) and changing the fbclid parameter. Every request with a different fbclid value creates an entry in my origin server’s access log, even when APO returns a HIT.
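One way to confirm which of these fbclid requests actually reach the origin is a dedicated NGINX access log that records the fbclid argument next to Cloudflare's CF-Ray request header. This is a minimal sketch: the format name `fbclid_debug` and the log path are hypothetical, while `$arg_fbclid` and `$http_cf_ray` are NGINX's standard variables for a query argument and a request header.

```nginx
# http {} block: a log format that shows the fbclid value of each origin request.
# The format name and log path below are hypothetical.
log_format fbclid_debug '$remote_addr [$time_local] "$request" $status '
                        'fbclid="$arg_fbclid" cf_ray="$http_cf_ray"';

# server {} block, next to the existing access_log line (NGINX writes to both).
access_log /var/log/nginx/fbclid_debug.log fbclid_debug;
```

Every line written to that log is a request Cloudflare forwarded to the origin, so lines that keep appearing while the browser reports an APO HIT point to the extra origin fetches described here.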

I get a lot of traffic from Facebook with fbclid query strings, so when APO is enabled, traffic to my origin server jumps from 30 kbps to 300 kbps and CPU usage jumps from 0.20% to 0.60%. When a Cache Everything Page Rule is used instead of APO, both drop back to the lower values. The screenshot below illustrates this and shows what happens when I switch from APO to Cache Everything and then back to APO.

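[Screenshot (Imgur): origin bandwidth and CPU usage when switching from APO to Cache Everything and back to APO]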

So I guess there might be an issue in the way APO handles cacheable query strings like fbclid. It seems APO might be making a request to the origin server even though it is returning cached data. I’d be curious to know what @yevgen thinks about this.

Thanks.

@yevgen can probably track this down if you post a URL or domain name.

Thanks @sdayman. I updated my post with a URL.

It doesn’t look like you have the Cloudflare for WordPress plugin installed on your domain. If you install it, the problem should go away. We are going to deprecate using APO without the plugin, so that’s one more reason to switch.

The plugin is installed, I’ve been using it for years. APO is also enabled. I’ve been helping you guys debug APO for weeks now.

[Screenshot (Imgur)]

“I’ve been helping you guys debug APO for weeks now.”

I know, and I appreciate it. I assumed you might have multiple zones, some with and some without the plugin installed.

You may have an issue with your setup, though. We added the cf-edge-cache header to indicate that the plugin is present, and it’s not being served on your zone. Without it, APO makes extra calls to the origin. I suspect an issue with one of the other caching plugins you have; you may need to clear their cache before the cf-edge-cache header shows up in the response headers.

Ok. I understand where you’re coming from now.

I am using the Cache Enabler WordPress plugin for page caching. I’m not sure whether it could be causing the issue. I will try disabling it and see if I get the cf-edge-cache header.

Thanks!

Yes, we need this header to tell the APO worker not to fetch from the origin unnecessarily: Cloudflare-WordPress/Hooks.php at master · cloudflare/Cloudflare-WordPress · GitHub

Perfect.

So Cache Enabler seemed to be the culprit. After disabling it, I now see the cf-edge-cache header.

I will keep an eye on my server logs and see if it helps. I will also open a ticket with Cache Enabler to find out why they are removing the cf-edge-cache header.

Hey @yevgen, I think I might know what’s going on.

If I understand correctly, the Cloudflare plugin adds the cf-edge-cache header on each page load when a user is NOT logged in. But when a page cache plugin is installed on the web server, pages are no longer generated by PHP; the web server (NGINX in my case) serves them directly as static HTML. So the header from the Cloudflare plugin never gets added.

I guess this means that any type of server page cache will break APO functionality. But a server page cache is really important: when Cloudflare fetches a page from another location, it doesn’t have to hit PHP and hits the page cache instead. That’s basically why I have two caches, one on the server and the other on Cloudflare.

It doesn’t necessarily break APO, but it makes it less intelligent, with longer times to update.

It also makes tons of unnecessary trips to the origin, as you explained earlier. That is the actual issue I’m trying to fix, but it seems I’ve hit a dead end?

I guess I could add the cf-edge-cache header in my NGINX configuration just like in the Cloudflare plugin’s code (Hooks.php) when a user is NOT logged in. Do you think that would work?

Yes, it will work. For logged-in users it should also serve:

cf-edge-cache: no-cache
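A minimal sketch of how NGINX itself could choose between the two values, assuming WordPress's default `wordpress_logged_in_` cookie prefix marks logged-in users. The `$cf_edge_cache` variable name is hypothetical, and matching on the cookie name only approximates the plugin's own logged-in check, since it does not validate the session.

```nginx
# http {} block: pick the header value from the request cookies.
# $cf_edge_cache is a hypothetical variable name.
map $http_cookie $cf_edge_cache {
    default                 "cache,platform=wordpress";
    ~*wordpress_logged_in_  "no-cache";
}

# server {} block
location / {
    # ... existing try_files directive stays here ...
    add_header "cf-edge-cache" $cf_edge_cache always;
}
```

Using `map` keeps the decision out of an `if` block inside the location, a pattern that is commonly advised against in NGINX configurations.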

Great! I just got it working by adding the following add_header line to the location block in my NGINX configuration for Cache Enabler:

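location / {
    # Try Cache Enabler's cached HTML first ($cache_enabler_uri is set elsewhere),
    # then the requested file/directory, else fall back to WordPress via index.php.
    try_files $cache_enabler_uri $uri $uri/ $custom_subdir/index.php$is_args$args;
    # Mirror the plugin's header so the APO worker skips extra origin fetches;
    # "always" sends it regardless of the response status code.
    add_header "cf-edge-cache" "cache,platform=wordpress" always;
}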

I am thinking about adding the cf-edge-cache: no-cache header for when NGINX hits PHP and skips the cache entirely (and NOT just for logged-in users). Does that make sense?

When NGINX hits PHP, you should get the cf-edge-cache header from the plugin.

You’re right. So I should also get it for logged-in users, since NGINX hits PHP for them too. So there’s no need to add cf-edge-cache: no-cache anywhere.

It seems I can now wait for the cache to be primed and see whether the unnecessary trips to the origin are gone. CPU usage and bandwidth should also be pretty low once the cache is primed.

Yes, basically any server-side cache needs to be repopulated after APO is enabled so that it starts serving the cf-edge-cache header. That’s why I said you need to clear the server cache.

Well, it seems like we are on the right track! With APO activated and the proper cf-edge-cache header now being added, we can clearly see in the screenshot below (rightmost part of the graphic) that bandwidth and CPU usage have already gone down to levels similar to what I was getting with a Cache Everything Page Rule.

[Screenshot (Imgur): origin bandwidth and CPU usage after the cf-edge-cache header was added]

Thanks again for your help @yevgen.

Rather than adding the header blindly, you can add it conditionally, only when the Cache Enabler cached file exists. I did something similar when I implemented Cache Enabler advanced NGINX caching to support caching search results in GitHub - centminmod/pretty-search-url: This plugin allows Wordpress `?s=wordpress+cache` query strings to be redirected to pretty url format `/search/wordpress+cache/`. That might give you more ideas for adding headers conditionally, only when Cache Enabler cached files exist.

I think that’s the whole point of the try_files directive. If try_files finds the file, it will serve the file and add the header with add_header.

If the file is missing, then try_files will redirect the request to PHP and will NOT add the header.

As per the NGINX documentation (Module ngx_http_core_module):

Checks the existence of files in the specified order and uses the first found file for request processing. If none of the files were found, an internal redirect to the uri specified in the last parameter is made.

Once try_files does that internal redirect, the request is processed by the location that matches the new URI (the PHP location), so the add_header directive in this location block does not apply to it, regardless of where it appears inside the block.
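To make the mechanism concrete, here is a sketch of the two location blocks involved. Only the `location /` block comes from this thread; the PHP location, its `fastcgi_params` include, and the `fastcgi_pass` socket path are assumptions for illustration and will differ between setups.

```nginx
location / {
    # A file found by try_files is served from this location,
    # so the add_header below applies to that response.
    try_files $cache_enabler_uri $uri $uri/ $custom_subdir/index.php$is_args$args;
    add_header "cf-edge-cache" "cache,platform=wordpress" always;
}

location ~ \.php$ {
    # Reached through try_files' internal redirect to index.php.
    # add_header is never inherited from a sibling location such as location /,
    # so here only the headers sent by PHP (including the plugin's
    # cf-edge-cache) reach Cloudflare.
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php/php-fpm.sock;  # hypothetical socket path
}
```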