Facebook now adds fbclid query string to URLs, busting CloudFlare's cache

caching

#1

Hi,

For the past few days, Facebook has been adding an fbclid query string to all URLs posted on its platform. The URLs look like this: https://www.website.com/perma-link/?fbclid=IwAR2FEKP2N1EZQ0QU7ioC1MHvrqnrjtETeDNCpG9dkd3cLZIu_OF-IjTD2-c

This query string is unique for each user, which means every user coming from Facebook will NOT hit CloudFlare’s cache or your server’s cache unless you have configured your server to ignore it. It also means that people who get a lot of traffic from Facebook (like me) will see a big performance hit in TTFB and page load speed. I have seen a TTFB increase of roughly 50% since Facebook started using this query parameter.

I wanted to start this thread so people can share ideas on what could be done to mitigate the situation and hopefully be able to serve cached pages to users coming from Facebook.

If you want a simple and free fix, you should consider using @boynet2 's redirect method, which uses a CloudFlare Page Rule so that requests made to your site are redirected to the proper URL without the fbclid parameter before reaching your server.

To strip the fbclid query string parameter, create a Page Rule in your CloudFlare dashboard with the following settings:

1. Under If the URL matches, enter http://www.example.com/*?fbclid=*

2. Click Add a Setting and choose Forwarding URL and 302 - Temporary Redirect

3. Under Destination URL, enter http://www.example.com/$1
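
To reason about which URLs this rule catches, here is a rough emulation of the wildcard match in JavaScript. It is only a sketch: emulatePageRule is a made-up helper, and it assumes each * simply captures the text on either side of the literal ?fbclid=, which may not match CloudFlare's matcher in every edge case.

```javascript
// Hypothetical emulation of the Page Rule
//   http://www.example.com/*?fbclid=*  ->  http://www.example.com/$1
// This is NOT CloudFlare's implementation, only a way to test edge cases.
function emulatePageRule(urlString) {
  const prefix = 'http://www.example.com/';
  if (!urlString.startsWith(prefix)) return null;

  const rest = urlString.slice(prefix.length);
  const idx = rest.indexOf('?fbclid=');
  if (idx === -1) return null; // rule does not match, so no redirect

  // $1 is whatever the first wildcard captured (everything before ?fbclid=).
  return prefix + rest.slice(0, idx);
}
```

Under these assumptions the rule only fires when fbclid is the first query parameter, and any parameters that follow it are dropped along with it.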

Another easy fix that can be applied on CloudFlare is to use this Worker script along with the proper routes. Please note that this fix will add extra fees to your current CloudFlare plan ($5 per month for 10 million worker requests):

Worker script:

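addEventListener('fetch', event => {
  // Parse the incoming request URL so the query string can be edited.
  const url = new URL(event.request.url)

  // Remove fbclid; delete() does nothing if the parameter is absent.
  url.searchParams.delete('fbclid')

  // Fetch the cleaned URL, reusing the original request's method, headers and body.
  event.respondWith(
    fetch(url.toString(), event.request)
  )
})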

Worker routes:

On: https://www.website.com/*
Off: https://www.website.com/wp-content/*
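
If you also want to drop other tracking parameters, the stripping logic can be pulled out into a small function. This is a sketch, not a tested production Worker: stripTrackingParams and the parameter lists are made up for illustration, and only fbclid comes from the script above.

```javascript
// Assumed lists of parameters to drop. Only fbclid is taken from the
// Worker script above; gclid and the utm_ prefix are illustrative.
const EXACT_PARAMS = ['fbclid', 'gclid'];
const PREFIXES = ['utm_'];

function stripTrackingParams(urlString) {
  const url = new URL(urlString);

  // Copy the keys first: deleting while iterating searchParams can skip entries.
  for (const key of [...url.searchParams.keys()]) {
    if (EXACT_PARAMS.includes(key) || PREFIXES.some(p => key.startsWith(p))) {
      url.searchParams.delete(key);
    }
  }

  // All other query parameters are kept.
  return url.toString();
}
```

Inside the Worker, the fetch call would then become something like fetch(stripTrackingParams(event.request.url), event.request).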

Another fix that can be applied at the web server level (NGINX) is to add this map directive to the http block of your config:

map $request_uri $request_uri_path {
    "~^(?P<path>[^?]*)(\?.*)?$"  $path;
}

You’ll also need to modify the fastcgi_cache_key with the following line of code:

fastcgi_cache_key "$scheme$request_method$host$request_uri_path";

What this does is ignore all query strings when building the cache key, so the server returns the cache entry for the bare path. This lets the server serve cached pages even if the fbclid parameter is present. But it also means every query string variant of a URL shares a single cache entry, so this will break anything on your site that relies on query strings. Be careful if you use it.
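
To see why this breaks query strings, here is the same key construction emulated in JavaScript. It is only an illustration of the logic: cacheKey is a made-up helper, and NGINX itself builds the key from the map and fastcgi_cache_key directives above.

```javascript
// Same pattern as the map directive. JavaScript writes named groups as
// (?<name>...) where the NGINX config above uses (?P<name>...).
const PATH_ONLY = /^(?<path>[^?]*)(\?.*)?$/;

// Mirrors "$scheme$request_method$host$request_uri_path".
function cacheKey(scheme, method, host, requestUri) {
  const match = PATH_ONLY.exec(requestUri);
  const path = match ? match.groups.path : '';
  return scheme + method + host + path;
}

// Two different searches collapse to the same key, so the second visitor
// would be served the first visitor's cached page.
const first = cacheKey('https', 'GET', 'www.website.com', '/?s=cats');
const second = cacheKey('https', 'GET', 'www.website.com', '/?s=dogs');
console.log(first === second);
```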

So I’d really like to hear your take on this and also hear CloudFlare’s opinion on the matter.

Don’t hesitate to ask questions if you have any.

Thank you.


#2

I also noticed it, solved it by adding this page rule:

http://www.example.com/*?fbclid=* forwarding 302 to http://www.example.com/$1


#3

That looks really simple and effective. I’ll certainly try it.

Though, do you think Facebook could find it problematic that every URL is being redirected with a 302 ? Maybe it won’t have any negative impact at all. I’m just wondering.


#4

I don’t think so, because when Facebook’s bot crawls your site it doesn’t add this parameter… and they don’t have a problem with URL shortening services like bit.ly.

It’s a really nasty thing they did…


#5

I just implemented what you provided and it works great as expected, though it does add around 90 ms to the page load from the 302.

And I totally agree that once again, Facebook has implemented a feature on their site without thinking about the impact on all the websites that share links on their platform. But that’s how Facebook always acts, without caring about the consequences for publishers. It’s not new. It’s just getting worse every day.


#6

Luckily for me it’s only about 5% of traffic, so I can live with that (not all Facebook traffic comes with this parameter, so maybe they are still testing it).
I hope Cloudflare will implement some global rule to remove this parameter, which would be faster.


#7

I didn’t check how much traffic reaches my site with the query string, but after looking at the NGINX logs, I saw that most requests seemed to come with the fbclid parameter.

I think the only way to do it with CloudFlare right now is with a Worker, but like I mentioned, the cost is very high for a small nuisance that unfortunately can have a big impact on performance.

I hope CloudFlare reads this post so they can see the issue and maybe add a Page Rule setting to ignore the fbclid query parameter. And why not add a Page Rule setting that lets you specify your own custom query parameters to ignore? That could help people who want to ignore utm* parameters and others as well.


#8

By the way, to anyone that reads this post, I highly suggest using @boynet2 's method as you won’t have to configure anything on your server and all traffic will be redirected directly from CloudFlare.


#9

You could match against the fbclid argument only, to decide when the request_uri should return a path without the query string, so as not to mess with other query strings?

if ($args ~ "fbclid") {
...
}

or just an nginx map to determine the $args to match against


#10

Good idea @eva2000 , but I think map would be better since “if” is deemed “evil” by NGINX. But I’m not a map expert, so I wouldn’t know how to write such a directive. For now, I think @boynet2 's suggestion is still the best one so far.


#11

Maybe draw inspiration from this write-up. It’s similar, just for proxy_cache instead of fastcgi_cache: http://redsunsoft.com/2015/05/caching-proxy-requests-and-stripping-params-with-nginx/

Not all nginx ifs are evil. Read about the cases where if is OK, including argument matches: https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/


#12

Thanks @eva2000, I will read up on those links, I really appreciate it.

Though, this still doesn’t address the problem with CloudFlare caching. Facebook’s new fbclid query string parameter renders CloudFlare’s “Cache Everything” Page Rule quite useless, unfortunately.


#13

If you’re able to ignore the query string on your site, then you can also set the caching level at Cloudflare to ‘Ignore Query String’ too. You should now see negligible impact.

You could just pattern match URLs containing fbclid and set the cache level in a Page Rule if you didn’t want to ignore query strings globally.


#14

Hey @saul , thanks for the tip. Unfortunately, “Ignore Query String” only works for “static content” which doesn’t seem to include HTML files as per CloudFlare’s documentation: https://support.cloudflare.com/hc/en-us/articles/200168256-What-are-Cloudflare-s-caching-levels- and https://support.cloudflare.com/hc/en-us/articles/200172516-Which-file-extensions-does-CloudFlare-cache-for-static-content-

I tested this setting and I could still see “MISS” for the “cf-cache-status” header when testing URLs with the fbclid query string parameter. This would indeed have been the best option, especially when combined with a Page Rule to match the proper fbclid pattern, but unfortunately, it doesn’t work.


#15

Well, TIL. I have a page rule with cache everything for certain paths and I was sure when I tested it it cached static HTML pages along with everything else that matched the pattern.

EDIT: My memory was right but the HTML was only cached because it was via a Page Rule and you’re right that the Query Cache settings wouldn’t be applied to them…Good to know. Looks like this would need to be a Workers solution, as you say.


#16

@saul, the HTML is indeed cached, that’s not the issue. It seems the “Ignore Query String” setting is applied to a bunch of static files, but not HTML. I can confirm this after doing some tests and looking at the “cf-cache-status” header that always returns “MISS” when appending a different fbclid parameter to the same URL.


#17

Well, I use fastcgi_cache as well, and after some research I see that I need to strip the Google AdWords ?gclid query string as well as Facebook’s ?fbclid, so this seems to work:

map $args $ignorearg {
  default             0;
  ~*fbclid            1;
  ~*gclid             1;
}
if ($ignorearg) {
  # strip gclid/fbclid query strings from urls
  rewrite ^ $uri? permanent;
}

curl -I https://domain.com/php/?gclid
HTTP/1.1 301 Moved Permanently
Date: Sun, 21 Oct 2018 23:32:25 GMT
Content-Type: text/html
Content-Length: 162
Location: https://domain.com/php/
Connection: keep-alive
Server: nginx centminmod
X-Powered-By: centminmod
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff

curl -I https://domain.com/php/?fbclid
HTTP/1.1 301 Moved Permanently
Date: Sun, 21 Oct 2018 23:32:31 GMT
Content-Type: text/html
Content-Length: 162
Location: https://domain.com/php/
Connection: keep-alive
Server: nginx centminmod
X-Powered-By: centminmod
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff

cat /tmp/fastcgicache/7/fd/ef1a716e93dde9414f8caeba08e1efd7 | head -n2
y
KEY: httpsGETdomain.com/php/
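
For anyone checking what this config will and won't redirect, the decision can be emulated roughly in JavaScript. This is only a sketch of the logic as I read it: ignoreArg and redirectTarget are made-up helpers, and NGINX itself does the real matching.

```javascript
// Emulates: map $args $ignorearg { default 0; ~*fbclid 1; ~*gclid 1; }
// ~* is a case-insensitive regex match anywhere in the query string.
function ignoreArg(args) {
  return /fbclid/i.test(args) || /gclid/i.test(args) ? 1 : 0;
}

// Emulates: rewrite ^ $uri? permanent;
// The trailing ? stops NGINX from re-appending the original arguments,
// so the redirect target is the bare path.
function redirectTarget(uri, args) {
  return ignoreArg(args) ? uri : null; // null means no redirect
}
```

Two things stand out under this reading: the redirect drops every query parameter, not just gclid/fbclid, and because the patterns are unanchored, any query string that merely contains one of those words (for example q=myfbclidtest) is redirected too.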

#18

Great find @eva2000 ! And it looks like it works well. But I’d rather do it without a redirect since that adds (for my server anyways) around 80 ms to first view. I’m currently trying to figure out a way to do it with a MAP directive and without an IF. I’ll certainly post my findings here if I do figure it out.


#19

Here’s the MAP directive I’ve come up with so far:

map $request_uri $request_uri_path {
    "~^(?P<path>[^?]*)(\?fbclid.*)?$"  $path;
}

This maps $request_uri to $request_uri_path when the query string starts with fbclid (or when there is no query string at all) and strips the query string, but I haven’t figured out yet how to use that when setting fastcgi_cache_key. You can’t define fastcgi_cache_key inside an IF. So I’m stuck there.
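
To check what this pattern matches, here is the same regex emulated in JavaScript. It is a sketch under assumptions: mapRequestUriPath is a made-up helper, and it assumes (as far as I know, this is how NGINX behaves) that a map with no default entry yields an empty string when nothing matches.

```javascript
// Same pattern as the map directive, with JavaScript's (?<name>...) syntax
// in place of (?P<name>...).
const FBCLID_ONLY = /^(?<path>[^?]*)(\?fbclid.*)?$/;

function mapRequestUriPath(requestUri) {
  const match = FBCLID_ONLY.exec(requestUri);
  // Assumed NGINX behaviour: no match and no default gives an empty string.
  return match ? match.groups.path : '';
}

console.log(mapRequestUriPath('/php/?fbclid=abc'));        // query starts with fbclid
console.log(mapRequestUriPath('/php/?page=2'));            // other query string
console.log(mapRequestUriPath('/php/?page=2&fbclid=abc')); // fbclid not first
```

If that assumption holds, any URL with a different query string gets an empty path, so all of them would share one cache key unless a default entry (for example default $request_uri;) is added to the map. Also, fastcgi_cache_key takes variables, so the mapped variable can go straight into the key with no IF, the same way the first post uses $request_uri_path.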

P.S. Can someone please tell me how to do a proper code block here? Seems like the “Preformatted text” button is being a jerk with me.


#20

Bump for others to see.