Strip UTM query string

Hi,

I was finding that UTM tags in the query string of incoming requests (from newsletters etc.), which differ from one link to the next, meant that each user got a cache MISS.

I built a simple worker script that strips off these query parameters before sending the request to the server so that they get a cache HIT instead.

addEventListener('fetch', event => {
  event.passThroughOnException()
  event.respondWith(handleRequest(event.request))
})


async function handleRequest(request) {
  
  let url = new URL(request.url)
  
  //strip out utm tags
  var deleteKeys = []

  for(var key of url.searchParams.keys()) { 
    if(key.toLowerCase().startsWith('utm')){
      deleteKeys.push(key)
    }
  }
  for(var key of deleteKeys){
    url.searchParams.delete(key);
  }
  
  let modifiedRequest = new Request(url, request)

  return fetch(modifiedRequest)
}
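To show the effect on the cache key, here is a minimal sketch of the same stripping logic as a standalone function, without the fetch. The name `stripUtm` and the example.com URLs are assumptions for illustration and are not part of the Worker above.

```javascript
// Hypothetical helper: same key-stripping logic as handleRequest, minus the fetch.
function stripUtm(urlString) {
  const url = new URL(urlString)

  // Collect matching keys first, then delete them in a second pass.
  const deleteKeys = []
  for (const key of url.searchParams.keys()) {
    if (key.toLowerCase().startsWith('utm')) {
      deleteKeys.push(key)
    }
  }
  for (const key of deleteKeys) {
    url.searchParams.delete(key)
  }

  return url.toString()
}

// Two newsletter-style links that differ only in their UTM tags.
const a = stripUtm('https://example.com/post?id=7&utm_source=newsletter&utm_campaign=may')
const b = stripUtm('https://example.com/post?id=7&utm_source=twitter')

console.log(a)       // https://example.com/post?id=7
console.log(a === b) // true
```

Both links end up as the same URL, so the second request should be able to reuse whatever was cached for the first.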

I am porting the exclusion list I currently use in Varnish. How about something like this?

/**
 * Define regular expressions at top to have them precompiled.
 */
const urlRegex = new RegExp('(refreshce|gclid|cx|ie|cof|siteurl|zanpid|origin|utm_(source|campaign|medium)|fb(cl)?id|mr:[A-Za-z]+ref(id|src))');


addEventListener('fetch', event => {
    event.passThroughOnException()
    event.respondWith(handleRequest(event.request))
})
  
  
async function handleRequest(request) {
    
    let url = new URL(request.url)
    
    url = await normalizeUrl(url)
    
    let modifiedRequest = new Request(url, request)
  
    return fetch(modifiedRequest)
}

async function normalizeUrl(url) {
    for(var key of url.searchParams.keys()) { 
        if(key.match(urlRegex)){
            url.searchParams.delete(key);
        }
    }
    return url
}

I like it a lot. The only thing I noticed that could be a problem is similar to a bug I had: I originally deleted keys from searchParams while iterating over the collection, and it failed to remove some of them. JavaScript is not my first language, so I may have done something wrong, but I know that other languages do not allow modifying a collection during iteration, hence the deleteKeys array in my code.

async function normalizeUrl(url) {
    let deleteKeys = []
    
    for(var key of url.searchParams.keys()) { 
        if(key.match(urlRegex)){
            deleteKeys.push(key)
        }
    }

    deleteKeys.forEach(k => url.searchParams.delete(k))

    return url
}
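The skipped keys are consistent with the keys() iterator reading from the live parameter list, so removing an entry shifts the next one into the slot the iterator has already passed. A minimal sketch comparing the two approaches follows; the function names, the pattern and the sample URL are all made up for illustration.

```javascript
// Illustrative pattern only; substitute your own exclusion list.
const pattern = /^utm_[a-z]+$/

// Deletes while iterating the live iterator; may skip the key after each deleted one.
function stripNaive(url) {
  for (const key of url.searchParams.keys()) {
    if (pattern.test(key)) url.searchParams.delete(key)
  }
  return url
}

// Copies the keys into an array first, so deletions cannot disturb the loop.
function stripSafe(url) {
  for (const key of [...url.searchParams.keys()]) {
    if (pattern.test(key)) url.searchParams.delete(key)
  }
  return url
}

const sample = 'https://example.com/?utm_source=a&utm_medium=b&id=1'

console.log(stripNaive(new URL(sample)).search) // may still contain utm_medium
console.log(stripSafe(new URL(sample)).search)  // ?id=1
```

The spread into an array does the same job as the deleteKeys array, just without the separate collection loop.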

That looks great thanks!

I’ll give this a tryout in the next couple of days.

Are you in the process of porting over from Varnish? I have to do that on a couple of sites and wonder what delights I’m going to encounter along the way…

You might want to use this regexp instead of the one above to match parameter names exactly and to catch all UTM parameters (I needed utm_term caught as well):
const urlRegex = new RegExp('^(refreshce|gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|fbid|fbclid|mr:[A-Za-z]+|ref(id|src))$');
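To illustrate why the anchors matter, here is a small sketch using shortened versions of the two patterns. The names `loose` and `strict` and the sample keys are assumptions for the example. Without `^` and `$`, short alternatives such as `ie` or `origin` match anywhere inside a longer parameter name.

```javascript
// Shortened, unanchored pattern in the style of the first regex.
const loose = new RegExp('(gclid|cx|ie|origin|utm_(source|campaign|medium))')

// Shortened, anchored pattern in the style of the second regex.
const strict = new RegExp('^(gclid|cx|ie|origin|utm_[a-z]+)$')

for (const key of ['movie', 'original_id', 'utm_term', 'gclid']) {
  console.log(key, loose.test(key), strict.test(key))
}
// movie       true  false  ("ie" matches inside "movie")
// original_id true  false  ("origin" matches as a prefix)
// utm_term    false true   (not in the source|campaign|medium list)
// gclid       true  true
```

With the unanchored pattern, an unrelated parameter like `movie` would be stripped before the request reaches the origin, which could change the response.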

Hey @petetak @l.lizzeri, I’ve tried both of your examples but I get errors. Any idea why? I’ve tried declaring
var url

https://cloudflareworkers.com/#7fec36c4b9a6e64a15d4d3b04c5ffaaf:https://tutorial.cloudflareworkers.com?utm_campaign=123/

Uncaught (in response) ReferenceError: urls is not defined

Here’s a more concise way to strip search params:

    const regex = new RegExp('fb(?:cl)?id|gclid|msclkid|utm_[a-z]+', 'i')

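    // Copy the keys into an array first: Array.from(iterable, fn) calls fn
    // during iteration, so deleting inside it can skip keys in the live list.
    Array.from(url.searchParams.keys()).forEach((key) => {
      if (key.match(regex)) {
        url.searchParams.delete(key)
      }
    })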