Strip UTM query string

recipe-exchange

#1

Hi,

I found that UTM tags in the query string of incoming requests (from newsletters etc.) varied from request to request, which meant that each user got a cache MISS.

I built a simple worker script that strips off these query parameters before sending the request to the server so that they get a cache HIT instead.

addEventListener('fetch', event => {
  event.passThroughOnException()
  event.respondWith(handleRequest(event.request))
})


async function handleRequest(request) {
  
  let url = new URL(request.url)
  
  //strip out utm tags
  var deleteKeys = []

  for(var key of url.searchParams.keys()) { 
    if(key.toLowerCase().startsWith('utm')){
      deleteKeys.push(key)
    }
  }
  for(var key of deleteKeys){
    url.searchParams.delete(key);
  }
  
  let modifiedRequest = new Request(url, request)

  return fetch(modifiedRequest)
}
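
For anyone who wants to try the stripping logic outside a worker, here is a minimal sketch of the same idea as a plain function. The name `stripUtmParams` and the example.com URL are made up for illustration; only the standard `URL` and `URLSearchParams` APIs are used.

```javascript
// Hypothetical helper (not part of the worker above): applies the same
// "drop every key starting with utm" rule to a URL string.
function stripUtmParams(urlString) {
  const url = new URL(urlString)

  // Snapshot the keys first so nothing is deleted while iterating searchParams.
  const keys = [...url.searchParams.keys()]

  for (const key of keys) {
    if (key.toLowerCase().startsWith('utm')) {
      url.searchParams.delete(key)
    }
  }
  return url.toString()
}

// Made-up example URL: the two UTM tags are dropped, page=2 is kept.
console.log(stripUtmParams('https://example.com/recipes?utm_source=newsletter&page=2&UTM_Campaign=spring'))
```

Requests that differ only in their UTM tags then map to the same URL, which is what lets the cache return a HIT.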


#3

I am porting the exclusion list I currently use in Varnish. How about something like this?

/**
 * Define regular expressions at top to have them precompiled.
 */
const urlRegex = new RegExp('(refreshce|gclid|cx|ie|cof|siteurl|zanpid|origin|utm_(source|campaign|medium)|fb(cl)?id|fbclid|mr:[A-z]+ref(id|src))');


addEventListener('fetch', event => {
    event.passThroughOnException()
    event.respondWith(handleRequest(event.request))
})
  
  
async function handleRequest(request) {
    
    let url = new URL(request.url)
    
    url = await normalizeUrl(url)
    
    let modifiedRequest = new Request(url, request)
  
    return fetch(modifiedRequest)
}

async function normalizeUrl(url) {
    for(var key of url.searchParams.keys()) { 
        if(key.match(urlRegex)){
            url.searchParams.delete(key);
        }
    }
    return url
}

#4

I like it a lot. The only thing I noticed that could be a problem is similar to a bug I had: I originally deleted keys from searchParams whilst iterating over the collection, and it failed to remove some of them. JavaScript is not my first language, so I may have done something wrong, but I know that in other languages you would not be allowed to do that, hence the deleteKeys array in my code.
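
To illustrate the problem described here, a small sketch that can be run in Node or a browser console. The two function names and the sample query string are hypothetical; the point is only to compare deleting during iteration with collecting the keys first.

```javascript
// Hypothetical helpers for comparison; plain URLSearchParams, no Workers APIs.
function deleteWhileIterating(query) {
  const params = new URLSearchParams(query)
  for (const key of params.keys()) {
    // Mutates the list that the iterator is still walking over.
    if (key.startsWith('utm')) params.delete(key)
  }
  return params.toString()
}

function deleteAfterCollecting(query) {
  const params = new URLSearchParams(query)
  // Take a snapshot of the matching keys, then delete from the live object.
  const deleteKeys = [...params.keys()].filter(key => key.startsWith('utm'))
  for (const key of deleteKeys) params.delete(key)
  return params.toString()
}

const query = 'utm_source=a&utm_medium=b&page=2'
console.log(deleteWhileIterating(query))  // utm_medium is skipped and survives
console.log(deleteAfterCollecting(query)) // page=2
```

The iterator is live, so removing the current entry shifts the next one into its slot and the loop steps over it. That would explain why only some keys were removed.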


#5

async function normalizeUrl(url) {
    // Collect matching keys first; deleting while iterating can skip entries.
    let deleteKeys = []
    
    for(var key of url.searchParams.keys()) { 
        if(key.match(urlRegex)){
            deleteKeys.push(key)
        }
    }

    deleteKeys.forEach(k => url.searchParams.delete(k))

    return url
}

#6

That looks great, thanks!

I’ll give this a try in the next couple of days.

Are you in the process of porting over from Varnish? I have to do that on a couple of sites and wonder what delights I’m going to encounter along the way…


#7

You might want to use this regexp instead of the one above to have only exact matches and to catch all UTM parameters (I needed to let utm_term through):

const urlRegex = new RegExp('^(refreshce|gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|fbid|fbclid|mr:[A-Za-z]+|ref(id|src))$');
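
A quick sketch of why the anchors matter. The two patterns below are shortened, illustrative versions rather than the full lists from this thread, and the sample keys and the `keysToStrip` helper are made up.

```javascript
// Abbreviated, illustrative patterns (not the full exclusion lists above).
const unanchored = new RegExp('(gclid|ie|origin|utm_(source|campaign|medium)|fbclid)')
const anchored = new RegExp('^(gclid|ie|origin|utm_[a-z]+|fbclid)$')

// Hypothetical helper: returns the keys a given pattern would strip.
function keysToStrip(regex, keys) {
  return keys.filter(key => regex.test(key))
}

// Made-up sample keys: three tracking parameters and two legitimate ones.
const sampleKeys = ['utm_source', 'utm_term', 'gclid', 'movie', 'original']

console.log(keysToStrip(unanchored, sampleKeys)) // [ 'utm_source', 'gclid', 'movie', 'original' ]
console.log(keysToStrip(anchored, sampleKeys))   // [ 'utm_source', 'utm_term', 'gclid' ]
```

Without `^` and `$`, a short alternative such as `ie` matches anywhere inside a key, so an unrelated parameter like `movie` would be stripped as well.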