From what I’m seeing, GTmetrix does not always include “GTmetrix” in its user-agent.
Dumping the headers shows that the user-agent did not include GTmetrix when I pointed it at my Worker.
I have tested it against a normal URL on my Server.
Logged in at GTmetrix: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36 GTmetrix
Anonymous GTmetrix: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36 GTmetrix
It definitely shows up with “GTmetrix” in my server-side logs. But I don't actually know how to detect it in Workers, or why it does not show up there. Maybe it gets cut off there?
But you’re right. When I check what GTmetrix shows me as “request Header” it is not part of the user-agent:
Maybe “GTmetrix” is treated as a comment and is getting cut off by Cloudflare? I can't tell, sorry. Maybe the OP should create a ticket to ask Cloudflare's support whether there is a way to detect or fetch the raw, full user-agent.
But the OP's approach seems to be right, as this example uses the same method:
const userAgent = request.headers.get("User-Agent") || ""
if (userAgent.includes("bot")) {
return new Response("Block User Agent containing bot", { status: 403 })
}
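To see what actually reaches the Worker, it may help to dump every request header rather than only the user-agent. The following is a minimal sketch, not a confirmed fix: the names dumpHeaders, isGTmetrix and handleRequest are hypothetical, and it assumes only the standard Request, Headers and Response objects that Workers provide.

```javascript
// Hypothetical helper: copy every request header into a plain object
// so it can be logged or returned for inspection.
function dumpHeaders(request) {
  const headers = {}
  for (const [name, value] of request.headers) {
    // Header names come back lowercased when iterating a Headers object.
    headers[name] = value
  }
  return headers
}

// Hypothetical helper: case-insensitive check for the "GTmetrix" token
// that shows up in the origin server logs.
function isGTmetrix(userAgent) {
  return /gtmetrix/i.test(userAgent || "")
}

// Hypothetical debug handler: echoes the received headers back as JSON.
async function handleRequest(request) {
  const body = {
    headers: dumpHeaders(request),
    isGTmetrix: isGTmetrix(request.headers.get("User-Agent")),
  }
  return new Response(JSON.stringify(body, null, 2), {
    headers: { "Content-Type": "application/json" },
  })
}
```

Pointing GTmetrix at a route served by handleRequest and comparing the echoed user-agent with the origin log entry would show whether the token is missing before the Worker runs or only in what GTmetrix displays.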
I had a use case to detect certain bots and serve response from cache instead of making a call to origin.
We wrote the following method, which uses a regex pattern to identify the agent. It is better to make the regex pattern configurable than to hardcode it.
export async function checkIfBot(request) {
  //TODO: get from BrowsePagesCommonKV
  // Crawler user-agent fragments, matched case-insensitively.
  let botPattern = "(googlebot/|Googlebot-Mobile|Googlebot-Image|Google favicon|bingbot|AdsBot-Google-Mobile|APIs-Google|AdsBot-Google|Googlebot-News|Googlebot-Video|Mediapartners-Google|googleweblight)"
  const re = new RegExp(botPattern, 'i')
  // Fall back to an empty string when the header is missing.
  const userAgent = request.headers.get('User-Agent') || ''
  let url = new URL(request.url)
  console.log(url.href)
  let isBotRequest = false
  // URLs containing "cache-test" are also treated as bot requests.
  if (re.test(userAgent) || url.href.includes("cache-test")) {
    console.log('the user agent is a crawler!')
    isBotRequest = true
  }