How to Bypass a Redirect Rule for Known Bots

What is the name of the domain?

furkank.dev

What is the issue you’re encountering?

I am using Redirect Rules to forward traffic coming from outside Turkey and Cyprus (except specific IPs) to another site. The rule works, but it also redirects verified bots (like search engine crawlers), which negatively impacts my SEO. I attempted to resolve this by creating a WAF rule that allows known bots (cf.client.bot or cf.verified_bot_category), but WAF rules are evaluated after the Redirect Rule, which means bots are still redirected and unable to access my site. As a result, the allow rule only works for bots coming from Turkey.
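To illustrate the setup, a Single Redirect with an expression of roughly this shape would produce the behavior described (field names per Cloudflare's rules language; the exempted IP is a documentation placeholder, not the poster's actual allowlist). Because Redirect Rules run before WAF custom rules in Cloudflare's request processing, a WAF allow/skip rule cannot exempt bots from this redirect:

(ip.geoip.country ne "TR" and ip.geoip.country ne "CY" and ip.src ne 192.0.2.1)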

Are you concerned about only the real Googlebot, or some other bots as well? :thinking:

If you switched to using a Worker for the redirect, then the custom WAF rule would apply to that traffic as well.

Example:

Reminds me a bit of the topic here:

Maybe the example below could help, if I understood your case correctly; it can always be adjusted:

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  // Cloudflare exposes geolocation and ASN on request.cf, not as headers
  // (there is no CF-ASN request header)
  const country = request.cf ? request.cf.country : null // e.g. "TR"
  const asn = request.cf ? request.cf.asn : null         // e.g. 15169 (a number)
  const userAgent = request.headers.get('User-Agent') || ''
  const isGooglebot = userAgent.includes('Googlebot')

  // Check if the request is from Turkey or Cyprus
  const isFromTurkeyOrCyprus = (country === 'TR' || country === 'CY')

  // Treat it as the real Googlebot only if the User-Agent matches
  // AND the request originates from Google's ASN (AS15169)
  const isFromGoogle = isGooglebot && asn === 15169

  // Redirect traffic from outside Turkey/Cyprus, unless it is the real Googlebot
  if (!isFromTurkeyOrCyprus && !isFromGoogle) {
    return Response.redirect('https://www.google.com/', 301)
  }

  // Pass everything else through to the origin
  return fetch(request)
}
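If you prefer the newer ES-module Worker syntax, the same logic looks like this (functionally identical, just the currently recommended style):

export default {
  async fetch(request) {
    // request.cf carries Cloudflare's request metadata (country, ASN, ...)
    const country = request.cf ? request.cf.country : null
    const asn = request.cf ? request.cf.asn : null
    const userAgent = request.headers.get('User-Agent') || ''

    // Same heuristic as above: Googlebot UA plus Google's ASN (AS15169)
    const isVerifiedGooglebot = userAgent.includes('Googlebot') && asn === 15169

    // Redirect visitors outside Turkey/Cyprus unless they are the verified Googlebot
    if (country !== 'TR' && country !== 'CY' && !isVerifiedGooglebot) {
      return Response.redirect('https://www.google.com/', 301)
    }
    return fetch(request)
  }
}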

Might be costly if used on the whole domain, though.

Otherwise, are you using a free or a paid plan? :thinking:

Thank you so much for your effort and kindness! I was able to solve the issue by adding the User-Agent strings of the top 10 well-known bots to the rule. Now the bots can successfully crawl my website. I don’t have much experience with Workers, and this approach seemed simpler to me. Thanks again for your support; I truly appreciate it!
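For future readers: the fix described above amounts to adding User-Agent exceptions directly to the redirect rule’s expression, roughly like this (field names per Cloudflare's rules language; the bot list is illustrative, not the actual top 10 used):

(ip.geoip.country ne "TR" and ip.geoip.country ne "CY"
 and not http.user_agent contains "Googlebot"
 and not http.user_agent contains "bingbot"
 and not http.user_agent contains "DuckDuckBot")

Keep in mind that User-Agent strings are easy to spoof, so anyone sending a bot User-Agent will also bypass the redirect; the ASN check in the Worker sketch above is the stricter option.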

