Rule for 404

What is the name of the domain?

drmegapanos.gr

What is the issue you’re encountering?

We want to use Cloudflare to return a 404 to Googlebot: when the URL contains the pattern “_bd_prev_page=” and the user agent contains “google”, the response should have a 404 status.

What comes to my mind is:

  1. Use a Redirect Rule where the User-Agent contains (or wildcard-matches) google to redirect (301) to some non-existent URI path. Google would then report a 301 rather than a 404, though :thinking:
  2. Use a Cloudflare Worker to read the User-Agent and respond with HTTP 404 (hopefully you don’t have a lot of crawled and indexed URLs; otherwise you can expect a few thousand requests per day with that particular query string)
  3. Use Snippets (Pro plan required)
  4. Use a combination: deploy a custom 404 page with Pages, then redirect requests whose User-Agent contains google to that 404 (sub)domain
  5. Use WAF rules to block Googlebot from accessing URLs with that particular query string

Example to fetch user-agent:

Example to return 404:

Worker code:

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  // Get the User-Agent from the request headers
  const userAgent = request.headers.get('User-Agent') || '';

  // Get the URL and check for the query parameter '_bd_prev_page='
  const url = new URL(request.url);
  const hasBdPrevPage = url.searchParams.has('_bd_prev_page');

  // Check if both conditions are met: User-Agent contains 'Googlebot' and the query parameter is present
  if (userAgent.includes('Googlebot') && hasBdPrevPage) {
    return new Response('Not found', {
      status: 404,
      statusText: 'Not Found',
    });
  }

  // For all other requests, continue with the original request
  return fetch(request);
}

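Otherwise, if possible, bind the Worker to a route that only receives these requests and use the code below without the query check. Note that Workers route patterns cannot include query strings, so a pattern like example.com/?_bd_prev_page=* won’t be accepted; this variant only makes sense if the affected URLs share a path you can match on: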

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  // Get the User-Agent from the request headers
  const userAgent = request.headers.get('User-Agent') || '';

  // Check if the User-Agent contains 'Googlebot'
  if (userAgent.includes('Googlebot')) {
    return new Response('Not found', {
      status: 404,
      statusText: 'Not Found',
    });
  }

  // For all other requests, pass the request through to the origin
  return fetch(request);
}
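
If you want to sanity-check the matching condition before deploying, the test used in the first Worker can be pulled out into a plain function and run locally. This is only a sketch: `shouldReturn404` is a name made up here, and matching `google` case-insensitively is an assumption based on the original request ("user agent contains google"), not what the Worker code above does (it looks for the exact string `Googlebot`).

```javascript
// Hypothetical helper (not part of any Cloudflare API): decides whether a
// request should get the 404, given its URL and its User-Agent header value.
function shouldReturn404(requestUrl, userAgent) {
  const url = new URL(requestUrl);

  // has() matches the parameter name exactly, with or without a value.
  const hasBdPrevPage = url.searchParams.has('_bd_prev_page');

  // Assumption: treat any User-Agent containing "google" (any case) as a match.
  const isGoogle = (userAgent || '').toLowerCase().includes('google');

  return hasBdPrevPage && isGoogle;
}

const googlebot = 'Googlebot/2.1 (+http://www.google.com/bot.html)';
console.log(shouldReturn404('https://example.com/page?_bd_prev_page=2', googlebot));     // true
console.log(shouldReturn404('https://example.com/page?_bd_prev_page=2', 'Mozilla/5.0')); // false
console.log(shouldReturn404('https://example.com/page', googlebot));                     // false
```

Inside the Worker you would call it as `shouldReturn404(request.url, request.headers.get('User-Agent'))` and return the 404 response when it is true.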

You can create a new Pages project and deploy a custom 404 page to it, then use a Redirect Rule that sends requests there when the user-agent contains google (or use the wildcard operator, which matches case-insensitively).

Otherwise, a better way would be to block Googlebot and/or anyone else from accessing URLs that contain the _bd_prev_page query string.

Furthermore, Workers might get costly here, since Googlebot may crawl these URLs very frequently. Snippets could achieve the same thing, but you’d need a Pro plan for that :thinking:

Please share the screenshot.

May I ask of what exactly? :thinking:
Issue or result or code?

I tried the first Worker code, but it does not seem to work. It returns status code 200 and redirects to the homepage.

Chances are your worker doesn’t match with the request. Use Trace to confirm whether you’ve configured the route for your worker correctly.
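
Another quick way to see whether the Worker runs at all for a given URL is to tag pass-through responses with a debug header and look for it in `curl -sv` output. A minimal sketch, where `markAsHandled` and the `X-Worker-Hit` header name are both invented for illustration:

```javascript
// Hypothetical helper: copies a response and adds a debug header, so you can
// tell from the client side that the Worker handled the request.
function markAsHandled(response) {
  // Responses returned by fetch() have immutable headers, so build a copy first.
  const tagged = new Response(response.body, response);
  tagged.headers.set('X-Worker-Hit', '1'); // header name is an arbitrary choice
  return tagged;
}

// In the Worker, replace `return fetch(request);` with:
//   return markAsHandled(await fetch(request));
```

If a 200 response for a test URL comes back without that header, the request never reached the Worker, which points at the route configuration rather than the code.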

Alternatively, Snippets can help achieve this in a more straightforward way:

  1. Snippets code:
export default {
  async fetch(request) {
    return new Response('Not found', {
      status: 404,
      statusText: 'Not Found',
    });
  },
};
  2. Snippets rule:
(http.request.full_uri wildcard "*_bd_prev_page=*" and http.user_agent wildcard "*Google*")

Keep in mind that _bd_prev_page= might trigger a WAF signature for SQL injection. If you see 403s with a message similar to “This website is using a security service to protect itself from online attacks. The action you just performed triggered the security solution. There are several actions that could trigger this block including submitting a certain word or phrase, a SQL command or malformed data.” when running tests like curl -sv "https://{your_domain}/test?test&_bd_prev_page=1" -H 'User-Agent: Googlebot/2.1 (+http://www.google.com/bot.html)', you may need to skip WAF for these requests in order for your snippet to work.
