How to challenge everyone except the following

Hi

I need to stop content-scraping bots from cloning my site,
so I want to challenge everyone except traffic that matches the following rules:

1- Allow all known good bots (Google, Bing, etc.)
2- Allow all traffic that comes from well-known search engine result pages such as Google, Bing, and Yahoo

How can I implement these rules in the firewall settings?

Regards

  1. Is easily possible via cf.client.bot.
  2. Is tricky to impossible. You’d have to work with the referrer, which can be faked or missing. Either way, you’d still let some crawlers through or end up challenging legitimate visitors.

1- Allow all known good bots (Google, Bing, etc.)
Is this correct?

====================

2- Allow all traffic that comes from well-known search engine result pages such as Google, Bing, and Yahoo

You said it’s almost impossible.
Is there any guide that I can follow to achieve this one?

That would be correct.

You need to extend your existing rule to check the referrer and whether it contains any of the allowlisted domain names. But again, the referrer can easily be faked, and legitimate requests can arrive without one, in which case you will challenge them too.

That’s really tricky for less experienced users like me.

I see that the Cloudflare challenge takes about 5 seconds. Can it be reduced to 2 or 3 seconds?

You can’t.

Thanks for the fast response, "sandro",
but why is it that every time I apply the rule (not cf.client.bot),

all the traffic that comes from search engines is gone?

As you can see in the following image,
there is only one user from Google.
Once I remove the rule (not cf.client.bot) that I set in the firewall settings,
it goes back to normal, which is 40-50 users.

Traffic is not gone, but requests which are not from well-known crawlers will be challenged. That essentially means all your regular visitors get the challenge.

It probably is better to find a pattern in the requests you want to block and block them explicitly.

Unfortunately, I don’t have the required knowledge to do this.

I’d start by analysing the requests you’d like to block and try to find a pattern. That could be the country, the user agent, the IP address block, etc. Once you have found a pattern, you can try to implement a block either on your end or on Cloudflare.
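As a rough starting point for that kind of analysis, here is a minimal Python sketch that counts requests per IP address and per user agent. It assumes your origin server writes access logs in the common "combined" format; the regular expression, the `summarize` helper, and the sample lines (which use documentation-only IP ranges) are all made up for illustration and are not something Cloudflare provides.

```python
import re
from collections import Counter

# Assumption: log lines follow the "combined" access-log layout:
# ip ident user [time] "request" status bytes "referer" "user agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def summarize(lines, top=3):
    """Return the most frequent client IPs and user agents in the given lines."""
    ips, agents = Counter(), Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        ips[match.group("ip")] += 1
        agents[match.group("agent")] += 1
    return ips.most_common(top), agents.most_common(top)

# Hypothetical sample data, only to show the shape of the input.
SAMPLE_LINES = [
    '203.0.113.7 - - [10/Oct/2020:13:55:36 +0000] "GET /page-1 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.7 - - [10/Oct/2020:13:55:37 +0000] "GET /page-2 HTTP/1.1" 200 498 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '198.51.100.23 - - [10/Oct/2020:13:56:01 +0000] "GET / HTTP/1.1" 200 1024 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0)"',
]

if __name__ == "__main__":
    top_ips, top_agents = summarize(SAMPLE_LINES)
    print("Top IPs:", top_ips)
    print("Top user agents:", top_agents)
```

A single IP address or address block that requests many pages in quick succession is the kind of pattern you could then block explicitly.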

The pattern is really hard to track because they use a fake user agent to imitate a normal user’s browser while cloning and mirroring my site.

They then submit the spam domain to Google, and after 2 or 3 weeks my site’s ranking takes a hit because of duplicate content.

As for your original question, the following might do the trick

(not cf.client.bot and not http.referer contains "google.com" and not http.referer contains "bing.com")

But again, this can be easily bypassed and can apply to legitimate visitors as well.
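To make the effect of that expression easier to see, here is a small Python sketch that mimics its logic outside of Cloudflare. The function `would_challenge` and its parameters are hypothetical names for illustration; `is_known_bot` stands in for cf.client.bot and `referer` for http.referer, and the substring check is my reading of how `contains` behaves.

```python
# Substrings taken from the expression above.
ALLOWED_REFERRER_PARTS = ("google.com", "bing.com")

def would_challenge(is_known_bot: bool, referer: str) -> bool:
    """Mimic: not cf.client.bot and not referer contains any allowed part."""
    referer_allowed = any(part in referer for part in ALLOWED_REFERRER_PARTS)
    return (not is_known_bot) and (not referer_allowed)

if __name__ == "__main__":
    cases = {
        "known crawler, no referrer": (True, ""),
        "visitor arriving from a Google result": (False, "https://www.google.com/"),
        "legitimate visitor typing the URL": (False, ""),
        "scraper sending a forged referrer": (False, "https://www.google.com/"),
        "unrelated URL that merely mentions bing.com": (False, "https://spam.example/?q=bing.com"),
    }
    for label, (bot, referer) in cases.items():
        print(f"{label}: challenged={would_challenge(bot, referer)}")
```

The legitimate direct visitor is challenged, while the scraper with a forged referrer and the URL that merely contains "bing.com" both pass, which is exactly the weakness described above.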

This rule will show the JS challenge to everyone except:

1- Well-known bots (Google, Bing, etc.)
2- All traffic that comes from the google.com and bing.com domains

Correct?

But you said "it can be easily bypassed".
So the attacker can fake traffic from the google.com domain too?

I have said that all along :wink:

Ouch :smiley:

Won’t Google block them for faking traffic like that?

Google is not involved at all here. The entire thing relies on the referrer, which - as I said several times :wink: - can be easily faked.
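To illustrate why Google never sees any of this: the referrer is just a request header that the client fills in itself. The following Python sketch builds a request object with an arbitrary Referer value using the standard library, without sending anything. The helper `build_request` and the example.com URL are placeholders I made up for illustration.

```python
from urllib.request import Request

def build_request(url, referer=None):
    """Build (but do not send) a request, optionally with a chosen Referer."""
    headers = {}
    if referer is not None:
        # The client decides this value; nothing verifies it against Google.
        headers["Referer"] = referer
    return Request(url, headers=headers)

if __name__ == "__main__":
    forged = build_request("https://example.com/", referer="https://www.google.com/")
    plain = build_request("https://example.com/")
    print("Forged request claims to come from:", forged.get_header("Referer"))
    print("Plain request has a Referer header:", plain.has_header("Referer"))
```

The second request shows the other half of the problem: a perfectly legitimate client may send no referrer at all.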

Thanks for your continued support and replies in my topic, “sandro”.

If you know of any other guides, articles, or steps that could help me with this issue,
I would highly appreciate it.

Regards

That topic is very broad and blocking such requests is not completely impossible but often very tricky. I would suggest you start with basics like what HTTP is, what data you can get from an HTTP request, etc. It is not something you can cover in five minutes however.

Thanks once again