Best way of blocking crawlers that keep changing IP address

What is the name of the domain?

mydomain.com

What is the error number?

N/A

What is the error message?

N/A

What is the issue you’re encountering?

My site is being hit several times a minute by crawlers that don’t respect the robots.txt file and change their IP address each time. The requests typically come as a pair: a HEAD request, immediately followed by a GET request with the same query string.

What steps have you taken to resolve the issue?

The IP address range being used is too wide to block without blocking genuine users. I can’t see any obvious distinguishing features in the User Agent string: “Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3663.5 Safari/537.36”.

What is the current SSL/TLS setting?

Off

Can you share more information here, such as one or more of the IP addresses, and what exactly makes you believe the ranges are too wide to block?

“Chrome/67” was apparently released in April 2018, so it is roughly 7 years old.

Challenging or completely blocking older browsers, depending on personal preference, could be a way forward.
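As a sketch, a Cloudflare custom rule expression matching that outdated browser version could look like the following (the field name `http.user_agent` is part of Cloudflare’s Rules language; pairing it with a Managed Challenge action, and matching on this particular version string, are assumptions based on the User-Agent quoted above):

```
(http.user_agent contains "Chrome/67.")
```

Deployed with the Managed Challenge action, real users on a current browser would never see it, while the stale crawler signature would be challenged on every request.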


Thanks, that’s good advice - I hadn’t spotted that.

Instead of putting in place a custom rule, would Bot Fight mode (free plan) catch this sort of activity? And if so, does it allow through genuine bots? I want genuine bots to index my site, but not the subdirectory that I have disallowed in robots.txt. Googlebot etc is obeying robots.txt, but these guys aren’t.
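One hedged option for the disallowed subdirectory, independent of Bot Fight Mode: Cloudflare’s Rules language exposes a `cf.client.bot` field that is true for verified bots such as Googlebot, so a custom rule could block unverified clients from that path while leaving legitimate indexing alone. A sketch (the path `/private/` is a hypothetical stand-in for the directory disallowed in robots.txt):

```
(http.request.uri.path contains "/private/" and not cf.client.bot)
```

Applied with a Block action, verified crawlers keep honouring robots.txt as before, and the misbehaving crawlers lose access to that subdirectory regardless of IP address.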

Try setting your security level to High and/or create a rule to block requests with a threat score above 0.
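A minimal sketch of such a rule, using Cloudflare’s `cf.threat_score` field (blocking everything above 0 is aggressive and may catch some legitimate traffic, so a Managed Challenge action may be a safer starting point than an outright Block):

```
(cf.threat_score gt 0)
```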

The free Bot Fight Mode is limited and may not catch these. I believe the free version only uses the threat score and JavaScript challenges. The version on Pro plans and above is far smarter and uses additional methods to catch bad bots and other bad actors with much greater efficacy.

Another good method of blocking bad actors is to block hosting and cloud ASNs, as most bad actors launch their attacks from those networks.

I have a really extensive list of ASNs being blocked globally at account level. It catches most of the bad actors easily.
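An ASN block of this kind can be sketched with Cloudflare’s `ip.geoip.asnum` field. The ASNs below are a small illustrative subset (AS16509 Amazon, AS14061 DigitalOcean, AS24940 Hetzner, all hosting/cloud networks); which ASNs are appropriate to block depends on where your legitimate traffic comes from:

```
(ip.geoip.asnum in {16509 14061 24940})
```

Note that blocking cloud ASNs can also affect legitimate services that run from those networks (monitoring tools, some proxies), so it is worth reviewing the firewall events log after enabling a rule like this.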

Unfortunately, I can’t copy the ASN numbers and I don’t know if it is possible to export the list; otherwise I would place it here for you and others to copy and use if wanted.

Thanks.


You could also block HEAD requests if you don’t need them, and since you said the attacker uses the same query string, try blocking that too. If the specific page path doesn’t need a query string to function, block every request to that path that carries one. You should also set up a global rate-limiting rule.
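The two suggestions above could be combined into one custom rule expression along these lines (the path `/target-page` is a hypothetical stand-in for the page being hit; `http.request.method` and `http.request.uri.query` are fields in Cloudflare’s Rules language):

```
(http.request.method eq "HEAD")
or (http.request.uri.path eq "/target-page" and http.request.uri.query ne "")
```

Be careful with the HEAD clause if anything legitimate (uptime monitors, link checkers) relies on HEAD requests to your site; scoping it to the affected path is a safer variant.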

This topic was automatically closed 15 days after the last reply. New replies are no longer allowed.