Distributed hostile web crawler from China throttling server

My server has been crashed several times in the last week by an extremely aggressive, hostile, distributed web crawler.

It is hitting over a million pages per day and seems to be crawling all content on our website repeatedly. It ignores robots.txt and is disguising itself with random IP addresses and agent strings.

I verified it is a crawler and not real traffic: the hits do not show up in Google Analytics, and they follow a web crawler pattern of hitting every page (our site has a lot of content).
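For anyone wanting to check for the same pattern in their own logs, here is a rough sketch of how it could be done. It assumes the server writes the common "combined" access-log format, which may not match your setup; the `summarize` function and the regex are illustrative names of my own, not part of any tool mentioned here.

```python
import re
from collections import Counter, defaultdict

# Assumption: "combined" access-log format. Adjust the regex for your server.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Return (path, hits, distinct IPs, distinct agent strings), busiest path first."""
    hits = Counter()
    ips = defaultdict(set)
    agents = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not match the assumed format
        path = m.group("path")
        hits[path] += 1
        ips[path].add(m.group("ip"))
        agents[path].add(m.group("agent"))
    return [
        (path, count, len(ips[path]), len(agents[path]))
        for path, count in hits.most_common()
    ]
```

A page that is requested many times, each time from a different IP and with a different agent string, fits the behavior described above better than normal visitor traffic would.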

It does not seem to be content with crawling the site once, but is hitting the same pages repeatedly. I am not sure of its purpose, but it is very hostile and aggressive, and because the IP addresses and agent strings always seem to be different, I don't see any way to block it.

I tried to block all of China in the firewall, but it just started using IP addresses from every other country in the world. So now Cloudflare is blocking 1 million calls from China, but I am still getting 1 million calls from the USA, Brazil, India, etc.

I noticed that because it uses random agent strings, a lot of them are MSIE agent strings. Since no one uses Internet Explorer anymore, I tried blocking any agent string containing MSIE. But now I get 1 million blocked calls from China and 1 million blocked calls for MSIE agents, yet I still get 1 million calls with every other agent string (Chrome, Firefox, Opera, etc.).
Basically, there does not seem to be any way to block it. Does anyone have any ideas?
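To make the two rules tried so far concrete, here is a minimal sketch of their logic. It is only an illustration: `Request` and `is_blocked` are hypothetical names, and this is not how Cloudflare evaluates its firewall rules internally.

```python
from dataclasses import dataclass


@dataclass
class Request:
    country: str     # two-letter country code, e.g. "CN" (assumed input)
    user_agent: str  # raw User-Agent header


def is_blocked(req: Request) -> bool:
    """Mirror the two rules described above: block China, block MSIE agents."""
    if req.country == "CN":
        return True
    if "MSIE" in req.user_agent:
        return True
    return False
```

A request from any other country with a Chrome, Firefox, or Opera agent string passes both checks, which is why the remaining traffic still gets through.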

Some of the original IPs and agent strings are: - Zhejiang - Shanxi - Guangdong
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; The World)
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0

Has anyone else seen this new crawler on their websites?

Any idea of how to stop it?

Does anyone know what it is, what its purpose is, or where it originates from?

In Firewall Tools, have you tried enabling Bot Fight Mode?

How about increasing Security Level to High?

I enabled both of these and it seems to have stopped the attack.
I will monitor more and remove each option one by one to see which options are required to stop the attack.
Thank you.


Sorry, spoke too soon. That stopped it for about an hour, but now it is back with the same throughput.
Not sure how it is able to adapt to anything I try.

Open a Support Ticket. Maybe they can see what’s happening with that traffic.

Log in to Cloudflare and then contact Cloudflare Support by clicking on the Get More Help button.
