Bandwidth Overload: Gibberish Hack on Client Website

Greetings,

A client had a really old WordPress website which he wanted redesigned. As the migration to the new site began, I realized he had folders which shouldn’t be within his WordPress installation - it turns out he had a firmly established gibberish hack which had managed to get over 200,000 randomly created pages indexed on Google.

The new site is up and the hacks have been cleared and removed, but the Google index issue remains - thousands upon thousands of crawl requests are constantly being made, no matter what I set for crawl rate, robots.txt, etc.

Since the hacks were removed, all of the randomly created pages redirect to a 404 page, which should hopefully one day make Google clean up its index. However, the bandwidth this consumes on a daily basis is not sustainable.

I’m trying to figure out how to use Cloudflare firewall settings to block the URL paths created by this hack, or at least part of them, so that the remaining ones still return 404s to Google. The URLs the hack indexed all look roughly like this:

domain.name / random numbers / random page

The random number is always either 5 or 6 digits long.

Is it possible to create a firewall rule which blocks requests when the URL path starts with / random numbers?

Could this be done by, for example, having /9*****/, which would block all random numbers starting with 9?

Thanks so much for all replies and help!

Did you check in Google’s webmaster tools if your robots.txt was crawled? Also, do you have a sitemap.xml that still contains all of those links?

You can do this with one big firewall rule:
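The rule itself is not reproduced here. As a rough illustration of the matching logic only (not the rule from this reply), here is a small Python sketch that assumes the hacked URLs have a first path segment of exactly 5 or 6 digits, as described in the question. The names `ALL_SPAM`, `SPAM_STARTING_WITH_9` and `is_spam_path`, and the example paths, are made up for illustration. In a Cloudflare firewall rule the same pattern could in principle be used with the `matches` operator on `http.request.uri.path`, which as far as I know is only available on Business and Enterprise plans.

```python
import re

# Assumption: hacked URLs have a first path segment of exactly 5 or 6 digits,
# followed by a slash and the random page name, as described in the question.
ALL_SPAM = re.compile(r"^/[0-9]{5,6}/")

# Narrower variant: only numbers starting with 9 (the "/9*****/" idea),
# so the remaining hacked URLs keep returning 404s.
SPAM_STARTING_WITH_9 = re.compile(r"^/9[0-9]{4,5}/")


def is_spam_path(path, pattern=ALL_SPAM):
    """Return True if the URL path matches the given spam pattern."""
    return pattern.match(path) is not None


if __name__ == "__main__":
    # Hypothetical example paths, not taken from the real site.
    for path in ["/93412/cheap-pills", "/123456/xyz", "/2023/05/post", "/about/"]:
        print(path, is_spam_path(path), is_spam_path(path, SPAM_STARTING_WITH_9))
```

Checking a handful of real URLs from the server log against the pattern first should show whether it would also catch legitimate paths, such as date-based permalinks, before anything gets blocked.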

Various bots have crawled the robots.txt - Google seems to ignore it as per usual :) Googlebot’s crawl rate has also been reduced as far as possible in the webmaster tools.

The new, correct sitemap has been submitted through GSC and was accepted successfully, but Google hasn’t read through it properly yet (it sits at 0).

Great!

This works so far and is blocking the massive crawling - thanks a lot! I tried various other combinations with * and nothing seemed to work properly, but this does it :D! I’ll monitor the situation for a few days and see if I have any more specific questions.

Thanks a lot again!
