A client had a really old WordPress website that he wanted redesigned. As the migration to the new site began, I realized he had folders that shouldn't be in his WordPress installation. It turns out he had a firmly established gibberish hack, which managed to get over 200,000 randomly created pages indexed on Google.
The new site is up and the hacks have been cleared and removed, but the Google index issue remains: thousands upon thousands of crawl requests are constantly being made, no matter what I set for crawl rate, robots.txt, etc.
Since the hacks were removed, all of the randomly created pages redirect to a 404 page, which should hopefully one day make Google clean up its index. However, the bandwidth this consumes on a daily basis is not sustainable.
I'm trying to figure out how to use Cloudflare firewall settings to block the URL paths created by this hack. At least part of them, so that requests from Google to the remaining part still get 404s. The URLs the hack got indexed all look somewhat like this:
domain.name / random numbers / random page
The random number is always either 5 or 6 digits long.
Is it possible to create a firewall rule which blocks when the URL path starts with / random numbers?
Could this be done by, for example, having /9*****/, which would block all random numbers starting with 9?
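To make the pattern concrete, here is a rough Python sketch of what I would like the rule to match. The regexes and function names are my own illustration, not Cloudflare syntax, and I'm assuming the first path segment is purely digits. I believe a Cloudflare expression along the lines of `http.request.uri.path matches "^/9[0-9]{4,5}/"` would be the equivalent, but as far as I understand the `matches` operator is not available on every plan, so treat that as an assumption too.

```python
import re

# Hypothetical patterns for illustration only.
# Any path whose first segment is exactly 5 or 6 digits, followed by another segment.
HACK_PATH = re.compile(r"^/[0-9]{5,6}/")
# Only the subset whose 5- or 6-digit first segment starts with 9.
HACK_PATH_9 = re.compile(r"^/9[0-9]{4,5}/")


def is_hack_path(path: str) -> bool:
    """Return True if the path looks like /<5 or 6 digits>/<anything>."""
    return HACK_PATH.match(path) is not None


def is_hack_path_starting_with_9(path: str) -> bool:
    """Return True if the path looks like /9<4 or 5 more digits>/<anything>."""
    return HACK_PATH_9.match(path) is not None
```

With this, `/91234/some-page` would be blocked by the narrower rule, while `/12345/some-page` would still reach the site and get a 404. Normal paths such as `/wp-content/uploads/` or a date archive like `/2023/05/post` would not match either pattern.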
Thanks so much for all replies and help!