Sitemap Attack Vector

I recently helped a client move a domain from their malware-infested site to a new site. However, I then received heavy traffic for requests that were clearly no longer valid on the site.

The malware had created and submitted a new sitemap to Google, which Google then proceeded to crawl consistently. So now Google is constantly making requests for bad files, which, if you are not careful, can trigger certain firewall settings and block the Google crawler.
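If you have a copy of the malicious sitemap, one way to see exactly what Google was told to crawl is to diff its URLs against the pages you actually publish. A minimal sketch, assuming a saved sitemap file; the function name, sample XML, and known-good path set are all illustrative, not from the original post:

```python
# Hypothetical sketch: flag sitemap entries whose path is not
# in the site's known-good set of pages.
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def suspicious_urls(sitemap_xml, known_good_paths):
    """Return <loc> entries whose URL path isn't in known_good_paths."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = loc.text.strip()
        if urlparse(url).path not in known_good_paths:
            flagged.append(url)
    return flagged

# Illustrative sitemap with one legitimate page and one injected URL.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/menu</loc></url>
  <url><loc>https://example.com/cheap-pills-42.html</loc></url>
</urlset>"""

print(suspicious_urls(sample, {"/", "/menu", "/contact"}))
# → ['https://example.com/cheap-pills-42.html']
```

The flagged list is also a convenient starting point for the robots.txt and firewall rules discussed below, since it tells you which paths the malware introduced.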

This isn’t so much a question as it is a story, as this type of attack, if not properly rectified quickly in Google Webmaster Tools, can have serious consequences.

Has anybody else experienced this type of scenario, and what best practices have you employed to fix these situations, other than fixing the sitemap in Google Webmaster Tools and whitelisting the Google crawler IPs?

After fixing the issues and the sitemap, also disallow the compromised directories in robots.txt (Disallow: /compromised_directory/) and create a Cloudflare Firewall Rule, e.g. (http.request.uri.path contains "/compromised_directory"), blocking everything that tries to reach the origin.
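Concretely, the two pieces might look like this; the directory name is a placeholder for whatever paths the malware actually created:

```
# robots.txt — ask well-behaved crawlers to skip the compromised path
User-agent: *
Disallow: /compromised_directory/

# Cloudflare Firewall Rule expression, with action set to Block
(http.request.uri.path contains "/compromised_directory")
```

Note that robots.txt is advisory only; the firewall rule is what actually keeps requests off the origin.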
I’d also be most curious as to how it happened to begin with, so I can start taking steps to instruct the client about ways of hardening site security.


That’s good advice; I forgot to do that.

The client (a small, family-run Thai restaurant that I frequent) had another friend/customer set up their site on WordPress and then leave. So it was on HostGator and was never updated in the last 3 years… the standard recipe for disaster.

@Withheld, the proper way to handle the URLs generated by the malware while the site was infected is to allow Googlebot to crawl them so that it gets a 404, according to these Google instructions. Googlebot will quickly drop the pages from Google’s index, if it hasn’t done so already, but unfortunately it may take years for Googlebot to completely forget these URLs. Just ignore its visits to these URLs; it’s the way things are.

You might want to set Firewall Rules to block these URLs while whitelisting Google and perhaps other search engines. But actually, if the site has been thoroughly cleaned, blocking these URLs will act more as psychological comfort for the site owners than as a way to prevent further malware infections. The only possible benefit of blocking these URLs is that whoever tries to access them can be identified via the Firewall Events log (though you won’t be able to tell whether it’s a malicious actor or just a naive visitor who clicked a cached page somewhere).

The above link to Google instructions opens one page in a long list of pages dealing in detail with malware removal. And they are long for a reason: there’s always a lot to be done after a website infection.


@cbrandt That’s a good find, and interesting.

Yes, as I check my logs, even though Google is 404’ing on all of the bad links, it’s still coming back again and again on hundreds of links, over and over… so I am leaning toward @Withheld’s method, as I don’t know if the published Google response is up to date… as I learned when I went to Google I/O, there are still humans behind the scenes.


While not mentioned by the OP, it greatly depends on the volume of traffic hitting the origin. If the host is on a shared hosting package, it doesn’t take long before 5xx errors begin and the real penalties start.


Correct! I keep forgetting that blocking on Cloudflare is not the same as blocking at the origin. If the volume is high enough to be of concern, then a Firewall Rule blocking these URLs while whitelisting Googlebot, coupled with a robots.txt directive to make Googlebot crawl at larger intervals, could be a solution.
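One way to sketch that combination in Cloudflare’s rule expression language is to block the bad paths only for traffic that is not a verified bot; the path is a placeholder, and cf.client.bot matches requests from Cloudflare’s known-good bot list, which includes Googlebot:

```
# Cloudflare Firewall Rule expression, action: Block
(http.request.uri.path contains "/compromised_directory" and not cf.client.bot)
```

This way Googlebot can keep receiving 404s from the origin (so the URLs drop out of the index), while everything else hitting those paths is stopped at the edge.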
