Is it okay to block Googlebot?

I saw thousands of very suspicious queries to my website’s /search endpoint, and was surprised that Cloudflare appeared to be letting a fake Googlebot access my site. I was even more surprised to find that it really was Google! Here’s an example (some are much worse: online pharmacy spam, etc.):

Query string: ?q=v++%EC%97%91%EC%8A%A4%ED%84%B0%EC%8B%9C%ED%8C%9D%EB%8B%88%EB%8B%A4%E3%80%8A%ED%85%94%EB%A0%88%EA%B7%B8%EB%9E%A8saggasi%E3%80%8B%EC%97%AC%EC%84%B1%ED%9D%A5%EB%B6%84%EC%A0%9C%E3%88%A7%EB%8B%A4%ED%81%AC%EC%9B%B9%EC%9E%91%EB%8C%80%EA%B8%B0%ED%8C%90%EB%A7%A4%EA%96%B8%EC%BC%80%ED%83%80%EB%AF%BC%EB%93%9C%EB%9E%8D%EA%95%A8%EC%97%91%EC%8A%A4%ED%84%B0%EC%8B%9C%ED%8C%90%EB%A7%A4
User agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.90 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
IP address: 66.249.66.12
ASN: AS15169 GOOGLE
Country: United States

Here’s another one using Path, also Googlebot:

Path: /search/%EC%98%A4%ED%94%BC%EB%9E%9C%EB%93%9C%E3%80%90oplove2.com%E3%80%91OP%EC%82%AC%EC%9D%B4%ED%8A%B8%E3%8B%88%EB%B0%A4%EC%9D%98%EC%A0%84%EC%9F%81%C2%B7%EB%B0%A4%EC%A0%84%E2%A7%84%EC%97%AC%ED%83%91%E1%8D%98%EB%A7%88%EC%82%AC%EB%8D%B0%EC%9D%B4
Query string: (empty)
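
For anyone who wants to confirm for themselves that a request like the first one really comes from Google rather than from a spoofed user agent, a common approach is a reverse DNS lookup on the IP address followed by a forward lookup on the resulting hostname. The following is a minimal Python sketch of that idea, not anything Cloudflare or Google ships. The function name `is_verified_googlebot` and the injectable lookup parameters are hypothetical, and the suffix list is an assumption covering only the googlebot.com and google.com hostnames; check Google's current documentation before relying on it.

```python
import socket

# Assumed hostname suffixes for Google's crawlers (verify against Google's docs).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a Google crawler hostname
    and that hostname forward-resolves back to the same `ip`.

    The two lookup parameters are hypothetical hooks so the logic can be
    exercised without network access; by default they use the socket module.
    """
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex is IPv4-only; it returns (hostname, aliases, addresses)
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        # No PTR record or lookup failure: treat as unverified.
        return False

    # The leading dot in each suffix prevents matches like "evilgooglebot.com".
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # The forward lookup must include the original IP, otherwise the PTR
        # record could have been set by whoever controls the IP block.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling `is_verified_googlebot("66.249.66.12")` performs live DNS lookups, so the result depends on the resolver available where it runs.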

I should add that query strings cannot arise naturally through use of my site. Queries are POSTed to my server, but “q” is a variable that can be made to produce results. If you’re a hacker…

It hadn’t ever occurred to me to disallow /search in robots.txt, and now I have done that. Also, I respond with a soft 406 error (“Not Acceptable”) if you attempt to add your own “q”.
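
As a rough illustration of the robots.txt change, here is a small Python sketch that checks a candidate rule with the standard library's `urllib.robotparser`. The robots.txt content shown is an assumption about what such a rule could look like, not a copy of my actual file, and the helper name `allowed` is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: block every crawler from anything under /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("Googlebot", "/search?q=test"))    # False
print(allowed("Googlebot", "/search/anything"))  # False
print(allowed("Googlebot", "/about"))            # True
```

Disallow rules match by path prefix, so this rule covers both the query-string form and the /search/... path form shown above. It would also cover any other path that happens to start with /search.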

Meanwhile, I continue to use a Firewall rule to issue a JS Challenge against this type of search, and will continue to catch Googlebot until it rereads my robots.txt.

Is this the correct way to handle this case?

Googlebot might find those URLs if someone posted links to them somewhere, such as on your site or on discussion forums.

Adding those paths to robots.txt and blocking the script from Googlebot is OK in this case.

That’s what I guessed about the origin of the URLs. I’ve caught over 6,000 so far, though, which is a crazy number of links for Google to follow. Our site is legit: there would never be links like that on it, and there are no results for those queries either!

It feels odd to be blocking Google… thanks for the confirmation.
