Help with Bot Blocking using UA strings

#1

Hi there.

My site is getting hit by spam bots.

I have read all the advice and community pages but haven’t managed to find the info I need.

My hosts gave me some bot user agents like these:

SemrushBot/3~bl
YandexBot/3.0
AhrefsBot/6.1
CCBot/2.0

As I understand it I can use firewall rules to block these user agent strings.

But I did some research and some of the people who run these bots advise just blocking the UA name.

For example instead of blocking “AhrefsBot/6.1” I would block “AhrefsBot”.

But I am not sure whether this would work with Cloudflare.

I would like to do it this way if possible, as it would block every version of a bot rather than just the version currently hitting my site.

Is this possible?

I hope that makes sense. This is all very complicated for me!

Thanks

Tim

#2

A firewall rule like this, with the action set to Block, would work:

(http.user_agent contains "AhrefsBot" and not cf.client.bot) or (http.user_agent contains "SemrushBot" and not cf.client.bot) or (http.user_agent contains "YandexBot" and not cf.client.bot) or (http.user_agent contains "CCBot" and not cf.client.bot)

Here we add “and not cf.client.bot” to make sure the legitimate bots - the ones that index the web - don’t get blocked. Anyone faking one of these bot user agents would then be blocked.

If you don’t care about being indexed by legitimate bots, you can leave out the “and not cf.client.bot” (known bots) part of these rules.
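
To see why blocking just the name covers every version, here is a rough Python model of the rule above. It is only a local sketch, not how Cloudflare evaluates rules: `should_block` and `BLOCKED_TOKENS` are made-up names, and `is_known_bot` is an assumed stand-in for the cf.client.bot flag. Python's `in` is a case-sensitive substring test, which is how I understand `contains` to behave.

```python
# Tokens taken from the rule above; the tuple name is hypothetical.
BLOCKED_TOKENS = ("AhrefsBot", "SemrushBot", "YandexBot", "CCBot")


def should_block(user_agent: str, is_known_bot: bool) -> bool:
    """Model of: (UA contains any token) and not cf.client.bot.

    is_known_bot is an assumed stand-in for Cloudflare's cf.client.bot field.
    """
    # Substring match, so "AhrefsBot" matches "AhrefsBot/6.1" and any later version.
    matches_token = any(token in user_agent for token in BLOCKED_TOKENS)
    return matches_token and not is_known_bot


if __name__ == "__main__":
    for ua, known in [
        ("AhrefsBot/6.1", False),   # current version, not verified
        ("AhrefsBot/7.0", False),   # a future version still matches
        ("YandexBot/3.0", True),    # flagged as a known bot, so allowed
        ("Mozilla/5.0", False),     # ordinary browser, no token
    ]:
        print(ua, known, should_block(ua, known))
```

So a request is blocked only when the user agent contains one of the names and the known-bot flag is not set.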

#3

Don’t know about the other two, but Yandex is a legitimate search engine and AhrefsBot is used by (some) legitimate services such as schools. Both of those crawlers respect robots.txt and that is the preferred method.
The problem with using a firewall to control bots is that they may keep sending requests until they reach robots.txt, unnecessarily flooding the firewall logs to the point where people start ignoring them.
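
For the robots.txt route, here is a small Python sketch that checks what a robots.txt file would tell these two crawlers. The robots.txt content, the example.com URL, and the `is_allowed` helper are all illustrative assumptions, and this only models crawlers that actually honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks two crawlers to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: YandexBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str = "https://example.com/page") -> bool:
    """Return True if ROBOTS_TXT permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for ua in ("AhrefsBot/6.1", "YandexBot/3.0", "Googlebot/2.1"):
        print(ua, is_allowed(ua))
```

The rules are keyed on the bot name only, so the version number does not matter here either.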

#4

Slightly shorter

(http.user_agent contains "AhrefsBot" or http.user_agent contains "SemrushBot" or http.user_agent contains "YandexBot" or http.user_agent contains "CCBot") and not cf.client.bot

This will definitely block SemrushBot and CCBot, however, as these two are not covered by the known-bot flag.
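
If you want to convince yourself that the shorter expression behaves the same as the longer one from #2, a quick truth-table check in Python does it. This is just a sketch: the function names are made up, and each boolean stands for "the user agent contains that name", with `known_bot` standing in for cf.client.bot.

```python
from itertools import product


def long_form(ahrefs, semrush, yandex, cc, known_bot):
    # Rule from #2: every clause repeats "and not cf.client.bot".
    return ((ahrefs and not known_bot) or (semrush and not known_bot)
            or (yandex and not known_bot) or (cc and not known_bot))


def short_form(ahrefs, semrush, yandex, cc, known_bot):
    # Rule from #4: the known-bot check is factored out once.
    return (ahrefs or semrush or yandex or cc) and not known_bot


def forms_agree() -> bool:
    """Compare both forms over all 32 combinations of the five inputs."""
    return all(
        long_form(*values) == short_form(*values)
        for values in product([False, True], repeat=5)
    )


if __name__ == "__main__":
    print(forms_agree())
```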

#5

Thanks so much for the replies. I’m still a bit confused, but I will have a tinker and see what I can figure out.
