Rate limit or block SemrushBot

What is the name of the domain?

findme.directory

What is the issue you’re encountering

SemrushBot is using too much bandwidth: 16.12 GB in 3 days

What steps have you taken to resolve the issue?

I don’t even know where to start. The Cloudflare dashboard and features keep changing so often that the knowledge base left me very confused about what to do!

From an SEO perspective, I don’t want to block either SemrushBot or AhrefsBot but would prefer to limit both of them. I would like to know how to do this.

Was the site working with SSL prior to adding it to Cloudflare?

Yes

What is the current SSL/TLS setting?

Full (strict)

Semrush operates multiple bots.

Do you have the exact “User-Agent” for the one(s) that are annoying you?

Hmm…

  1. What Cloudflare plan are you on?

Do you have any detailed information about the traffic patterns, such as how many requests come in and how often the problematic ones arrive?

E.g. something along the lines of:

  1. Are the requests coming in steadily all around the hours of the day?
    … Or are there any specific time intervals of the day, where the requests come in?

  2. How many request(s) per 1, 3, 5, 10, 15, 30 second(s)?

  3. How many request(s) per 1, 3, 5, 10 minute(s)?
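If you have access to the raw server access log, a rough way to get these numbers is to count the bot's requests per minute. The sketch below assumes the log is in the common "combined" format; the sample lines, IP addresses, and User-Agent strings are made up for illustration, and the function name is hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed combined log format:
# ip - - [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def requests_per_minute(lines, bot="SemrushBot"):
    """Count requests per minute whose User-Agent contains `bot`."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or bot.lower() not in match.group("agent").lower():
            continue
        stamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        counts[stamp.strftime("%Y-%m-%d %H:%M")] += 1
    return counts


# Illustrative sample lines, not real traffic.
sample = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /a HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"',
    '203.0.113.5 - - [10/Oct/2023:13:55:41 +0000] "GET /b HTTP/1.1" 200 734 "-" '
    '"Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"',
    '198.51.100.7 - - [10/Oct/2023:13:56:02 +0000] "GET /c HTTP/1.1" 200 120 "-" '
    '"Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"',
]

for minute, count in sorted(requests_per_minute(sample).items()):
    print(minute, count)
```

To run it against a real log, pass an open file instead of `sample`, and change `bot` to "AhrefsBot" to check the other crawler. The full User-Agent string is captured in the `agent` group if you need the exact value.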

Thank you for helping me. I am very appreciative and grateful for the help.

I’m on the pro plan.

I’m unsure how to get the actual user agent, as I’m only getting this information from the server through the AWStats application.

Can you point me to where I can find the user agent in Cloudflare?

Thanks again
Matt

That would seem to be the regular Semrush bot.

You can try adding the following to your “robots.txt” file:

User-agent: SemrushBot
Crawl-Delay: 10
Allow: /

User-agent: AhrefsBot
Crawl-Delay: 10
Allow: /

It should signal to SemrushBot and AhrefsBot that they are allowed to crawl your site, but that they must wait 10 seconds between each request.
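Before uploading the file, you can sanity-check that the rules parse the way you expect. This is only a minimal sketch using Python's standard `urllib.robotparser`; it shows how one parser reads the file, not how SemrushBot or AhrefsBot themselves interpret it.

```python
from urllib.robotparser import RobotFileParser

# The same rules as above, held in a string for a local check.
ROBOTS_TXT = """\
User-agent: SemrushBot
Crawl-Delay: 10
Allow: /

User-agent: AhrefsBot
Crawl-Delay: 10
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot in ("SemrushBot", "AhrefsBot"):
    # crawl_delay() returns the parsed delay in seconds for that bot;
    # can_fetch() reports whether the bot may request the given path.
    print(bot, parser.crawl_delay(bot), parser.can_fetch(bot, "/"))
```

A bot that is not listed in the file gets no crawl delay from these rules, so other crawlers are unaffected.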

Some bots honour the Crawl-Delay option, so you may be able to use it to slow these bots down.

SemrushBot and AhrefsBot both claim to honour it, but neither of them specifies a maximum value for it.

So you could try adding “Crawl-Delay” and then monitor the situation to see whether anything changes. Wait at least a couple of days to a week first, so that the bots have time to pick up the new “Crawl-Delay”.

Other bots across the Internet say they only honour “Crawl-Delay” values within a specific range, e.g. 1 - 30 seconds (BingBot).

From a Google search, the advertised maximum that some bots say they will honour is generally in the range of 30 - 60 seconds.

If you increase your “Crawl-Delay” and that alone doesn’t seem to change anything after several days to a week (to let the bots notice the change), then I would move to the last resort: blocking the individual bot(s).

Thanks again. I am grateful for your help. I’ll do the delay and will let you know if that worked.

Best,
Matt