How to Rate Limit Twitterbot

I recently noticed a spike in CPU usage. After reviewing the logs, I found that requests with the User-Agent Twitterbot/1.0 were causing it. I blocked those requests with a User-Agent Blocking Rule, and CPU usage dropped immediately. However, since Twitter Cards rely on the Twitterbot User-Agent to crawl and cache URLs, the Twitter Card validator stopped working once the User-Agent Blocking Rule was active.

Is there any way I can rate limit requests coming from the Twitterbot/1.0 User-Agent?

Below is a related question I posted a few months ago:

I am also pasting sample requests that Twitterbot generated within a fraction of a second. As you can see, it hits multiple URLs at the same time, which causes the spike.
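To confirm a burst like this from access logs, one option is to bucket the requests whose User-Agent contains "Twitterbot" by the second they arrived and look for the busiest second. The sketch below is a minimal TypeScript version of that check. The `LogEntry` shape and the `busiestSecond` helper are hypothetical; adapt the field names to whatever your log export actually contains.

```typescript
// Hypothetical shape of one access-log entry; real log fields will differ.
interface LogEntry {
  timestamp: string; // ISO 8601, e.g. "2021-03-01T10:15:42.123Z"
  userAgent: string;
  url: string;
}

// Count requests per whole second for entries whose User-Agent contains
// `uaFragment`, and return the busiest second with its request count
// (or null when no entry matches).
function busiestSecond(
  entries: LogEntry[],
  uaFragment: string,
): { second: string; count: number } | null {
  const perSecond = new Map<string, number>();
  for (const e of entries) {
    if (!e.userAgent.includes(uaFragment)) continue;
    // Truncate to second resolution: "YYYY-MM-DDTHH:MM:SS" (UTC).
    const second = new Date(e.timestamp).toISOString().slice(0, 19);
    perSecond.set(second, (perSecond.get(second) ?? 0) + 1);
  }
  let best: { second: string; count: number } | null = null;
  for (const [second, count] of perSecond) {
    if (best === null || count > best.count) best = { second, count };
  }
  return best;
}
```

A count in the hundreds for a single second, spread across many distinct URLs, is the pattern described above.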

Any thoughts?

This is a good sign that your site is reaching more people. I don’t think you can rate limit it properly, though. Perhaps consider getting a better server? It’s a bit odd that requests from Twitterbot are affecting your CPU performance this much.

@jnperamo The requests are almost certainly automated rather than human, since they are all generated within the same second. For example, URL1, URL2, …, URL1000 are all requested at the same time.

That’s true: a server upgrade or a load balancer could fix this, but there must be a simpler way to handle it at the edge. I am thinking of Cloudflare Workers, but I am unsure whether I can create a rule that drops requests exceeding a certain threshold.
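A minimal sketch of that idea as a Cloudflare Worker in TypeScript, using a fixed-window counter: Twitterbot requests beyond a threshold in each window get a 429 Too Many Requests response with a Retry-After header instead of reaching the origin. The window length, the limit, and the `FixedWindowLimiter` class are assumptions for illustration. One important caveat: module-level state in a Worker is local to a single isolate, so each isolate counts on its own and the effective site-wide limit is higher than `MAX_PER_WINDOW`. A strict global limit would need shared state (for example Durable Objects) or Cloudflare's own rate-limiting rules rather than a plain Worker.

```typescript
// Hypothetical tuning values; pick numbers that match what the origin can absorb.
const WINDOW_MS = 1_000; // length of one counting window
const MAX_PER_WINDOW = 5; // Twitterbot requests allowed per window (per isolate)

// Fixed-window counter: allows up to `limit` calls per `windowMs`.
class FixedWindowLimiter {
  private windowMs: number;
  private limit: number;
  private windowStart = 0;
  private count = 0;

  constructor(windowMs: number, limit: number) {
    this.windowMs = windowMs;
    this.limit = limit;
  }

  // Returns true if a request arriving at time `now` (ms) is within the limit.
  allow(now: number): boolean {
    if (now - this.windowStart >= this.windowMs) {
      // Start a new window and reset the counter.
      this.windowStart = now;
      this.count = 0;
    }
    this.count += 1;
    return this.count <= this.limit;
  }
}

function isTwitterbot(userAgent: string | null): boolean {
  return userAgent !== null && userAgent.includes("Twitterbot");
}

// Lives for the lifetime of this isolate only (see caveat above).
const twitterbotLimiter = new FixedWindowLimiter(WINDOW_MS, MAX_PER_WINDOW);

export default {
  async fetch(request: Request): Promise<Response> {
    if (isTwitterbot(request.headers.get("User-Agent")) && !twitterbotLimiter.allow(Date.now())) {
      // Ask the crawler to back off instead of letting the request hit the origin.
      return new Response("Too Many Requests", {
        status: 429,
        headers: { "Retry-After": String(Math.ceil(WINDOW_MS / 1000)) },
      });
    }
    // Everything else goes to the origin unchanged.
    return fetch(request);
  },
};
```

Returning 429 rather than blocking outright keeps the validator usable for URLs that fall under the limit; cards for throttled URLs may still fail to render, and whether Twitterbot honors Retry-After is not something this sketch can guarantee.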

For now, I am trying to control the crawl rate by adding the following to robots.txt, in the hope that Twitterbot honors it:

User-agent: Twitterbot
Allow: /
Crawl-delay: 5

I think you could lighten the load a bit with Workers, but as far as I know the card data lives inside your page itself (as meta tags). You would have to detect Twitterbot, build an otherwise empty page containing just the Twitter Card tags, and return that to Twitterbot. I’m not a frontend dev, so I’m unsure whether that’s an optimal approach or even possible.
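A minimal sketch of that approach as a Cloudflare Worker in TypeScript. It only saves origin CPU if the card metadata can be produced without asking the origin, so the `CARDS` lookup table here is a hypothetical stand-in (a KV namespace or a build-time export could fill that role in a real setup). Twitterbot requests for known paths get an otherwise empty page carrying only the `twitter:card`, `twitter:title`, `twitter:description`, and `twitter:image` meta tags; everything else passes through to the origin. The `summary_large_image` card type is an assumption; `summary` is the smaller alternative.

```typescript
// Hypothetical card metadata keyed by URL path.
interface CardData {
  title: string;
  description: string;
  image: string; // absolute URL of the preview image
}

const CARDS: Record<string, CardData> = {
  "/post-1": {
    title: "Example post",
    description: "A short summary of the post.",
    image: "https://example.com/images/post-1.png",
  },
};

// Escape characters that would break out of an HTML attribute value.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build an otherwise empty HTML page that carries only Twitter Card tags.
function minimalCardPage(card: CardData): string {
  return [
    "<!DOCTYPE html>",
    "<html><head>",
    '<meta name="twitter:card" content="summary_large_image">',
    `<meta name="twitter:title" content="${escapeHtml(card.title)}">`,
    `<meta name="twitter:description" content="${escapeHtml(card.description)}">`,
    `<meta name="twitter:image" content="${escapeHtml(card.image)}">`,
    "</head><body></body></html>",
  ].join("\n");
}

export default {
  async fetch(request: Request): Promise<Response> {
    const ua = request.headers.get("User-Agent") ?? "";
    const card = CARDS[new URL(request.url).pathname];
    if (ua.includes("Twitterbot") && card !== undefined) {
      // Answer Twitterbot at the edge without touching the origin.
      return new Response(minimalCardPage(card), {
        headers: { "Content-Type": "text/html; charset=utf-8" },
      });
    }
    // Unknown paths and all other visitors go to the origin as usual.
    return fetch(request);
  },
};
```

The trade-off is that the card can drift out of sync with the real page if the lookup table is not updated alongside the content.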

