Aggressive Facebook Bot

I have been getting sporadic but aggressive hits from Facebook’s bot (facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)). The hits come in rapid-fire succession from multiple IP addresses in Ireland, the UK, and Brazil.

Some include:

• 2a03:2880:20ff:f::face:b00c (Ireland)
• 2a03:2880:20ff:13::face:b00c (Ireland)
• 2a03:2880:20ff:25::face:b00c (Ireland)
• 2a03:2880:11ff:18::face:b00c (UK)
• 2a03:2880:11ff:6::face:b00c (UK)
• 2a03:2880:21ff:b::face:b00c (Ireland)
• 2a03:2880:21ff:17::face:b00c (Ireland)
• 2a03:2880:ff:10::face:b00c (Brazil)
• 2a03:2880:32ff:d::face:b00c (Ireland)
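
For anyone who wants to confirm that hits like these share one network before writing a rule around them, here is a minimal sketch using Python's standard `ipaddress` module. The prefix `2a03:2880::/32` is an assumption read off the addresses listed above; verify its ownership with a WHOIS lookup before relying on it. The names `ASSUMED_PREFIX` and `in_assumed_prefix` are made up for this example.

```python
import ipaddress

# Assumption: prefix inferred from the addresses in the list above.
# Confirm who it is registered to via WHOIS before using it in a rule.
ASSUMED_PREFIX = ipaddress.ip_network("2a03:2880::/32")

SAMPLE = [
    "2a03:2880:20ff:f::face:b00c",
    "2a03:2880:11ff:18::face:b00c",
    "2a03:2880:21ff:b::face:b00c",
    "2a03:2880:ff:10::face:b00c",
    "2a03:2880:32ff:d::face:b00c",
]


def in_assumed_prefix(addr: str) -> bool:
    """Return True if addr falls inside the assumed crawler prefix."""
    return ipaddress.ip_address(addr) in ASSUMED_PREFIX


for addr in SAMPLE:
    print(addr, in_assumed_prefix(addr))
```

A check like this only tells you the requests come from one address block; it does not tell you why the crawler is requesting those pages.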

This is a very short list from today. Many of the hits are on very old pages, and I cannot imagine they are previews triggered by Facebook users entering large numbers of URLs. Moreover, they all come from outside the United States.

I have tried to block this user agent when it comes from outside the US, but that breaks the preview feature on Facebook. I also tried a challenge, and that breaks the feature as well.

The behavior looks more like a crawler than users sharing URLs and pulling previews.

The requests can come in so quickly that the site starts throwing errors.

Is there a solution to this problem? Can I throttle a user agent independent of the IP address? I have emailed Facebook (as suggested at http://www.facebook.com/externalhit_uatext.php) but have not received a reply.
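
To make the idea of throttling by user agent rather than by IP concrete, here is a rough sketch of a token bucket that is shared by every request whose User-Agent contains `facebookexternalhit`, whatever address it comes from. This is origin-side application logic, not a Cloudflare feature, and the class name, `rate`, and `burst` parameters are all hypothetical. Requests that are refused here would most likely still leave the corresponding previews broken.

```python
import time


class UserAgentThrottle:
    """Token bucket shared by all requests matching one user-agent marker."""

    def __init__(self, marker, rate, burst, clock=time.monotonic):
        self.marker = marker.lower()  # substring to look for in User-Agent
        self.rate = float(rate)       # tokens refilled per second
        self.burst = float(burst)     # maximum tokens held at once
        self.clock = clock            # injectable for testing
        self.tokens = self.burst
        self.last = clock()

    def allow(self, user_agent):
        """Return True if the request may proceed, False if throttled."""
        if self.marker not in user_agent.lower():
            return True  # other clients are not limited by this bucket
        now = self.clock()
        # Refill based on elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


throttle = UserAgentThrottle("facebookexternalhit", rate=1, burst=5)
ua = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
print(throttle.allow(ua))
```

Because the bucket is keyed on the user agent string alone, spreading requests across many IP addresses does not raise the limit. The trade-off is that anything else sending the same string shares the same budget.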

If you throttle, you will still have preview issues.
It comes down to either accepting these requests or living with possible issues with the preview on Facebook.

Furthermore, Facebook does not seem to honour robots.txt, so that might not be an option either. However, https://developers.facebook.com/docs/sharing/webmasters/crawler/#crawler-rate-limits seems to address your issue.

Thanks.

Facebook’s advice for rate limiting is to use og:ttl, which only works on a single page: it tells the crawler when it may make a fresh scrape of that page. As I understand it, this defaults to 30 days.
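
For reference, a small sketch of what emitting that tag could look like. It assumes the og:ttl value is given in seconds, which should be checked against the Facebook crawler documentation linked above; the helper name `og_ttl_tag` is made up for this example.

```python
def og_ttl_tag(days: int) -> str:
    """Build an og:ttl meta tag for a re-scrape interval given in days.

    Assumption: the crawler reads the content value as seconds.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    seconds = days * 24 * 60 * 60
    return f'<meta property="og:ttl" content="{seconds}" />'


# Example: ask for a re-scrape no more often than every 30 days.
print(og_ttl_tag(30))
```

The tag has to be placed in the head of every page it should apply to, which is why it does nothing for a burst of first-time requests across many different pages.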

I will try it, but the problem is that Facebook is scraping many pages, from multiple IPs, all in quick succession… as if they decided to do the entire site in a few short minutes rather than spread it out over time.

That’s probably best asked of Facebook. On Cloudflare’s side you can’t do much more than block them. Even if you rate limited them (and that would be IP address specific), you’d still have preview issues.

As I said earlier, it comes down to either accepting the requests or living with issues with the previews.
