404 attack made by Facebook

Thanks for your reply.

About this:
“For now, I’d recommend using a Firewall Rule to block or challenge by User Agent String CONTAINS facebookexternalhit”

This would also block legitimate crawls/scrapes by Facebook, so I'm not sure about it. The best option would be rate limiting, but Known Bots are set to skip Rate Limiting. Is there another way?

Also, about redirecting 404s to one single page: I would love to hear more opinions on this from webmasters. Thanks

Cloudflare is trying to play nice with Facebook, but Facebook is unhinged, so they’re abusing their crawl privileges. If you rate limit Facebook, you’re essentially blocking their crawl of legitimate pages anyway. Without Facebook’s cooperation, there isn’t any good solution here. You’ll have to take this up with Facebook.

OK, thank you. So I guess this is what's left to try:
“Also, about redirecting 404s to one single page: I would love to hear more opinions on this from webmasters. Thanks”

Cloudflare has a feature to cache based on status code - but it is an Enterprise feature.

That doesn't make sense to me. To do the redirection, the traffic still has to hit your server, which means the attack can still reach your server; the bots would just see the 404 redirect once you configure it.

My theory is that the server would immediately redirect to a single 404 page, so the only page that loads is one that can be cached, instead of an uncached page at each unique URL.

Hmm, in theory it will reduce the transfer size, but the server still needs to respond with a 302 redirect, so it does not completely block the attack.

At this point, that seems to be the best option, as OP doesn’t want to block the Facebook crawler.

@yuvalgov I experienced something similar a few months ago, and I started by creating a Firewall Rule to block non-legitimate traffic, as @sdayman suggested. That worked, but at the end of the day it's a cat-and-mouse game: the Firewall blocks the bot, the bot changes its scraping techniques, and you have to create another rule, and so on.

I was monitoring the Cloudflare analytics for a while, and that's where I noticed the high number of hits on the 404 page while the scraper was trying different URL combinations. However, I couldn't find useful insights to create a Firewall Rule. What I did was create a Rate Limiting rule on the 404 page and log the blocked IPs. Then I added those IPs to a JS Challenge rule I had created, along with other conditions such as the user agent.

This technique worked, but it still requires monitoring. The attached screenshot [not preserved] showed a low CSR (Challenge Solved Rate).
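A minimal Python sketch of the last step above: turning the IPs logged by the 404 Rate Limiting rule into the expression for a JS Challenge rule. It assumes the IPs have already been exported into a list and that they are combined with user-agent conditions using `or`; the helper name `build_challenge_expression` is hypothetical, and you would paste the result into the rule editor (or send it via the API) yourself.

```python
import ipaddress
from typing import Iterable


def build_challenge_expression(
    blocked_ips: Iterable[str], user_agent_substrings: Iterable[str] = ()
) -> str:
    """Build a Cloudflare Rules-language expression for a JS Challenge rule.

    Hypothetical helper: combines the IPs logged by the 404 rate-limiting
    rule with optional user-agent conditions, joined with `or`.
    """
    # Validate each IP (raises ValueError on garbage) and drop duplicates.
    addrs = {ipaddress.ip_address(ip.strip()) for ip in blocked_ips}
    # Sort IPv4 before IPv6, then numerically, so the rule text is stable.
    ordered = sorted(addrs, key=lambda a: (a.version, int(a)))

    clauses = []
    if ordered:
        clauses.append("(ip.src in {" + " ".join(str(a) for a in ordered) + "})")
    for ua in user_agent_substrings:
        # Escape backslashes and double quotes inside the string literal.
        escaped = ua.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'(http.user_agent contains "{escaped}")')
    return " or ".join(clauses)
```

Regenerating the whole expression from the current IP list each time keeps the rule reproducible, which helps with the cat-and-mouse updates described above.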

Here's another discussion we had on the same topic:
https://community.cloudflare.com/t/retrieve-ip-from-new-analytics-dashboard-to-block-suspicious-requests/212118/12

One method is to run fail2ban on the origin server and have it talk to the Cloudflare Firewall via the CF API. You set up a fail2ban rule that inspects your web server log and counts 404 requests per IP; once an IP hits the threshold, fail2ban logs the ban and passes the offending IP on to the Cloudflare Firewall to be blocked. Of course, this has limitations: the number of IPs the Cloudflare Firewall allows depends on your CF plan, so you need a short fail2ban ban time to purge entries from the Cloudflare Firewall's banned IP list. It also ultimately relies on your origin web server still being able to handle the 404 requests.

You could also combine this with your idea of redirecting 404s to a single page, and serve that page with a CF Cache Everything page rule too.

But a properly optimised server/web stack should be able to handle that many requests per minute, so you might need to optimise that too. One thing you can do is set up guest-based full HTML page caching on the origin web server as well, so 404s hit your origin's full HTML page cache via a custom 404 HTML page rather than your web app directly.
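The fail2ban approach in this reply boils down to counting 404s per IP inside a time window. Below is a minimal Python sketch of just that detection step, assuming the Common/Combined Log Format and log lines read in chronological order; `find_404_offenders`, `max_retry`, and `find_time` are hypothetical names that loosely mirror fail2ban's `maxretry` and `findtime` settings. Pushing the offenders to the Cloudflare API is deliberately left out.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict, Iterable, Set

# ip, ident, user, [timestamp], "request line", status, ...
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" (\d{3}) ')


def find_404_offenders(
    lines: Iterable[str],
    max_retry: int = 20,
    find_time: timedelta = timedelta(minutes=1),
) -> Set[str]:
    """Return IPs that produced at least `max_retry` 404s within `find_time`."""
    windows: Dict[str, Deque[datetime]] = defaultdict(deque)
    offenders: Set[str] = set()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        ip, ts, status = m.groups()
        if status != "404":
            continue
        t = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")
        win = windows[ip]
        win.append(t)
        # Slide the window: forget this IP's 404s older than find_time.
        while t - win[0] > find_time:
            win.popleft()
        if len(win) >= max_retry:
            offenders.add(ip)
    return offenders
```

As the reply points out, the origin still has to serve every 404 before it can be counted, so this only limits how long an offender keeps hitting you.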

Thanks for the suggested solution. In this case Rate Limiting would have solved it, but I don't want to block the flooding IP address since it belongs to Facebook.
If you look at the first message, the attackers made Facebook flood our server, and Rate Limiting skips Known Bots.

Ah yes, Facebook. Blocking might not be the best option, but it's still a valid mitigation strategy if the traffic is overwhelming your origin server. Rate limiting via CF would incur costs, though you can rate limit on the origin server side too. But rate limits are per IP, so if there are enough IPs in the attack, it will still overwhelm your origin, especially if your origin can't handle hundreds of requests per minute. It still comes back to an optimal origin server configuration that can handle the load.
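Origin-side rate limiting, as mentioned in this reply, is often done with a per-IP token bucket. A minimal Python sketch, assuming an in-process limiter (in practice you would more likely use the web server's own module, such as nginx's `limit_req`); `PerIpTokenBucket` and its parameters are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class PerIpTokenBucket:
    """Per-IP token bucket (hypothetical helper).

    Each IP may burst up to `burst` requests, then is refilled at
    `rate_per_minute` tokens per minute. A real deployment would also
    evict idle IPs so the table does not grow without bound.
    """

    def __init__(self, rate_per_minute: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate_per_sec = rate_per_minute / 60.0
        self.burst = float(burst)
        self.clock = clock
        # ip -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, ip: str) -> bool:
        """Return True if this request from `ip` should be served."""
        now = self.clock()
        tokens, last = self._buckets.get(ip, (self.burst, now))
        # Refill for the time elapsed since the last request, capped at burst.
        tokens = min(self.burst, tokens + (now - last) * self.rate_per_sec)
        if tokens >= 1.0:
            self._buckets[ip] = (tokens - 1.0, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False
```

This illustrates the limitation raised above: each IP gets its own bucket, so a flood spread across many Facebook IPs can stay under every individual limit.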

One strategy you may want to try with Firewall Rules is to create a rule that challenges any request coming from Facebook whose URI Path is not in a list. Using the “is not in” operator, you can list all (if the site is small enough) or most of the site's pages, up to the rule size limit of 4KB. You can choose which pages not to challenge by using your site analytics to rank the most visited pages.

Something like: [screenshot of the example rule not preserved]
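The rule described in this reply could be generated from analytics data instead of typed by hand. A minimal Python sketch, assuming Facebook traffic is matched by AS number 32934 (the ASN mentioned later in this thread) via the `ip.geoip.asnum` field, and that the 4KB limit is measured in bytes of expression text; `build_allowlist_expression` is hypothetical. It ranks paths by hits and keeps adding the most visited ones until the next one would exceed the limit, so even a 5,000-page site only needs its top pages listed.

```python
from typing import Dict, List, Tuple

FACEBOOK_ASN = 32934         # AS number mentioned later in this thread
MAX_EXPRESSION_BYTES = 4096  # assumed meaning of the "4KB" rule size limit


def build_allowlist_expression(
    page_hits: Dict[str, int],
    asn: int = FACEBOOK_ASN,
    limit: int = MAX_EXPRESSION_BYTES,
) -> Tuple[str, int]:
    """Challenge requests from `asn` unless the path is a top-visited page.

    Returns the expression and how many paths fit under `limit` bytes.
    """
    # Most visited first; ties broken alphabetically for a stable result.
    ranked = sorted(page_hits.items(), key=lambda kv: (-kv[1], kv[0]))
    prefix = f"ip.geoip.asnum eq {asn} and not (http.request.uri.path in {{"
    suffix = "})"
    size = len(prefix.encode()) + len(suffix.encode())
    chosen: List[str] = []
    for path, _hits in ranked:
        literal = '"' + path.replace("\\", "\\\\").replace('"', '\\"') + '"'
        extra = len(literal.encode()) + (1 if chosen else 0)  # +1 for the space
        if size + extra > limit:
            break  # keep a strict "top N by traffic" list
        chosen.append(literal)
        size += extra
    if not chosen:
        raise ValueError("limit too small to list even one path")
    return prefix + " ".join(chosen) + suffix, len(chosen)
```

Pages that fall outside the list are challenged rather than blocked, so a legitimate Facebook fetch of a rarely visited page fails the challenge, while the site's most-shared pages stay reachable.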

Thanks for your suggestion; however, with a 5,000-page site it's impossible to list all of them…

Hi @yuvalgov !
I’m Cloudflare’s DDoS Protection product manager. Thanks for flagging this. We are investigating this and may reach out to Facebook on this matter. Can you please open a support ticket and DM me the ticket number?
Thanks!

How often does this happen (from Facebook)? Is your site being specifically targeted? How long do these attacks last? Do you think it’s the same perpetrator each time or a random attack using FB?

This is a manual process that requires watching the CF Firewall, but what I would do is add a firewall rule that blocks Facebook's AS number (32934) for the duration of the attack. That covers not just the one IP address but every other FB IP address as well. Watch the Firewall Rules page for when the attack traffic dies down, then disable the rule to allow legitimate access from FB again. You could probably code something on your server to enable this Firewall rule automatically via the CF API when this traffic is detected, and then set a timer to disable it after the average duration of the attacks.
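A minimal Python sketch of the automatic enable/disable logic suggested here, assuming you can count requests from AS32934 per minute and that `set_rule_enabled` is a hypothetical callback you would implement yourself on top of the Cloudflare API (not shown). The rule is enabled when a minute exceeds `threshold_per_min` and disabled after `cooldown_min` consecutive quiet minutes; `AsnBlockToggler` is likewise a hypothetical name.

```python
from typing import Callable


class AsnBlockToggler:
    """Enable a block rule during a flood, disable it after a cooldown."""

    def __init__(self, set_rule_enabled: Callable[[bool], None],
                 threshold_per_min: int, cooldown_min: int) -> None:
        self.set_rule_enabled = set_rule_enabled  # hypothetical CF API wrapper
        self.threshold = threshold_per_min
        self.cooldown = cooldown_min
        self.enabled = False
        self.quiet_minutes = 0

    def observe_minute(self, fb_requests: int) -> bool:
        """Feed one minute's request count; return whether the rule is on."""
        if fb_requests > self.threshold:
            self.quiet_minutes = 0
            if not self.enabled:
                self.enabled = True
                self.set_rule_enabled(True)
        elif self.enabled:
            self.quiet_minutes += 1
            if self.quiet_minutes >= self.cooldown:
                self.enabled = False
                self.set_rule_enabled(False)
        return self.enabled
```

One caveat: once the block rule is active, blocked requests never reach the origin, so origin-side counts drop to zero right away. In practice `cooldown_min` then acts as the timer from this reply, so it should be set to roughly the average attack duration.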

“hundreds of requests per minute, server fails.”

Also, as mentioned above, most web servers should be able to handle “hundreds of requests per minute” without failing. Maybe this needs to be investigated? What's the actual rate of the attack? What web server are you using (Apache, NGINX, etc.)? Is it a (shared) virtual server, and can you increase its resources?

Hi Omer and biz1, many thanks for your replies. Can we continue this on the support ticket: [2081835]

Thanks

Yup, thanks! Following up there.

This is a common problem.

What I did when faced with this problem was replace the fancy PHP 404 page with a simple nginx redirect to a static HTML 404 page, which I made sure was cached by CF.

Generally, I would remove any fancy scripted 404 error pages.

Good to know. That was the original poster’s solution as well.
