Huge traffic from Bots

I have 3 WordPress sites on a VPS, and lately I’ve been noticing CPU throttling. At first I thought it was caused by the VPS, but when I looked at the Cloudflare firewall events I saw 500k bot hits (for each page) in the last 24 hours.
I had the “Skip Known Bots” option enabled in the Cloudflare firewall, but the CPU was still extremely high…
I turned off “Known Bots” and selected only the categories below. Again the CPU was extremely high…
When I block all bots, the VPS sleeps… zero traffic.
What should I do? (I’m on Cloudflare’s free plan.)


After a couple of hours, I think the problem is with AS32934 Facebook.
Is it a legitimate Facebook AS or a fake one?
I have enabled a rule to block AS32934, and now the CPU is low.

Have you tried enabling Bot Fight Mode?

Yes, I had it enabled, but AS32934 is overloading my server.
The only way is to block AS32934.

I have enabled a rate limiting rule and Bot Fight Mode.


Hello @erisa-cf , any help?

Hello,
this bot attacked me, and the only way to stop it is to block it.

I tried to limit it with WAF rate limiting rules, but that doesn’t help.

Any help, please?

Why don’t you block it by ASN? AS32934

I did, but then Facebook can’t fetch pictures from the websites.

Okay so don’t block it. Are you experiencing any negative effects of the amount of traffic coming from Facebook? If not then I wouldn’t worry about it

Yes, as I said, I have a problem with my server’s CPU. It is like a DDoS attack.



If I block AS32934, Facebook can’t fetch pictures from the website.

@Omer or someone from support, can you help me please?

I have merged your three threads into one and re-opened the oldest thread, as they all appear to be about the Facebook bot and the apparent overload that it causes on your server.

As it seems like you found out here:

and

If you are unable to block the traffic for some reason, e.g. because Facebook then can’t load images from your website, you’re looking for a way to selectively block what needs to be blocked and allow what needs to be allowed.

One way could for example be to block Facebook’s AS32934, unless the file requested is an image file.

However, doing so would likely block Facebook from seeing e.g. the page title / preview of your website.
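The selective rule described above can be sketched in Python to make its logic concrete. This only models the decision and is not Cloudflare's rule syntax; the function name `should_block` and the list of image extensions are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

FACEBOOK_ASN = 32934  # the AS number discussed in this thread
# Assumed set of image extensions; adjust to what the site actually serves.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def should_block(asn: int, url_or_path: str) -> bool:
    """Block requests from the Facebook ASN unless they target an image file."""
    if asn != FACEBOOK_ASN:
        return False  # traffic from other networks is not affected by this rule
    path = urlsplit(url_or_path).path  # drop any query string
    extension = PurePosixPath(path).suffix.lower()
    return extension not in IMAGE_EXTENSIONS
```

With this logic an image request from AS32934 is allowed, while a request for an HTML page from the same AS is blocked, which is exactly why the page title / preview would be affected.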

Depending on how you serve your images, and whether it fits together with the Cloudflare ToS, you could also look at caching the images more aggressively.

This crawler hits the images…
Check the screenshot below:

Any idea how to stop the crawler’s attack?

Seems mostly like images, however, there is one that doesn’t seem to be an image, e.g. the third from the bottom.

I’m wondering if the VPS you mentioned above is actually too “low end” to cope with the traffic that your websites are receiving, and whether that is the major reason for the issues you face.

However, -

Assuming that your website is about a Greek radio station:

When I tried loading the second image from the bottom of your list, “/wp-content/uploads/2022/10/24s4lesb10.jpg”, I had to request it several times from my end before I received the “CF-Cache-Status” header with “HIT”. The image also has a “Last-Modified” header indicating that the file was last updated on Sat, 01 Oct 2022 10:31:01 GMT, which matches the date (/2022/10/) in the path of the file.

When I loaded your front page, I saw 128 requests, where many of them were image files. Although the majority seemed to retrieve the “CF-Cache-Status” header with “HIT”, we had another one called “/wp-content/uploads/2023/12/velopoulos_grafeia_07_12.jpg”, which retrieved a “CF-Cache-Status” of “REVALIDATED”.

All your images seem to have a Cache-Control header, but what they all share is that it asks for the image to be cached for only 14400 seconds (4 hours).

There are, however, images being loaded where the “Last-Modified” header indicates that the file was last updated in November, e.g. /wp-content/uploads/2023/10/3c5140b26e.jpg, which was updated on Wed, 22 Nov 2023 23:17:25 GMT, but which still only carries that Cache-Control header, requesting that it be cached for only 14400 seconds (4 hours).
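To repeat this kind of header check yourself, here is a minimal Python sketch using only the standard library. The helper names (`parse_max_age`, `summarize`, `fetch_headers`) are made up for this example, and `fetch_headers` needs network access and uses a HEAD request, which may not behave exactly like the browser's GET.

```python
from typing import Dict, Optional
from urllib.request import Request, urlopen


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if the directive is absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.strip().isdigit():
            return int(value.strip())
    return None


def summarize(headers: Dict[str, str]) -> Dict[str, object]:
    """Pick out the cache-related headers discussed in this thread."""
    lowered = {key.lower(): value for key, value in headers.items()}
    max_age = parse_max_age(lowered.get("cache-control", ""))
    return {
        "cf_cache_status": lowered.get("cf-cache-status"),
        "max_age_seconds": max_age,
        "max_age_hours": None if max_age is None else max_age / 3600,
        "last_modified": lowered.get("last-modified"),
    }


def fetch_headers(url: str) -> Dict[str, str]:
    """Send a HEAD request and return the response headers (needs network access)."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as response:
        return dict(response.headers.items())
```

Calling `summarize(fetch_headers(url))` a few times in a row for one of your image URLs should show whether “CF-Cache-Status” settles on “HIT” and what cache lifetime the image is asking for.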

So I would possibly move on with what I said above:

As mentioned above, having to hit the image multiple times to retrieve “CF-Cache-Status” header with “HIT”, tells me that e.g. Tiered Cache is turned off for your zone.

https://dash.cloudflare.com/?to=/:account/:zone/caching/tiered-cache

If you enable Tiered Cache, that alone should reduce the number of requests that the Cloudflare network ends up sending to your server.

The reduction in requests will, however, be limited by the Cache-Control header that you have set. For the file mentioned above that was last updated on Sat, 01 Oct 2022 10:31:01 GMT, Cloudflare would still contact your server (e.g. to refresh the cache and check whether the file is still valid) every 14400 seconds (4 hours), or, if you have traffic on the site 24/7, roughly 6 times per day.

If your visitors end up reaching Cloudflare’s locations in Hong Kong, São Paulo, and Amsterdam, it would be 6 times per day for each of these locations, meaning 18 times per day in this example, where Tiered Cache is disabled. Enabling it could reduce this to the above-mentioned 6 times per day.

Imagine 200 Cloudflare locations worldwide, all having to request your image, because your visitors are reaching 200 different Cloudflare locations. Now, you would have 200 * 6, or 1200 requests for the same image file per day, assuming that we’re looking at the worst scenario possible.

As the first file mentioned above hasn’t been updated in more than a year, I would say it is quite a static file, so raising the Cache-Control value to something like 86400 seconds (1 day), 604800 seconds (1 week), 2592000 seconds (30 days), or even 31536000 seconds (365 days / 1 year) would likely help a lot to reduce the requests (and thereby the load) that your VPS receives.
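The arithmetic in the paragraphs above can be sketched in a few lines of Python. It is a rough worst-case estimate, assuming constant traffic at every location and no Tiered Cache; the function name `origin_requests_per_day` is made up for this example.

```python
SECONDS_PER_DAY = 86_400


def origin_requests_per_day(max_age_seconds: int, locations: int = 1) -> float:
    """Worst-case daily origin requests for one file.

    Assumes every location has constant traffic, so each one goes back to
    the origin once per cache lifetime, and that Tiered Cache is off.
    """
    return locations * SECONDS_PER_DAY / max_age_seconds


# Compare the current 4-hour lifetime with the longer values suggested above,
# for the worst case of 200 Cloudflare locations.
for max_age in (14_400, 86_400, 604_800, 2_592_000, 31_536_000):
    per_day = origin_requests_per_day(max_age, locations=200)
    print(f"max-age={max_age:>10}: {per_day:.2f} origin requests per day")
```

The estimate is per file, so the total load scales with the number of distinct images being requested.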

Maybe there is even a chance, that you could avoid blocking traffic at all, solely by looking at improving your cache settings.


Hello @DarkDeviL and thank you for your reply.

I had set the following:


The VPS is not a low end host machine.

If I want to skip the WAF only for “known bots”, but keep rate limiting rules (for bots) and Bot Fight Mode enabled, what do I have to check below?

Any other tips are welcome!
Thank you

Facebook…