Getting tons of bad bot attacks recently

thesis-research-bot
What is this bot?
And the IP is even from Cloudflare?
Is someone attacking my site?
My site has been at 100% CPU and crashing for nearly 7 hours.
Below are some logs:

172.71.147.169 - - [07/Nov/2023:22:36:36 +0800] "GET /collections/leather-bags?filter_product_tag=1081,1068,1079&filtering=1 HTTP/2.0" 499 0 "-" "thesis-research-bot"
172.71.146.214 - - [07/Nov/2023:22:36:36 +0800] "GET /collections/leather-bags?filter_product_tag=1081,1068,1065&filtering=1 HTTP/2.0" 499 0 "-" "thesis-research-bot"
172.71.150.91 - - [07/Nov/2023:22:36:37 +0800] "GET /collections/leather-bags?filter_product_tag=1081,1068,1066&filtering=1 HTTP/2.0" 499 0 "-" "thesis-research-bot"
172.71.142.57 - - [07/Nov/2023:22:36:37 +0800] "GET /collections/leather-bags?filter_product_tag=1081,1068,1080&filtering=1 HTTP/2.0" 499 0 "-" "thesis-research-bot"
172.71.147.93 - - [07/Nov/2023:22:36:38 +0800] "GET /collections/leather-bags?filter_product_tag=1081,1068,1082&filtering=1 HTTP/2.0" 499 0 "-" "thesis-research-bot"
172.71.150.220 - - [07/Nov/2023:22:36:38 +0800] "GET /collections/leather-bags?filter_product_tag=1081,1068,1067&filtering=1 HTTP/2.0" 499 0 "-" "thesis-research-bot"
172.71.150.197 - - [07/Nov/2023:22:36:39 +0800] "GET /collections/leather-bags?filter_product_tag=1081,1068,1236&filtering=1 HTTP/2.0" 499 0 "-" "thesis-research-bot"
172.71.150.64 - - [07/Nov/2023:22:36:39 +0800] "GET /collections/leather-bags?filter_product_tag=1081,1068,1069&filtering=1 HTTP/2.0" 499 0 "-" "thesis-research-bot"

Are you restoring visitor IPs?
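For context on this question: the 172.71.x.x addresses in the log above fall inside 172.64.0.0/13, which is one of the IPv4 ranges Cloudflare publishes, so the origin appears to be logging the proxy address rather than the real visitor. Status 499 is what nginx logs when the client closes the connection before a response is sent. Below is a minimal Python sketch for checking logged addresses; it hard-codes only that one range as an assumption, and the helper names are made up for illustration. A real check should load the full, current list from Cloudflare.

```python
import ipaddress
import re

# Assumption: only one of Cloudflare's published IPv4 ranges is listed here.
CLOUDFLARE_SAMPLE_RANGE = ipaddress.ip_network("172.64.0.0/13")

# Matches the access-log format quoted in this thread.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return (ip, status, user_agent) for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return match["ip"], int(match["status"]), match["agent"]


def is_cloudflare_proxy(ip):
    """True if the logged address is inside the sample Cloudflare range."""
    return ipaddress.ip_address(ip) in CLOUDFLARE_SAMPLE_RANGE


sample = (
    '172.71.147.169 - - [07/Nov/2023:22:36:36 +0800] '
    '"GET /collections/leather-bags?filter_product_tag=1081,1068,1079&filtering=1 HTTP/2.0" '
    '499 0 "-" "thesis-research-bot"'
)
ip, status, agent = parse_line(sample)
print(ip, status, agent, is_cloudflare_proxy(ip))
```

If this reports a Cloudflare address for every request, IP-based blocking at the origin will not work until visitor IPs are restored there.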

Just add a WAF rule to block them by user agent or URL.

That bot is also visiting me.
I tried adding the following custom rule in the WAF section, but it does not work:

(http.user_agent contains "thesis-research-bot")

For NGINX, open your site's config file:
nano /etc/nginx/sites-available/yoursite.com/configfile.conf

Add this inside the server {…} context:
if ($http_user_agent ~* "thesis-research-bot"){ return 403; }

The result should look like this:

server {
#your server configs here
if ($http_user_agent ~* "thesis-research-bot"){ return 403; }
#...
#your server configs here
}

Then test the config and restart NGINX:

  • nginx -t
  • sudo systemctl restart nginx

For Apache, add these lines to the end of the .htaccess file in the site's root directory:

SetEnvIfNoCase User-Agent "thesis-research-bot" bot
Deny from env=bot

I’m seeing the same bot, hosted with AWS, attacking sites endlessly; it seems to focus on ecommerce sites.

I’m actually getting the same bot. It also appears as “fidget-spinner-bot”, and it “crawls” the site with millisecond frequency, which puts my server at 100% CPU. That is why I started migrating to Cloudflare: my Nginx couldn’t handle it.

To whomever this bot belongs: please use a lower request frequency. You’re taking these sites down, and frankly it looks like a DDoS and not scraping, particularly because the IPs keep changing and the user agent keeps being adjusted.
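To put a number on the request frequency, here is a rough Python sketch that counts how many requests each user agent made in any single second. The function name is hypothetical, and the entries are the timestamps from the log excerpt at the top of this thread. That log only has one-second resolution, so it cannot confirm millisecond-level bursts.

```python
from collections import Counter
from datetime import datetime


def peak_rate_per_agent(entries):
    """Return {user_agent: highest request count seen within any one second}.

    entries is an iterable of (timestamp, user_agent) pairs, with the
    timestamp in the access-log format, e.g. "07/Nov/2023:22:36:36 +0800".
    """
    per_second = Counter()
    for stamp, agent in entries:
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        per_second[(agent, when)] += 1

    peaks = {}
    for (agent, _when), hits in per_second.items():
        peaks[agent] = max(peaks.get(agent, 0), hits)
    return peaks


# Timestamps and user agent taken from the eight log lines quoted earlier.
entries = [
    ("07/Nov/2023:22:36:36 +0800", "thesis-research-bot"),
    ("07/Nov/2023:22:36:36 +0800", "thesis-research-bot"),
    ("07/Nov/2023:22:36:37 +0800", "thesis-research-bot"),
    ("07/Nov/2023:22:36:37 +0800", "thesis-research-bot"),
    ("07/Nov/2023:22:36:38 +0800", "thesis-research-bot"),
    ("07/Nov/2023:22:36:38 +0800", "thesis-research-bot"),
    ("07/Nov/2023:22:36:39 +0800", "thesis-research-bot"),
    ("07/Nov/2023:22:36:39 +0800", "thesis-research-bot"),
]
print(peak_rate_per_agent(entries))
```

Run over a full log instead of this short excerpt, the same count gives a baseline for deciding what request rate is abnormal for your site.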

I observed the same IP addresses using two user agents:

  • thesis-research-bot
  • my-tiny-bot

A bunch of IP addresses were using “my-tiny-bot”, and then suddenly they all switched to “thesis-research-bot”.
By blocking these two user agents, the frequency has greatly decreased.

Exactly the same for me. It even affects my ecommerce website, and I am losing money! So annoying.

You can block them with .htaccess rules plus a firewall (iptables, fail2ban).

I mean, is there a setting in Cloudflare that blocks such bots?

I added the rule I mentioned in my previous message to the WAF section in Cloudflare, but it does not stop the traffic entirely; requests keep coming in. Today I’m also going to try the Apache restriction that the other colleague suggested.
I’ll report back in 3 hours on how it went.

If you have CF in front, make sure your origin server doesn’t serve the site as the default for the IP address being proxied to, or the bot may just be hitting the server directly from cached (intentionally or otherwise) DNS. That could help you avoid a CPU-wasting .htaccess or web server rule and just let CF filter the bot.

Hello,
I had the same problem on a newspaper website. It was solved by the solution proposed in message #6.
Thank you for your help.

I was able to mitigate the issue by adding a custom WAF rule after retrieving the bot’s IP source addresses from the Live Traffic section of Wordfence on my WooCommerce site. Managed to block almost 3,000 attempted requests in less than 2 hours.

(ip.src eq 100.21.24.205) or (ip.src eq 44.230.252.91) or (ip.src eq 52.25.208.208)
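Collecting those addresses by hand gets tedious, so here is a rough Python sketch of the same idea. It counts requests per IP for the suspect user agents and prints a rule expression in the more compact `ip.src in {...}` form. The function names and the sample requests (documentation-range IPs with made-up counts) are assumptions for illustration. It only helps if your logs show real visitor IPs rather than Cloudflare's.

```python
from collections import Counter
import ipaddress


def top_offenders(requests, suspect_agents, min_hits=2):
    """Return IPs with at least min_hits requests from suspect agents, busiest first.

    requests is an iterable of (ip, user_agent) pairs.
    """
    suspects = {agent.lower() for agent in suspect_agents}
    counts = Counter(ip for ip, agent in requests if agent.lower() in suspects)
    return [ip for ip, hits in counts.most_common() if hits >= min_hits]


def build_ip_rule(ips):
    """Build an 'ip.src in {...}' expression; raises ValueError on a bad address."""
    validated = [str(ipaddress.ip_address(ip)) for ip in ips]
    return "(ip.src in {" + " ".join(validated) + "})"


# Made-up sample data using documentation-only address ranges.
requests = (
    [("203.0.113.10", "thesis-research-bot")] * 3
    + [("198.51.100.20", "my-tiny-bot")] * 2
    + [("192.0.2.30", "Mozilla/5.0")] * 5
    + [("203.0.113.99", "thesis-research-bot")]
)
offenders = top_offenders(requests, ["thesis-research-bot", "my-tiny-bot"])
print(build_ip_rule(offenders))
```

The `min_hits` threshold keeps one-off visitors out of the rule; tune it to your own traffic.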

Also seeing this on news sites, there are at least four:

  1. thesis-research-bot
  2. my-tiny-bot
  3. test-bot
  4. fidget-spinner-bot

Zones that had Super Bot Fight Mode with Managed Challenge for Definitely automated traffic still let those 4 bad bots through.

For anyone blocking the user agent in WAF rules: keep in mind that string matching in WAF rules is case sensitive, so a rule to block Thesis-research-bot would not match thesis-research-bot. I made this mistake when typing up the rule and thought the WAF or dashboard was broken after Cloudflare’s incident last week. The rule worked once I copied the user agent from the analytics page into the WAF rule.
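One way to sidestep the case problem is to lowercase both sides of the comparison. The Python sketch below generates a single rule expression for all four user agents named in this thread. It assumes the `lower()` function of the Cloudflare rules language is available on your plan, and the helper name is made up for illustration.

```python
def build_user_agent_rule(user_agents):
    """Build one rule expression that matches any of the given user agents.

    Wrapping the field in lower() and lowercasing each search string makes
    the match case-insensitive.
    """
    clauses = []
    for agent in user_agents:
        if '"' in agent or "\\" in agent:
            # Keep the sketch simple: refuse strings that would need escaping.
            raise ValueError(f"unsupported character in user agent: {agent!r}")
        clauses.append(f'(lower(http.user_agent) contains "{agent.lower()}")')
    return " or ".join(clauses)


# The four user agents reported in this thread (first one deliberately mis-cased).
bots = ["Thesis-research-bot", "my-tiny-bot", "test-bot", "fidget-spinner-bot"]
print(build_user_agent_rule(bots))

# Plain substring checks are case sensitive, which is the trap described above.
print("Thesis-research-bot" in "thesis-research-bot")
```

Paste the printed expression into the rule's expression editor rather than retyping it, which avoids the typo problem entirely.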

I think this approach works near term but I would expect these IPs to change over time and there’s no easy way to keep up with that. I’m pretty surprised that CF’s core feature set has handled this so poorly.

Btw, the same is true for the user-agent string, so neither of these strategies will be very effective at thwarting these bots long term.

Yes, indeed; unfortunately, it is not a “Set it and forget it” approach, so we need to continue monitoring our live traffic occasionally.
