I’m actually getting the same bot. It also appears as “fidget-spinner-bot” and “crawls” the site at millisecond frequency, which puts my server at 100% CPU; that’s why I started migrating to Cloudflare, since my Nginx couldn’t handle it.
To whomever this bot belongs: please use a lower request rate. You’re taking these sites down, and frankly it looks like DDoS on top of scraping, particularly because constantly changing IPs are being used and the user agent keeps being adjusted.
I observed the same IP addresses using two user agents:
A bunch of IP addresses used “my-tiny-bot”, and then suddenly they all switched to “thesis-research-bot”.
Blocking these two user agents greatly reduced the request frequency.
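Since the original poster is on Nginx, here is a minimal sketch of how such a user-agent block could look there. The bot names are the ones from this thread; the `server` block details are placeholders you’d adapt to your own config.

```nginx
server {
    listen 80;
    server_name example.com;  # hypothetical domain

    # Case-insensitive (~*) match on the User-Agent header;
    # 444 closes the connection without sending a response.
    if ($http_user_agent ~* "(my-tiny-bot|thesis-research-bot)") {
        return 444;
    }

    # ... rest of the site configuration ...
}
```

Note that the regex match itself still costs CPU per request, so this helps most when combined with filtering further upstream (e.g. at Cloudflare).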
I put the rule I mentioned in previous messages into the WAF section in Cloudflare, but it does not mitigate the problem entirely; requests keep coming in. Today I’m also going to try setting up the Apache restriction the other colleague suggested.
In 3 hours I’ll comment on how it went.
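For reference, a sketch of the kind of Apache restriction mentioned above, assuming Apache 2.4 with mod_rewrite enabled and using the bot names from this thread:

```apache
# .htaccess (or vhost config)
RewriteEngine On
# [NC] makes the match case-insensitive
RewriteCond %{HTTP_USER_AGENT} (my-tiny-bot|thesis-research-bot) [NC]
# [F] returns 403 Forbidden, [L] stops further rule processing
RewriteRule .* - [F,L]
```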
If you have CF in front, make sure your origin server doesn’t serve the site as the default vhost on the IP address being proxied to, or the bot may just be hitting the server directly via cached (intentionally or otherwise) DNS. That could let you avoid a CPU-wasting .htaccess or web-server rule and simply let CF filter the bot.
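One way to do that on Nginx is a catch-all `default_server` block that drops any request not addressed to a configured hostname, so only traffic proxied through Cloudflare with a matching Host header reaches the real site. A sketch, with hypothetical certificate paths:

```nginx
# Catch-all vhost: direct-to-IP or unknown-Host requests land here.
server {
    listen 80 default_server;
    listen 443 ssl default_server;
    server_name _;

    # Placeholder self-signed cert, only needed so the TLS handshake
    # can complete before the connection is dropped.
    ssl_certificate     /etc/ssl/dummy.crt;
    ssl_certificate_key /etc/ssl/dummy.key;

    return 444;  # close the connection without a response
}
```

The real site’s `server` block then only matches its own `server_name`, never the bare IP.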
I was able to mitigate the issue by adding a custom WAF rule after retrieving the bot’s source IP addresses from the Live Traffic section of Wordfence on my WooCommerce site. It blocked almost 3,000 attempted requests in under 2 hours.
(ip.src eq 220.127.116.11) or (ip.src eq 18.104.22.168) or (ip.src eq 22.214.171.124)
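Assuming Cloudflare’s Rules language, the same expression can also be written more compactly with the `in` operator and a set, which is easier to extend as more addresses show up:

```
ip.src in {220.127.116.11 18.104.22.168 22.214.171.124}
```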
For anyone blocking the user agent in WAF rules: keep in mind that WAF rules are case-sensitive, so a rule blocking Thesis-research-bot would not match thesis-research-bot. I made this mistake when typing up the rule and thought the WAF or dashboard was broken after Cloudflare’s incident last week. The rule worked once I copied the user agent from the analytics page into the WAF rule.
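One way to sidestep the case-sensitivity issue, assuming Cloudflare’s Rules language, is to normalize the header with the `lower()` function before matching:

```
lower(http.user_agent) contains "thesis-research-bot"
```

This matches “thesis-research-bot”, “Thesis-Research-Bot”, and any other casing variant.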
I think this approach works in the near term, but I’d expect these IPs to change over time, and there’s no easy way to keep up with that. I’m pretty surprised that CF’s core feature set has handled this so poorly.
Btw, the same is true for the user-agent string, so neither of these strategies will be very effective at thwarting these bots long term.