Need Help Blocking Machine Learning Bots (High Volume Requests to Random Endpoints)

What is the name of the domain?

What is the issue you’re encountering?

Machine Learning Bots (High Volume Requests to Random Endpoints)

What steps have you taken to resolve the issue?

Bot Management Settings:

Block AI Bots: Enabled
Definitely Automated: Managed Challenge
Likely Automated: Managed Challenge
Verified Bots: Allowed
JavaScript Detections: Enabled

Security Settings:

Static Resource Protection: Enabled
Cloudflare Managed Ruleset: Enabled
Cloudflare OWASP Core Ruleset: Enabled

Performance:

Optimize for WordPress: Enabled

What are the steps to reproduce the issue?

Hi Cloudflare Community,

I’m dealing with millions of bot requests flagged under Machine Learning detection — approximately 2.96 million, far surpassing verified bots or other sources (see screenshot below). These bots are aggressively targeting random or suspicious URLs and bypassing some of my current protections.

Even with these settings, I’m getting hit on paths like:

/.lsrecap/recaptcha
/search/
/randomstrings
/category/verylongstring
Deep paths like /a/b/c/d/e/f/g/… (10 or more segments)

These generate 404 errors and appear to be non-human traffic, likely scrapers, spam bots, or malicious scanners.

:locked: Custom Rule I’m Using:
Here’s one of the rules I’ve tried to filter suspicious paths and endpoints:

not (
  http.request.uri.path matches "^/(example/post)/?$" or
  (http.host eq "image.tmdb.org" and http.request.uri.path matches "^/t/p/w\d+/.+")
) and (
  http.request.uri.path matches "^/wp-includes/[^/]+/[^/]+\.php$" or
  (http.request.uri.path matches "^/wp-admin/[^/]+\.php$" and http.request.uri.path ne "/wp-admin/admin-ajax.php") or
  http.request.uri.path matches "^/[a-z0-9]{6,}(/([a-z0-9]{6,}(-[a-z0-9]{6,})?)?)?/?$" or
  http.request.uri.path matches "^/category/[a-z0-9]{10,}/?$" or
  http.request.uri.path matches "(/[^/]+){10,}"
)
I’m hoping this helps filter deep fake URLs and unnecessary probes, but I’d love suggestions on how to improve it or whether to combine this with bot score-based logic.
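
One way to combine the path patterns with bot-score logic is sketched below. It assumes the cf.bot_management.score and cf.bot_management.verified_bot fields that come with Bot Management. The threshold of 30 is an assumption (Cloudflare describes scores below 30 as likely automated) and should be tuned against real traffic. Only two of the path patterns are shown, and the rule would pair with a Managed Challenge or Block action:

```plaintext
(cf.bot_management.score lt 30 and not cf.bot_management.verified_bot) and (
  http.request.uri.path matches "^/category/[a-z0-9]{10,}/?$" or
  http.request.uri.path matches "(/[^/]+){10,}"
)
```

Requiring a low score in addition to a suspicious path should make the rule less likely to catch real visitors who land on an odd URL, and excluding verified bots keeps search engine crawlers out of scope.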

:pushpin: What I Need Help With:
Suggestions on improving the above firewall rule
Best way to block machine-learning detected bots directly (e.g., using cf.bot_management.score)
Any reliable pattern-matching or rate limiting strategies for:
404 probes
random string URL hits
/search/ abuse
I want to reduce bot hits without affecting SEO or real users.
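
For the /search/ case, a possible starting point is a custom rule limited to that path and to low-scoring, non-verified traffic. This is a sketch, not a tested rule: the threshold of 30 is an assumption, and Managed Challenge is suggested as the action so that a misclassified human can still get through:

```plaintext
starts_with(http.request.uri.path, "/search/") and
cf.bot_management.score lt 30 and
not cf.bot_management.verified_bot
```

Because verified bots are excluded, crawlers such as Googlebot should not be challenged by it.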

Thanks so much in advance!

It sounds like you have Bot Management. For those 404s, what do their bot scores look like? You can use Security Analytics to look at these requests for characteristics to block. Dash is here (I’m using the updated Security dash):

If you scroll down in that log entry, you get more info, like IP address, etc. This is an example from a Verified Bot (Googlebot, I believe).

If you’ve got Bot Management, I’m hoping you also got Advanced Rate Limiting, so you can start triggering on high numbers of 404 responses for requests from a specific IP address:
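
As a rough sketch of that kind of rate limiting rule, assuming Advanced Rate Limiting is available on the plan: the two expressions below would go into the rule's match expression and its custom counting expression, with requests counted per IP address. The threshold and period (for example, 20 matching responses per minute) are placeholder values, not recommendations, and the labels on the left are only annotations, not part of the expressions.

```plaintext
Match expression:     not cf.bot_management.verified_bot
Counting expression:  http.response.code eq 404
```

With this setup, only 404 responses count toward the limit, so visitors browsing valid pages should not trip it.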

You can also look at Bot Analytics to see where the weaknesses are, based on which automated requests are getting through to the origin:

In the above, it looks like 112 requests snuck through my WAF settings. In this case they were all from the same IP address, so I need to look into how they made it past my not-so-overbearing settings.

Also, if you’ve got the above, you should be eligible for Enterprise support to get more help with this. If you’re actually Yoast (Hi there!), you should be able to reach out to your account reps for more guidance on this. They probably have training sessions on how to fine tune your WAF.
