WAF rule not blocking Bytespider?

TikTok’s parent company, Bytedance, has an aggressive crawler called Bytespider that is hitting my server every second.

I attempted to deploy the following WAF rule:

But it doesn’t seem to be doing anything:

Example of the spider hitting my server after the WAF rule was deployed:

47.128.60.232 - - [26/Feb/2024:08:12:22 +0000] "GET /img/present.jpg HTTP/2.0" 200 2760 "https://mywebsite.com/url" "Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; [email protected])"
47.128.124.250 - - [26/Feb/2024:08:12:22 +0000] "GET /img/zodiac.jpg HTTP/2.0" 200 3641 "https://mywebsite.com/url2" "Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; [email protected])"
47.128.60.225 - - [26/Feb/2024:08:12:22 +0000] "GET /img/music.jpg HTTP/2.0" 200 1808 "https://mywebsite.com/url3" "Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; [email protected])"

Any idea what might be happening here?

Hi,

Are you confident that those requests you are seeing in your logs are going through Cloudflare?

Do you firewall your origin off from the internet and only allow Cloudflare IPs? - Cloudflare IP addresses · Cloudflare Fundamentals docs

My thinking is that the requests are bypassing Cloudflare and going directly to your origin server.
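One rough way to test this theory from the logs alone: if the origin logs the connecting address (that is, it has not been configured to restore the original visitor IP), proxied requests should arrive from Cloudflare's ranges. The sketch below is only an illustration. The `CLOUDFLARE_V4` list is a partial snapshot written from memory and is an assumption that may be out of date, so take the current ranges from the Cloudflare IP addresses page. The helper name `is_cloudflare_ip` is made up for this example.

```python
import ipaddress

# ASSUMPTION: partial snapshot of Cloudflare's published IPv4 ranges.
# It may be stale; replace it with the current list from Cloudflare's docs.
CLOUDFLARE_V4 = [
    "173.245.48.0/20", "103.21.244.0/22", "103.22.200.0/22",
    "103.31.4.0/22", "141.101.64.0/18", "108.162.192.0/18",
    "190.93.240.0/20", "188.114.96.0/20", "197.234.240.0/22",
    "198.41.128.0/17", "162.158.0.0/15", "104.16.0.0/13",
    "104.24.0.0/14", "172.64.0.0/13", "131.0.72.0/22",
]
NETWORKS = [ipaddress.ip_network(cidr) for cidr in CLOUDFLARE_V4]


def is_cloudflare_ip(addr: str) -> bool:
    """Return True if addr falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in NETWORKS)


# Client addresses taken from the access log lines quoted in the question.
for addr in ("47.128.60.232", "47.128.124.250", "47.128.60.225"):
    print(addr, is_cloudflare_ip(addr))
```

If the logged addresses fall outside the ranges, as the three above do against this snapshot, that is consistent with the requests reaching the origin directly. It is not proof on its own, because an origin that restores visitor IPs will log the real client address even for proxied traffic.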

If you do this and you're still seeing these requests, you can add the cf-ray header to your origin access logs - Cloudflare HTTP request headers · Cloudflare Fundamentals docs

This will help you confirm that the request is definitely passing through Cloudflare, and it will show the rayID associated with the request.
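Once the header is being logged, a short script can pick out the requests that arrived without it. This is a minimal sketch that assumes a hypothetical log layout: the usual combined format with the cf-ray value appended as one extra quoted field at the end of each line, written as "-" when the header is absent. The function names and the sample rayID in the usage below are invented for illustration, so adjust the pattern to match your own log format.

```python
import re

# ASSUMPTION: the cf-ray value is the last quoted field on the line and
# contains no whitespace. A trailing user-agent field (which contains
# spaces) will not match, so unmodified log lines count as "no rayID".
LAST_FIELD = re.compile(r'"([^"\s]*)"\s*$')


def has_ray_id(line: str) -> bool:
    """Return True if the line ends with a non-empty cf-ray field."""
    match = LAST_FIELD.search(line)
    return bool(match) and match.group(1) not in ("", "-")


def direct_hits(lines):
    """Return the lines that carry no rayID, i.e. likely bypassed the proxy."""
    return [line for line in lines if not has_ray_id(line)]


if __name__ == "__main__":
    prefix = ('47.128.60.232 - - [26/Feb/2024:08:12:22 +0000] '
              '"GET /img/present.jpg HTTP/2.0" 200 2760 '
              '"https://mywebsite.com/url" "Mozilla/5.0 (compatible; Bytespider)" ')
    sample = [
        prefix + '"-"',                     # header missing
        prefix + '"85b6c0f2a9d1e3f4-SIN"',  # made-up rayID for illustration
    ]
    print(len(direct_hits(sample)), "of", len(sample), "lines have no rayID")
```

If every Bytespider line comes back without a rayID, the traffic is not passing through Cloudflare, and no WAF rule there can act on it.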

Another thing to check: do you have any IP Access rules that could be allowing a country/network through? - IP Access rules · Cloudflare Web Application Firewall (WAF) docs

In the order of execution, IP Access rules trigger before Custom rules, so if you have an ‘allow’ rule that matches this traffic, it could be preventing your custom rules from triggering.

Hope this helps!


Massive doh moment on my part. Added the rayID to my access logs. Noticed that none of the requests contained it.

Turns out, I had the proxy setting turned off. I can’t remember doing so, but it may have happened when I was updating an SSL cert a while back. Thank you very much. It took a few minutes, but the spider is being blocked now.
