Bot/scraper attack bypassing Cloudflare (I’m Under Attack mode)

This appears to be some type of scraper that drops by several times an hour and parses every page on my site. By rotating through what appears to be a massive pool of good-reputation IPs from Verizon, Comcast, AT&T, etc., and by switching to a different IP for each page access, it easily circumvents fail2ban, rate limiting and the Apache RBLs.

So I figured I’d give Cloudflare a try with a free account. Evidently, this insidious bot has no problem bypassing CF, even in I’m Under Attack mode. The first thing that threw me off was that the bot continued to hit my pages, yet the originating IP addresses were not appearing in Cloudflare’s firewall activity log. Hmmm.

At second glance, comparing timestamps between my server logs and CF’s firewall activity log, I could see that IPs were in fact hitting Cloudflare, but they did not match the IPs in my server logs. Let’s look at a sample:

CF Firewall activity log:

Sep 13, 2023 2:24:04 AM
Managed Challenge
United States
108.85.157.166
Security level

But 108.85.157.166 is NOT what hit my site. Observe what DID hit my site, and what is appearing in my server log:

A visitor from pool-74-99-177-198.hrbgpa.fios.verizon.net (74.99.177.198)
arrived without a referring URL,
and visited http://mysite.com/about.html
at 02:24:08 AM on Wednesday, September 13, 2023.
This visitor used Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36.

Well, that’s interesting. It’s almost like the bot bounced off CF with 108.85.157.166 and landed on my site with 74.99.177.198, effectively bypassing the CF challenge. wtf?
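The comparison above can be done by script instead of by eye. Here is a minimal Python sketch, assuming both logs have already been parsed into (timestamp, IP) pairs. The function name `near_misses` and the 10-second window are made up for illustration, and the only data used is the sample from this post.

```python
from datetime import datetime, timedelta

def near_misses(cf_events, origin_hits, window_seconds=10):
    """Pair each Cloudflare event with origin hits that fall inside the
    time window but arrive from a different IP address."""
    window = timedelta(seconds=window_seconds)
    pairs = []
    for cf_time, cf_ip in cf_events:
        for hit_time, hit_ip in origin_hits:
            close_in_time = abs(hit_time - cf_time) <= window
            if close_in_time and hit_ip != cf_ip:
                delay = (hit_time - cf_time).total_seconds()
                pairs.append((cf_ip, hit_ip, delay))
    return pairs

# Sample taken from the two log entries quoted above.
cf_events = [(datetime(2023, 9, 13, 2, 24, 4), "108.85.157.166")]
origin_hits = [(datetime(2023, 9, 13, 2, 24, 8), "74.99.177.198")]

print(near_misses(cf_events, origin_hits))
```

A pairing like this shows only that the two requests were close in time. It does not prove they came from the same client.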

I have no idea how it’s doing this. The only thing that popped up on Google was something called an ISP proxy, where bot masters can apparently access huge pools of good-reputation IPs and route requests through them, even from a server. The other possibility, I suppose, is a botnet of many compromised drone machines. I just don’t understand how it is bypassing Cloudflare.

Full firewall log re: 108.85.157.166

Sep 13, 2023 2:24:04 AM
Managed Challenge
United States
108.85.157.166
Security level
Matched service
Service: Security level
Action taken: Managed Challenge
Rule ID: riskyiuam_bot_score

Request details
Ray ID: 805e4e1c483506f0
IP address: 108.85.157.166
ASN: AS7018 ATT-INTERNET4
Country: United States
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36
HTTP Version: HTTP/1.1
Method: GET
Host:
Path: /index.html
Query string: Empty query string

Another thing. . . This bot never changes its user agent. It always appears as:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36.

Are you still encountering this @nightclub2000?

Are you restoring original IPs? That would help identify what to block. Details here,

Have you created a firewall rule to block that user agent? Details here

Let us know if this helps.

Yes, I created a rule to block that agent, and then it changed to AhrefsBot. As my original post explained, the interaction consisted of two IPs:

The first one, 108.85.157.166, appeared in the CF firewall log.

The second one, 74.99.177.198, made it through, as seen in my server log.

I don’t know exactly what they’re doing or how they’re doing it. One thing is certain: whatever it is will be impervious to the majority of bot-blocking methods. For now, and until I can resolve this, I’m whitelisting referrers using mod_rewrite, which has effectively halted these attacks. No need to mention the possible issues that might occur in doing this. It works for the moment, and the frequency of attacks is dropping considerably.
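For anyone curious what a referrer whitelist amounts to, here is a rough Python sketch of the decision logic. It is not the actual mod_rewrite rule set, which isn’t shown in this thread. The allowed hosts, the open entry pages and the function name `allow_request` are all assumptions for illustration.

```python
from urllib.parse import urlparse

# Hypothetical allow-list: only the site's own hostnames count as valid referrers.
ALLOWED_REFERRER_HOSTS = {"mysite.com", "www.mysite.com"}

# Hypothetical entry pages that stay reachable without any referrer,
# so that a first-time visitor can still land on the site.
OPEN_PATHS = {"/", "/index.html"}

def allow_request(path, referer):
    """Return True if the request should be served under the allow-list."""
    if path in OPEN_PATHS:
        return True
    if not referer:
        # Deep page requested with no Referer header: reject.
        return False
    host = urlparse(referer).hostname
    return host in ALLOWED_REFERRER_HOSTS

# The hit from the server log above: /about.html with no referring URL.
print(allow_request("/about.html", None))
# A normal click from the home page to the same page.
print(allow_request("/about.html", "http://mysite.com/index.html"))
```

The Referer header is supplied by the client, so this only stops bots that don’t bother to send one.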

netstat runs on a nearby monitor, and you can see it still trying: swarms of IPs from Comcast, AT&T sbcglobal, Spectrum and every other major ISP scroll down the screen.

Yes, inbound requests are now translated to some variant of 172.71. Not sure why that is now. Maybe the NS changes had not fully propagated yet? Doubt it. Besides, what are the odds my site and the CF firewall log were seeing swarms of accesses simultaneously? Just damn strange. The CF firewall log is displaying the same rate of inbound IPs and timestamps as my server logs, yet the IPs differ between the two.
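For what it’s worth, 172.71.x.x falls inside 172.64.0.0/13, one of the ranges Cloudflare publishes for its edge servers. That makes it possible to tell from the connecting address whether a request was proxied by Cloudflare or arrived at the server directly. Below is a minimal Python sketch of that check. The range list is an illustrative subset, not the full current list, and the function names are made up. CF-Connecting-IP is the header Cloudflare uses to pass along the visitor address.

```python
import ipaddress

# Illustrative subset of Cloudflare's published IPv4 ranges. Assumption:
# the current, complete list should be taken from Cloudflare before use.
CLOUDFLARE_RANGES = [ipaddress.ip_network(cidr) for cidr in (
    "172.64.0.0/13",
    "104.16.0.0/13",
    "162.158.0.0/15",
    "108.162.192.0/18",
    "173.245.48.0/20",
)]

def is_cloudflare(peer_ip):
    """True if the connecting address belongs to one of the listed ranges."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in CLOUDFLARE_RANGES)

def visitor_ip(peer_ip, headers):
    """Return (visitor address, came_through_cloudflare).

    The CF-Connecting-IP header is trusted only when the connection itself
    comes from a Cloudflare range; otherwise anyone could forge it."""
    if is_cloudflare(peer_ip):
        return headers.get("CF-Connecting-IP", peer_ip), True
    return peer_ip, False

# A proxied request: the peer is a Cloudflare edge address.
print(visitor_ip("172.71.0.10", {"CF-Connecting-IP": "108.85.157.166"}))
# A direct request: the peer is a residential address, as in the server log.
print(visitor_ip("74.99.177.198", {}))
```

If hits with residential peer addresses keep showing up once everything else arrives from Cloudflare ranges, one possible reading is that those requests are going straight to the server and never passing through Cloudflare. That is a guess, not something the logs in this thread confirm.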

Nevertheless, it matters not. This bot appears to be drawing from a pool of thousands, if not millions, of American ISP IP addresses, all of them in good reputational standing.

No idea, even in theory, how you’d block that, especially when it’s able to do some funky trick using two IPs to bypass CF filtering. Just perplexing.

Anyway. . . I’m fine for now. I have it under control. If you want me to help with testing or anything let me know and I’ll do whatever I can.

Much thanks,

Dave
