Browser integrity check doesn't do anything with curl?

Hello all,

I'm trying to use the Browser Integrity Check to stop a bot that scrapes our pages. I have enabled the Browser Integrity Check on the page pattern, but if I curl the page I still get the complete HTML.
The bot also continues to access the pages. (I'm blocking him by IP for now, but I temporarily lifted the block to check whether the Browser Integrity Check fixes the problem.)

Hi @louise1,

From this feature’s definition we have:

Cloudflare’s Browser Integrity Check (BIC) is similar to Bad Behavior and looks for common HTTP headers abused most commonly by spammers and denies access to your page. It will also challenge visitors that do not have a user agent or a non standard user agent (also commonly used by abuse bots, crawlers or visitors).


(Source: the Cloudflare support article "What does the Browser Integrity Check do?")

Perhaps in your case it’s best to deal with network (IP, ASN, Country) and/or User-Agents blocking.
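As a rough illustration of what network and User-Agent blocking means, here is a minimal Python sketch of the matching logic. In practice you would configure this as Cloudflare firewall rules rather than in your own code; the `Request` and `BlockRules` names and fields are hypothetical, chosen only for this example. It also shows the limitation discussed in this thread: a bot that sends a normal browser User-Agent is only caught by the network rules.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass, field


@dataclass
class Request:
    # Hypothetical request summary: client IP, its ASN and country, and the User-Agent header.
    ip: str
    asn: int
    country: str
    user_agent: str


@dataclass
class BlockRules:
    networks: list[str] = field(default_factory=list)      # CIDR ranges, e.g. "203.0.113.0/24"
    asns: set[int] = field(default_factory=set)             # autonomous system numbers
    countries: set[str] = field(default_factory=set)        # two-letter country codes
    ua_substrings: list[str] = field(default_factory=list)  # case-insensitive UA fragments

    def should_block(self, req: Request) -> bool:
        addr = ipaddress.ip_address(req.ip)
        # Network rules: an IPv4 address is never "in" an IPv6 network, so mixing versions is safe.
        if any(addr in ipaddress.ip_network(net) for net in self.networks):
            return True
        if req.asn in self.asns or req.country in self.countries:
            return True
        # User-Agent rules only work while the bot sends a recognizable User-Agent.
        ua = req.user_agent.lower()
        return any(s.lower() in ua for s in self.ua_substrings)
```

With `BlockRules(networks=["203.0.113.0/24"], ua_substrings=["python"])`, a request with a `Python-urllib/3.8` User-Agent is blocked from any address, while a request carrying a Firefox User-Agent is only blocked when it comes from the listed network.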

The sad truth is that there is no really easy way of stopping scrapers. You can add rate limiting, plus other tricks like hidden URLs that only scrapers will access, and block every IP that visits them (except good bots).
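A minimal sketch of the hidden-URL trick in Python, assuming your application can see the client IP, path, and User-Agent of each request. The class name, trap path, and good-bot markers are hypothetical placeholders. Matching good bots by User-Agent alone is easy to spoof, so a real setup should verify crawlers more strictly (for example with reverse DNS) and list the trap path as `Disallow` in robots.txt so well-behaved crawlers never request it.

```python
class HoneypotTrap:
    """Ban any client that requests a hidden URL that only scrapers should find."""

    def __init__(self, trap_path, good_bot_markers):
        self.trap_path = trap_path
        # Lower-cased User-Agent fragments treated as "good bots" (spoofable; see above).
        self.good_bot_markers = [m.lower() for m in good_bot_markers]
        self.banned_ips = set()

    def is_good_bot(self, user_agent):
        ua = user_agent.lower()
        return any(marker in ua for marker in self.good_bot_markers)

    def handle(self, ip, path, user_agent):
        """Return the HTTP status code to send for this request."""
        if ip in self.banned_ips:
            return 403  # already caught by the trap earlier
        if path == self.trap_path and not self.is_good_bot(user_agent):
            self.banned_ips.add(ip)  # first visit to the trap bans the IP
            return 403
        return 200
```

Once an IP has requested the trap path, every later request from it gets a 403, including requests for normal pages.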

Thank you for your reply, @dmz. I read that documentation and thought that if I tried to curl the page URL, it would return a Cloudflare block page, but I'm still able to scrape the HTML :open_mouth:

Yes, @boynet2, that's really sad indeed; thanks for the suggestion, though. In our case I think the bot is configured to scrape our images, so the hidden-link trick won't work, but rate limiting might :thinking:
Does Cloudflare provide this kind of feature?

I don't believe the Browser Integrity Check will be able to handle the problem you are describing, which is why I mentioned the firewall and User-Agent rules. Have you tested any of these features?

If you can provide more information about this bot's behavior, we may be able to help you better:

  • Does the User Agent vary?
  • Is the IP/Country always the same?
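One way to answer these questions is to count IPs and User-Agents in your access logs. The Python sketch below assumes logs in the common Apache/Nginx "combined" format and counts only requests under a path prefix; the function name and the `/image/` default are hypothetical. Note that behind Cloudflare your origin logs show Cloudflare's addresses unless you log the real client IP, for example from the `CF-Connecting-IP` header.

```python
import re
from collections import Counter

# "combined" format: IP ident user [time] "request line" status size "referer" "user-agent"
COMBINED = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "([^"]*)" \d{3} \S+ "[^"]*" "([^"]*)"')


def profile_clients(lines, path_prefix="/image/"):
    """Count requests per IP and per User-Agent for paths starting with path_prefix."""
    ips, agents = Counter(), Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines that are not in the combined format
        ip, request_line, user_agent = m.groups()
        parts = request_line.split()  # e.g. ["GET", "/image/1", "HTTP/1.1"]
        if len(parts) >= 2 and parts[1].startswith(path_prefix):
            ips[ip] += 1
            agents[user_agent] += 1
    return ips, agents
```

Calling `ips.most_common(10)` and `agents.most_common(10)` on the result shows whether the traffic comes from a few addresses and one User-Agent or is spread out.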

Cloudflare offers a Rate Limiting solution; however, if you can deal with the issue using the features I mentioned, you'll save some money (since Rate Limiting is billed based on usage).

I'm able to block the bot by IP, and so far that has worked.
The user agent it used was a “legit” Mozilla browser user agent:
“Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:66.0) Gecko/20100101 Firefox/66.0”
so it does a decent job of trying to look like a legitimate request.

The bot hits our image pages, following the URL pattern /image/id.

Since we blocked his IP a few hours ago, he has changed his behavior and tried to hit our image CDN directly using a Python user agent… but he is still using the same IP, and the CDN is also behind Cloudflare, so he is still blocked.

However, since the bot looks custom-built and specifically targets our website, I think it's only a matter of time before the hacker switches to a serverless architecture, which would let the requests come from many different IPs.

I've looked into Rate Limiting, and I think it will be the solution if it comes to that…

Do you know if we can limit that feature to a specific URL pattern?

Thanks again, @dmz, for your help and advice.

Fortunately, yes! Take a look at this support page:

It links to all the other articles related to Rate Limiting and will certainly clarify many questions on the subject.
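Cloudflare's Rate Limiting is configured in the dashboard, so you would not write this yourself, but the idea of limiting only one URL pattern can be sketched in Python. This is a minimal per-IP sliding-window limiter, assuming the /image/id pattern from this thread with a numeric id (an assumption); the class name and thresholds are hypothetical placeholders, and it is not meant to reproduce Cloudflare's implementation.

```python
import re
import time
from collections import defaultdict, deque


class PathRateLimiter:
    """Sliding-window rate limiter applied only to paths matching a pattern."""

    def __init__(self, pattern, max_requests, window_seconds, clock=time.monotonic):
        self.pattern = re.compile(pattern)
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.hits = defaultdict(deque)  # ip -> timestamps of recently allowed requests

    def allow(self, ip, path):
        if not self.pattern.fullmatch(path):
            return True  # paths outside the pattern are never limited
        now = self.clock()
        q = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # over the limit; rejected requests are not recorded
        q.append(now)
        return True
```

For example, `PathRateLimiter(r"/image/\d+", max_requests=3, window_seconds=10)` lets an IP fetch three images per 10 seconds while leaving other pages unrestricted. Because rejected requests are not recorded, a client that keeps hammering is let back in as soon as its earlier allowed requests age out; counting rejected hits as well would keep it blocked longer.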

You’re welcome! It’s a pleasure to be able to help. :slight_smile:
