My web scraper is hitting Cloudflare bot protection - any ideas what to do?

Hello.

So I run a favicon fetcher service (icon [dot] horse). It’s mostly used by people developing apps or websites to grab an icon for a site they have a link to. For example, we have one client that runs a private knowledge base service and they use the icons to visually identify links. I get this data (often) from the site’s HTML, where <meta> tags contain the relevant info.
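For readers unfamiliar with this kind of lookup, here is a minimal sketch of pulling icon candidates out of a page's HTML using only Python's standard library. It is not the service's actual code. It assumes icons are declared either in `<link rel="...icon...">` tags or in a `msapplication-TileImage` `<meta>` tag, and the names `IconLinkParser` and `find_icons` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IconLinkParser(HTMLParser):
    """Collects candidate icon URLs from <link> and <meta> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.icons = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link":
            # rel can hold several tokens, e.g. "shortcut icon" or "apple-touch-icon".
            rel_tokens = (attrs.get("rel") or "").lower().split()
            href = attrs.get("href")
            if href and any("icon" in token for token in rel_tokens):
                self.icons.append(urljoin(self.base_url, href))
        elif tag == "meta":
            # Assumption: the only <meta> tag considered here is the Windows tile image.
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content")
            if content and name == "msapplication-tileimage":
                self.icons.append(urljoin(self.base_url, content))


def find_icons(html, base_url):
    """Return absolute icon URLs found in the given HTML, in document order."""
    parser = IconLinkParser(base_url)
    parser.feed(html)
    return parser.icons
```

A page that declares no icons yields an empty list, which is the case where a fallback image would be served.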

Lately, more and more of my scraper’s requests hit Cloudflare bot protection - and since no icons are present on that page, the result is blank, and my API serves a fallback image.
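One thing that may help here is telling "this site has no icon" apart from "this request was challenged", so a challenged result is not cached as a blank. The sketch below relies on the `cf-mitigated: challenge` response header that Cloudflare documents for challenge pages. The function name is hypothetical, and the check is only as reliable as that header.

```python
def looks_like_cloudflare_challenge(headers):
    """Return True if the response headers indicate a Cloudflare challenge page.

    `headers` is any mapping of header names to values. Header names are
    compared case-insensitively, as HTTP requires.
    """
    normalized = {name.lower(): value.strip().lower() for name, value in headers.items()}
    # Cloudflare sets this header on challenge responses instead of the real page.
    return normalized.get("cf-mitigated") == "challenge"
```

When this returns True, a scraper could record the lookup as "blocked" rather than "no icon", and retry later or with a shorter cache lifetime.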

I was wondering what can be done about this situation - I realise that bot protection is a good thing and that it's actually doing its job properly by detecting my scraper as a bot.

But I also would love to be able to get favicons.

So is there a way I can solve this issue? Some specifics:

  • I don’t make a request to the same domain often - I cache the result for some time
  • I currently display a user agent that identifies my bot, but I’ve also tried using a real browser user agent before
  • The IP of my service is static - it doesn’t change
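To make the first two points concrete, here is a rough sketch of per-domain caching with an expiry, plus a request that carries an identifying user agent. It is an illustration only, not the service's real implementation: the user agent string, `TTLCache`, and `build_request` are all made up, and the clock is injectable just so expiry can be checked without waiting.

```python
import time
import urllib.request

# Hypothetical identifying user agent; a real bot would point at its own info page.
BOT_USER_AGENT = "ExampleFaviconBot/1.0 (+https://example.com/bot)"


class TTLCache:
    """Keeps one result per domain and forgets it after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}

    def get(self, domain):
        entry = self._entries.get(domain)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at >= self.ttl:
            # Entry is stale: drop it so the caller fetches the site again.
            del self._entries[domain]
            return None
        return value

    def put(self, domain, value):
        self._entries[domain] = (value, self.clock())


def build_request(url):
    """Build (but do not send) a request that identifies the bot."""
    return urllib.request.Request(url, headers={"User-Agent": BOT_USER_AGENT})
```

With a cache like this, a given domain is contacted at most once per expiry window, no matter how many API calls ask for its icon.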

Would love some further thoughts on this!

Hi there!

You can find some possible workarounds here:

What should I do if I am getting false positives caused by Bot Fight Mode (BFM) or Super Bot Fight Mode (SBFM)?

However, this:

The IP of my service is static - it doesn’t change

It could help you, because you can easily create an IP Access Rule to allow traffic that is blocked or challenged by BFM.

You can’t bypass or skip BFM/SBFM using Firewall Rules or Page Rules.

SBFM can be bypassed with IP access “Allow” action rules. BFM will be disabled if there are any IP access rules present.

Take care!

You may want to submit an application to have your bot added to Cloudflare’s known good bots directory. You can read about the requirements and how to apply here: Verified Bots Policy · Cloudflare bot solutions docs.

Thanks @albert I will try to do this.

Hi @AlphaK, the issue isn’t so much that my bot is being challenged on my own site(s); it’s that the bot is being challenged on other people’s sites. I’m guessing I cannot set up IP access rules for other people’s sites. Sorry if I’m misunderstanding the answer.
