Cloudflare blocks HubSpot crawler even with WAF bypass

I need to allow the HubSpot crawler to access our website.

I set up a Firewall bypass rule to allow the crawler access to our domain, and the rule appears to be set up correctly, since its status shows hits against it.
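
For reference, a user-agent based bypass rule like this is normally written in Cloudflare's Rules language against the `http.user_agent` field. The thread does not show the actual rule, so this is only a guess at its shape: a small Python helper (the name `hubspot_bypass_expression` is hypothetical) that builds the expression string you would paste into the rule's expression editor. Keep in mind that user agents are trivially spoofed, and, as later replies explain, a rule like this still cannot override Super Bot Fight Mode.

```python
from typing import Iterable


def hubspot_bypass_expression(markers: Iterable[str]) -> str:
    """Build a Cloudflare Rules language expression that matches any of the
    given substrings in the request's User-Agent header (hypothetical helper)."""
    clauses = []
    for marker in markers:
        # Refuse characters that would need escaping inside a quoted string
        # literal, rather than guessing at the escaping rules.
        if '"' in marker or "\\" in marker:
            raise ValueError(f"unsupported character in marker: {marker!r}")
        clauses.append(f'(http.user_agent contains "{marker}")')
    if not clauses:
        raise ValueError("at least one marker is required")
    return " or ".join(clauses)


print(hubspot_bypass_expression(["HubSpot"]))
# (http.user_agent contains "HubSpot")
```

Several markers are joined with `or`, so the rule fires if any one of them appears in the User-Agent.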

However, the HubSpot crawler persistently reports a 403 error.

The only way I can get rid of the 403 error is to bypass the Cloudflare proxy entirely in the Cloudflare DNS settings. I do not want to do this, as it defeats the main purpose of us having a Cloudflare account.

Please help.

V.

Then remove that firewall rule and everything should work fine.

There is no firewall rule that intentionally blocks the crawler. Which firewall rule are you suggesting that I remove?

I am talking about this rule

If you examine your Firewall logs, you will see what is blocking their requests.

Some Cloudflare features like Super Bot Fight Mode do not give you any control, so you might need to disable that feature if the logs indicate it is responsible.

Logs would be great. Can you tell me where I can see them? I looked both before posting here and again now, but cannot find them.

On the Firewall tab of the dashboard:

https://dash.cloudflare.com/?to=/:account/:zone/firewall

A filter like user-agent~contains=HubSpot should show you the relevant traffic. I don’t use HubSpot for anything, yet I have lots of requests from Amazon with user agents containing:

HubSpot-FeedFetcher
HubSpot Page Fetcher/1.0 http://www.hubspot.com/ [email protected]
Mozilla/5.0 (compatible; HubSpot Crawler; +https://www.hubspot.com)
HubSpot-Link-Resolver
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36 HubSpot Marketing Grader
HubSpot Connect 2.0 (http://dev.hubspot.com/) - BXredactedX
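
To see what a `user-agent~contains=HubSpot` filter selects, here is a minimal Python sketch that applies the same substring test to a list of user agents. It assumes the dashboard filter is a plain case-sensitive substring match (an assumption, so a `case_sensitive` flag is included in case it is not). The helper `filter_user_agents` and the two non-HubSpot sample agents are made up for illustration, and two of the HubSpot entries are shortened.

```python
from typing import Iterable, List


def filter_user_agents(user_agents: Iterable[str], needle: str = "HubSpot",
                       case_sensitive: bool = True) -> List[str]:
    """Return the user agents that contain `needle` as a substring."""
    if case_sensitive:
        return [ua for ua in user_agents if needle in ua]
    needle_lower = needle.lower()
    return [ua for ua in user_agents if needle_lower in ua.lower()]


# HubSpot agents from the list above (two shortened), plus two made-up others.
SAMPLE_USER_AGENTS = [
    "HubSpot-FeedFetcher",
    "HubSpot Page Fetcher/1.0 http://www.hubspot.com/",
    "Mozilla/5.0 (compatible; HubSpot Crawler; +https://www.hubspot.com)",
    "HubSpot-Link-Resolver",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36 HubSpot Marketing Grader",
    "HubSpot Connect 2.0 (http://dev.hubspot.com/)",
    "curl/7.68.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]

matches = filter_user_agents(SAMPLE_USER_AGENTS)
print(f"{len(matches)} of {len(SAMPLE_USER_AGENTS)} requests look like HubSpot")
# 6 of 8 requests look like HubSpot
```

Note that matching on `HubSpot` rather than the lowercase domain `hubspot.com` catches every variant above, including the agents that carry no URL at all.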

OK, the graphical event summary is what I was looking at, and what I used before to identify the issue.

I now see this:

I explicitly allowed the crawler through, but it still seems to be tripped up by the “manage definite bots” rule. Would this count as a bug, or is there something I am missing that would let just this crawler through?

That is Super Bot Fight Mode. You cannot use Firewall Rules to override that. You will need to disable SBFM if it is blocking legitimate traffic for you.
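
The reason a Firewall Rule cannot help here can be illustrated with a deliberately simplified toy model. This is not Cloudflare's real request pipeline: the product names, the `BYPASSABLE` set, the per-product checks, and the `definitely_automated` flag are all hypothetical. The model encodes only the point made above: a Bypass action skips just the products that support it, Super Bot Fight Mode is not one of them, so the remaining option is to turn SBFM off.

```python
from typing import Callable, Dict, FrozenSet

# A check returns True when it would block the request.
Check = Callable[[dict], bool]

# Hypothetical set of products a Firewall Rule "Bypass" action is able to skip.
# Super Bot Fight Mode is deliberately absent.
BYPASSABLE: FrozenSet[str] = frozenset({"waf_managed_rules", "rate_limiting", "security_level"})

# Hypothetical checks, evaluated in insertion order.
CHECKS: Dict[str, Check] = {
    "waf_managed_rules": lambda req: "<script>" in req.get("query", ""),
    "rate_limiting": lambda req: req.get("requests_last_minute", 0) > 100,
    "security_level": lambda req: req.get("ip_threat_score", 0) > 50,
    # Stand-in for SBFM's "definitely automated" verdict.
    "sbfm_definite_bots": lambda req: req.get("definitely_automated", False),
}


def evaluate(req: dict, bypass: FrozenSet[str], sbfm_enabled: bool = True) -> str:
    """Return "allow" or "block (<product>)" under this toy model."""
    for product, would_block in CHECKS.items():
        if product == "sbfm_definite_bots" and not sbfm_enabled:
            continue  # in this model, disabling SBFM is the only way to skip it
        if product in bypass and product in BYPASSABLE:
            continue  # a Bypass only affects products that support being bypassed
        if would_block(req):
            return f"block ({product})"
    return "allow"


crawler = {
    "user_agent": "Mozilla/5.0 (compatible; HubSpot Crawler; +https://www.hubspot.com)",
    "definitely_automated": True,
}

# Even a rule that tries to bypass everything, SBFM included, does not help...
print(evaluate(crawler, bypass=BYPASSABLE | {"sbfm_definite_bots"}))
# block (sbfm_definite_bots)

# ...whereas disabling SBFM lets the request through.
print(evaluate(crawler, bypass=frozenset(), sbfm_enabled=False))
# allow
```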

If you have a contact in Hubspot, you should ask them to apply to be a Verified Bot.

https://support.cloudflare.com/hc/en-us/articles/360035387431#h_5itGQRBabQ51RwT5cNJX8u


Great, thank you for that. I now have a way forward.
