Crawler access to RSS feeds

I develop Unread, an RSS reader.

My feed crawlers are having trouble retrieving some feeds. The forum doesn’t seem to want to let me post links, but one that does not work is the feed at fordauthority-dot-com-slash-feed-slash.

They get a 403 Forbidden page from Cloudflare. It looks like Cloudflare is blocking my crawlers because it detects them as “bots”. And they are – RSS feeds are intended for bots.

Is there any way to get my crawlers put on some kind of “allow” list?

John

That’s covered here:

https://support.cloudflare.com/hc/en-us/articles/360035387431#h_5itGQRBabQ51RwT5cNJX8u

I’d just like to add that, unfortunately, that’s not quite so. As far as I know, a lot of my colleagues read RSS feeds or use an RSS app to follow news and other things they’re interested in, and some even subscribe to feeds in MS Outlook and similar apps :slight_smile:

Cloudflare provides a set of tools that let website operators manage their Internet properties. Cloudflare does not unilaterally decide to block any traffic except for DDoS attacks. Everything else requires configuration by the website operator.

One of the managed features is a list of Verified Bots, which website owners can use to permit well-behaved automated bots that might otherwise be blocked by other rules or features. The link above contains details on how you can apply to become a verified bot.

Really? The intended targets for my RSS feeds are users. The majority of bots that I see accessing my RSS feeds are badly behaved content scrapers, many trying to pass off my content as their own, with their own monetisation strategy, and with no attribution.

Does your user agent include a URL that describes your bot?
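To illustrate what that question is getting at, here is a minimal sketch of a crawler request whose User-Agent carries a URL describing the bot. The bot name, version, and info URL are hypothetical placeholders, not Unread's real values, and the "(+URL)" form is only a common crawler convention. Sending it does not guarantee that a site's rules will let the request through.

```python
import urllib.error
import urllib.request
from typing import Optional

# Hypothetical values for illustration only; not Unread's real identity.
BOT_NAME = "ExampleFeedCrawler"
BOT_VERSION = "1.0"
BOT_INFO_URL = "https://example.com/crawler-info"


def build_user_agent(name: str, version: str, info_url: str) -> str:
    """Return a User-Agent of the form 'Name/1.0 (+https://...)'."""
    return f"{name}/{version} (+{info_url})"


def build_feed_request(feed_url: str) -> urllib.request.Request:
    """Create a GET request for a feed that identifies the crawler."""
    headers = {
        "User-Agent": build_user_agent(BOT_NAME, BOT_VERSION, BOT_INFO_URL),
        "Accept": "application/rss+xml, application/atom+xml, application/xml;q=0.9",
    }
    return urllib.request.Request(feed_url, headers=headers)


def fetch_feed(feed_url: str, timeout: float = 10.0) -> Optional[bytes]:
    """Fetch the feed body, or return None if the server answers 403."""
    try:
        with urllib.request.urlopen(build_feed_request(feed_url), timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 403:
            # Blocked by the site's configuration; the caller decides what to do.
            return None
        raise
```

The page behind the info URL would typically explain what the crawler does, how often it fetches, and how a site operator can get in touch, so an operator reviewing blocked traffic has something to go on.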

Really? The intended targets for my RSS feeds are users. The majority of bots that I see accessing my RSS feeds are badly behaved content scrapers, many trying to pass off my content as their own, with their own monetisation strategy, and with no attribution.

We might be splitting hairs over a definition of “bot”. Yes, the end user of an RSS reader is a human reading an article. But content is scraped by a process that is something other than a user opening a webpage with Safari or Chrome.

Not really. An RSS reader is a client, just like any regular browser. It’s not automated but used by a human.
