How to block all RSS Feeds and only allow Googlebot

Hello.

I was hoping someone on the Cloudflare support forums could explain exactly how I can use the WAF to block basically everyone from my RSS feeds and only allow services such as Googlebot, Yahoo, etc.

Additionally, I would like to ask: is it possible for someone to spoof (I’m not sure if this is the right word for it) Google’s RSS feed crawlers, so that they appear to be something like this:

IP: 66.249.84.150 Hostname: google-proxy-66-249-84-150.google.com

FeedFetcher-Google; (+http://www.google.com/feedfetcher.html)

From what I can tell from this document I would need to block all and allow only

Feedfetcher-Google

Google Feedfetcher | Google Search Central | Documentation | Google Developers

But this is not a valid “Host Name” or IP

Could someone tell me if the below is a fake googlebot?

Activity Detail

Romford, United Kingdom visited (MY FEED URL)
8/4/2022 11:06:28 AM (2 hours 14 mins ago)

IP: 66.249.84.150 Hostname: google-proxy-66-249-84-150.google.com

FeedFetcher-Google; (+http://www.google.com/feedfetcher.html)

What have you tried so far when it comes to the firewall rule?

Hi

I have just learnt how to do a reverse IP lookup to verify Google bots.

My issue is this: someone is scraping my WordPress content, and when it goes live on their site my content gets demoted and they get ranked in my spot because they have a domain name that closely matches mine. To me this looks like an obvious Google algorithm canonical issue.

They are getting my content from RSS. I cannot turn off my RSS feeds because I need them for Google News and Top Stories.

I have been going through my live local Wordfence firewall logs to identify the IPs that are crawling my RSS URLs, which I have now done.

So I basically want to block everyone from my RSS feed URLs (I have several) and only allow Googlebot and Yahoo. I know how to block URLs in the WAF, but I’m not sure how to allow only Google’s and Yahoo’s crawlers onto my RSS feed URLs.
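The reverse lookup mentioned above is usually paired with a forward lookup, so that a faked PTR record alone is not enough to pass. Here is a minimal Python sketch, assuming Google's crawler and fetcher hostnames end in `googlebot.com`, `google.com` or `googleusercontent.com` (worth checking against Google's current documentation) and handling IPv4 only. The function name `is_verified_google_ip` and its injectable `reverse`/`forward` parameters are hypothetical, chosen for illustration:

```python
import socket

# Assumed hostname suffixes for Google crawlers and fetchers; verify against Google's docs.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def is_verified_google_ip(ip, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check for a single IPv4 address.

    `reverse` maps an IP to a hostname and `forward` maps a hostname to a
    list of IPs. Both default to real DNS lookups and can be swapped out
    so the logic can be exercised without network access.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        # Step 1: reverse lookup (PTR) must land in a Google-owned domain.
        host = reverse(ip).lower().rstrip(".")
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # Step 2: forward lookup of that hostname must return the same IP.
        return ip in forward(host)
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return False
```

Calling `is_verified_google_ip("66.249.84.150")` would run both lookups against live DNS. The User-Agent header on its own can be set to anything by the client, which is why the check is made against the connecting IP address rather than the header.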

I’d start at Rules language · Cloudflare Ruleset Engine docs. That should have all the necessary information to get you started.

Hi.

Thanks for the info.

I was hoping someone could save me a bunch of time and show me how to set up:

Block Full URL - mysite/feed

Allow
Google bot
Bing bot

Very confusing with the different rule sets such as

Contains
Equals

And with the allow rule for Google and Bing, I can’t seem to find anywhere what the actual names of these bots and crawlers are, so that I can enter them specifically.

Well, the forum is for assistance, not necessarily to do it for you :slight_smile:

That’s why my question what you have tried so far. If you can post that, we could use this as a start. Post a screenshot of the rule.

Hi.

Thanks for the info.

Would this rule work?

URL FULL mysite/feeds
If
Referer does not equal - google.com, bing.com

Block

?

Not really. First of all, I would not use the full URL, only the path.

Also, the referrer is not the right field: it only tells you which page a visitor came from (for example, whether they clicked a link), and that’s not what you want. You want to avoid blocking search engines.

I would recommend checking out the documentation once more, in particular the Known Bots field.
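To illustrate why the path is easier to match than the full URL, here is a small Python sketch. The hostnames are placeholders, and the comparison with Cloudflare's `http.request.full_uri` and `http.request.uri.path` fields is my reading of the Rules language docs rather than anything stated in this thread:

```python
from urllib.parse import urlsplit


def uri_path(full_url: str) -> str:
    """Return only the path component, roughly what http.request.uri.path holds."""
    return urlsplit(full_url).path


# Placeholder URLs: the same feed reached with a different scheme, host and query string.
urls = [
    "https://example.com/feed/",
    "http://www.example.com/feed/?utm_source=x",
]

for url in urls:
    # The full URL differs each time, but the path is identical.
    print(url, "->", uri_path(url))
```

A rule written against the path keeps matching whether the request arrives over `http` or `https`, with or without `www`, and with any query string attached.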

Hi.

Thanks

So I am trying to do this

If full URL contains mysite .com/feeds

And

Known bots - does not =

But this does not allow me to do it. It greys out the (does not equal)

Why still full URL?

So which one is it then?

URL path?

And why does the “does not equal” option get greyed out?

Sure, but you are pretty close. This should actually do exactly what you want.
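For anyone reading without the screenshot, the logic of such a rule can be sketched in Python. The expression string is what I would expect the expression editor to show for a Block rule built from the URI Path and Known Bots fields. The `/feed` fragment is an assumption based on the posts above, and `should_block` is a hypothetical helper, not a Cloudflare API:

```python
# Assumed expression for a firewall rule whose action is Block.
# http.request.uri.path and cf.client.bot are Cloudflare Rules language fields.
RULE_EXPRESSION = '(http.request.uri.path contains "/feed" and not cf.client.bot)'


def should_block(path: str, is_known_bot: bool, feed_fragment: str = "/feed") -> bool:
    """Mirror the rule: block feed requests unless they come from a known bot."""
    return feed_fragment in path and not is_known_bot


# A feed request from an ordinary client is blocked.
print(should_block("/feed/", is_known_bot=False))
# The same request from a bot on Cloudflare's known bots list is let through.
print(should_block("/feed/", is_known_bot=True))
# Non-feed pages are not affected by this rule at all.
print(should_block("/about/", is_known_bot=False))
```

Known Bots is an on/off value rather than text to compare against, which would explain why the operator dropdown is greyed out for that field.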

Great

That was the helpful answer I was hoping someone could point out for me.

I was going over the docs as you recommended. However, as a total beginner I found some things slightly confusing.

I now understand that selecting Known Bots greys out the operator box, and that the toggle needs to be turned off (not green) so that known bots are allowed.

Thank you

Things are always confusing, but that’s what the documentation is for. The more you read the documentation the less confusing it is.

#tutorial has lots of it as well.

The more I read it, the more confusing it became in my case :slight_smile:

But thank you so much for the direct instructions. It helped a lot.

Hi.

I found this bot (I think the identifier is called a user agent) crawling my site, and I don’t want it there.

I have been blocking the IPs manually, but it keeps coming back with a new IP every time I block the last one.

I think this is some kind of media or PR news finder/scraper that is causing some of my issues with content I don’t want people reposting.

The following are the details from my local logs. I did not find this bot, “MuckRack”, in your verified bots list, so I want to block it.

Activity Detail

Cedar Knolls, New Jersey visited https://-mywebsiteurl

8/5/2022 12:49:14 AM (19 minutes ago)

Type - bot

IP: 69.164.211.11 Hostname: 69-164-211-11.ip.linodeusercontent.com

Mozilla/5.0 (compatible; MuckRack/1.0; +https://muckrack.com)

I understand I can block user agents. However, I am not quite sure what I should enter as the user agent name. In this case, is it just “MuckRack”?

Have I done this correctly?

The other thing is that I have Bot Fight Mode turned on, and this bot is not in your verified list, but it still reaches my site.

I am also seeing other bots coming to my site that are not in the verified list.

I have def automated to ON
I have verified bots to ALLOW

I’m not sure why I am still seeing unverified bots that are not listed on your page here:
Verified Bots | Cloudflare Radar

Here is another example from my logs of a bot that is not in the list but still reaches my site:

8/5/2022 1:50:55 AM (6 minutes ago)

IP: 95.163.255.96 Hostname: fetcher10-5.go.mail.ru

Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +https://help.mail.ru/webmaster/indexing/robots)

Yes, that rule should work, though you should probably use equals.

There’s also https://support.cloudflare.com/hc/en-us/articles/115001856951-Understanding-Cloudflare-User-Agent-Blocking
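To make the difference between “contains” and “equals” concrete, here is a short Python sketch using the user agent from the log entry above. The two expression strings are my assumption of how such rules would read in the expression editor, and `ua_contains`/`ua_equals` are hypothetical helpers that only imitate the operators:

```python
# User agent string copied from the log entry in this thread.
LOGGED_UA = "Mozilla/5.0 (compatible; MuckRack/1.0; +https://muckrack.com)"

# Assumed expressions for the two variants of the rule (not taken from the thread).
CONTAINS_RULE = '(http.user_agent contains "MuckRack")'
EQUALS_RULE = f'(http.user_agent eq "{LOGGED_UA}")'


def ua_contains(user_agent: str, needle: str) -> bool:
    """Substring match, like the 'contains' operator."""
    return needle in user_agent


def ua_equals(user_agent: str, expected: str) -> bool:
    """Exact match on the whole string, like the 'equals' operator."""
    return user_agent == expected


# "MuckRack" alone is a substring of the header, not the whole header.
print(ua_contains(LOGGED_UA, "MuckRack"))
print(ua_equals(LOGGED_UA, "MuckRack"))
# The full logged string is what an equals rule would need.
print(ua_equals(LOGGED_UA, LOGGED_UA))
```

An equals rule is stricter, so it is less likely to catch unrelated clients, but it stops matching if the bot changes any part of its string (for example a version bump to `MuckRack/1.1`), whereas a contains rule on `MuckRack` would still match.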
