Can I block access to our website (`saltwire.com`) from `12ft.io`?

Answer these questions to help the Community help you with Security questions.

What is the domain name?
saltwire.com

Have you searched for an answer?
Yes

Please share your search results url:
None

When you tested your domain using the [Cloudflare Diagnostic Center], what were the results?
Just need to block access from 12ft.io

Describe the issue you are having:
12ft.io disables our paywall and gets unlimited access to our sites. I am looking for a way to block access from 12ft.io.

How does it work?

"
The idea is pretty simple, news sites want Google to index their content so it shows up in search results. So they don’t show a paywall to the Google crawler. We benefit from this because the Google crawler will cache a copy of the site every time it crawls it.
All we do is show you that cached, unpaywalled version of the page.
"

What error message or number are you receiving?
No error message

What steps have you taken to resolve the issue?

  1. Found 12ft.io IPs and added those IPs to IP access list
  2. Unable to block them with Cloudflare WAF

Was the site working with SSL prior to adding it to Cloudflare?
N/A

What are the steps to reproduce the error:

  1. Visit 12ft.io
  2. Add https://www.saltwire.com/halifax and click the button “remove paywall”
  3. It just removes our paywall

Have you tried from another browser and/or incognito mode?
Same
Please attach a screenshot of the error:
No error

One of the most challenging things to do on the internet is to prevent a bot from crawling your content. The internet is a public space, and as long as the content is publicly available, bots will find a way to grab it.

The IPs from the infringing site may be their public-facing IPs, but not necessarily the IPs their crawler uses. Also, user agents can be easily spoofed. Having said that, you could try a WAF Custom Rule with those fields and see if that blocks them from visiting your domain. Using the WAF UI, it should be pretty straightforward.

When requests match 
Source IP is in x.x.x.x y.y.y.y etc.
OR 
User Agent contains "pattern"
Action: Block
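
To make the logic of that rule concrete, here is a rough Python sketch of the same "IP in list OR User-Agent contains pattern" check. It only models the matching logic and is not how Cloudflare evaluates rules internally. The addresses below come from the reserved documentation ranges and the User-Agent substring is made up, so treat all of them as placeholders for whatever you actually see in your logs.

```python
import ipaddress

# Placeholder values (assumptions): documentation-range addresses and a
# made-up User-Agent substring. Replace with values observed in your own logs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]
BLOCKED_UA_PATTERN = "example-crawler"


def should_block(source_ip: str, user_agent: str) -> bool:
    """Return True when either condition of the rule matches."""
    ip = ipaddress.ip_address(source_ip)
    # "Source IP is in ..." : true if the address falls in any listed network.
    ip_match = any(ip in network for network in BLOCKED_NETWORKS)
    # "User Agent contains ..." : plain, case-sensitive substring match.
    ua_match = BLOCKED_UA_PATTERN in user_agent
    return ip_match or ua_match
```

Because the two conditions are joined with OR, a crawler that changes only its IP or only its User-Agent is still caught, but one that changes both gets through, which is the limitation described above.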

If they in fact only serve what Googlebot caches, another approach you can take is to instruct Googlebot not to cache your content (which is not the same as asking it not to index it) with a `noarchive` directive in the `X-Robots-Tag` response header.
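
As an illustration of the `noarchive` approach, here is a minimal sketch that assumes your origin runs a Python WSGI application, which may not match your stack. The wrapper name `add_noarchive` is hypothetical. It simply appends `X-Robots-Tag: noarchive` to every response unless a `noarchive` directive is already present. The same header could instead be set in your web server configuration.

```python
def add_noarchive(app):
    """Wrap a WSGI app so every response carries X-Robots-Tag: noarchive."""

    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            # Skip the extra header if the app already sent a noarchive directive.
            already_set = any(
                name.lower() == "x-robots-tag" and "noarchive" in value.lower()
                for name, value in headers
            )
            if not already_set:
                headers = list(headers) + [("X-Robots-Tag", "noarchive")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start)

    return wrapped
```

Usage would be something like `application = add_noarchive(application)` in your WSGI entry point. Keep in mind this only helps if the other site really relies on Google's cached copy; it does nothing against a crawler that fetches your pages directly.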

