A website named JustTheRecipe.com is scraping my content and stealing it. Server logs look like this:
www.christmas-cookies.com 126.96.36.199 - - [05/Dec/2022:11:05:59 -0500] “GET /recipes/treats-for-animals/peanut-butter-dog-treats/ HTTP/2.0” 200 33783 “https://www.justtherecipe.com/” “Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/188.8.131.52 Safari/537.36” “xxx.xxx.59.249” “cache:BYPASS”
I have created a firewall rule to block the hostname “justtherecipe.com” like so (I also added their parent companies, just in case, for the future):
(http.host contains "justtherecipe") or (http.host contains "loopedin") or (http.host contains "streamline")
In the log above, “xxx.xxx.59.249” is my personal IP address. That was me testing JustTheRecipe to see if it would scrape Peanut Butter Dog Treats. It did. JustTheRecipe scraped the content, but the log entry shows my IP address rather than theirs, which confuses me.
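To make the log entry easier to discuss, here is a minimal Python sketch that splits it into fields and then applies the same `contains` checks as my rule. The field names (`host`, `remote_addr`, `referer`, `forwarded_for`, and so on) are my own guesses at what each position in my log format means, not official names, and the sketch assumes that `http.host` corresponds to the first field (the hostname being requested). It only mimics the rule locally; it does not call Cloudflare.

```python
import re

# The log entry from above, with curly quotes normalized to straight quotes.
LOG_LINE = (
    'www.christmas-cookies.com 126.96.36.199 - - [05/Dec/2022:11:05:59 -0500] '
    '"GET /recipes/treats-for-animals/peanut-butter-dog-treats/ HTTP/2.0" 200 33783 '
    '"https://www.justtherecipe.com/" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/188.8.131.52 Safari/537.36" '
    '"xxx.xxx.59.249" "cache:BYPASS"'
)

# Assumed layout: field names are hypothetical labels for each position.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<remote_addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'"(?P<forwarded_for>[^"]*)" "(?P<cache>[^"]*)"'
)

# Keywords used in my firewall rule.
BLOCKED_KEYWORDS = ("justtherecipe", "loopedin", "streamline")


def parse_log_line(line):
    """Split one log entry into a dict of named fields."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("line does not match the assumed log format")
    return match.groupdict()


def host_rule_matches(host):
    """Mimic my rule: true if the host contains any blocked keyword."""
    return any(keyword in host for keyword in BLOCKED_KEYWORDS)


fields = parse_log_line(LOG_LINE)
for name in ("host", "referer", "forwarded_for"):
    print(name, "=", fields[name])
print("rule matches host:", host_rule_matches(fields["host"]))
```

If my reading of the fields is right, “justtherecipe” only appears in the referer position of this entry, while the host position holds my own domain, so the check above comes back false for this request.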
How can I configure the firewall to block this service?
What is the domain name?
Have you searched for an answer?
Please share your search results url:
I implemented this workaround from 2021, but it is not working, as described above.
When you tested your domain using the Cloudflare Diagnostic Center, what were the results?
Describe the issue you are having:
What error message or number are you receiving?
The website continues to scrape the content regardless of the firewall rules implemented.
What steps have you taken to resolve the issue?
- See above
Was the site working with SSL prior to adding it to Cloudflare?
What are the steps to reproduce the error:
- Go to JustTheRecipe.com.
- Input any URL from my website, christmas-cookies.com.
- You will see that it scraped my content.
Have you tried from another browser and/or incognito mode?
Please attach a screenshot of the error:
Screenshot attached of JustTheRecipe.com having scraped the content of this page: