Looking for a solution to block GPT module scrapers

What is the name of the domain?

example.com

What is the issue you’re encountering?

I can’t stop scrapers with the WAF.

What steps have you taken to resolve the issue?

robots.txt: disallowed GPT and other bots.
WAF: blocked a number of spiders, scrapers, and IPs.
Blocked access to the sitemap.
Tried rate limiting, but a 10-second period is useless when somebody uses a link to scrape just one post.
When I test a scraper, I cannot see its User-Agent or request IP in my server log, so I can’t block it on that basis.
The WAF did not block based on the referrer when I tested it.
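
Keep in mind that robots.txt is advisory: it only stops crawlers that choose to honor it. As a minimal sketch, the rules can be checked locally with Python's standard `urllib.robotparser`. The robots.txt content below and the `is_allowed` helper are illustrative assumptions, not the actual file from this site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ChatGPT-User", "SomeBrowser"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/post/"))
```

A scraper that ignores robots.txt, or that identifies itself with a generic browser User-Agent, is not affected by these rules at all.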
While even The New York Times is not blocking it, Angi(dot)com successfully blocks the scraper and an OpenAI Python script too; against that site I get the error net::ERR_HTTP2_PROTOCOL_ERROR. I am looking for a way to implement the same thing on WordPress + Cloudflare so that it blocks ChatGPT module scrapers like Webpilot and others. I think it has something to do with headers and origins. What are the Cloudflare options for this, please?
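
One Cloudflare option is a WAF custom rule that matches on the User-Agent header. The sketch below only builds the rule expression string to paste into the rule editor. The `http.user_agent` field and the `contains` operator come from the Cloudflare Rules language; the helper name and the token list are assumptions for illustration.

```python
def build_ua_block_expression(substrings):
    """Build a Cloudflare Rules expression matching any User-Agent substring.

    Backslashes and double quotes are escaped so each substring is a
    valid quoted string literal in the expression.
    """
    clauses = []
    for text in substrings:
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'(http.user_agent contains "{escaped}")')
    return " or ".join(clauses)


if __name__ == "__main__":
    # Example tokens only; extend the list with whatever shows up in WAF events.
    print(build_ua_block_expression(["GPTBot", "ChatGPT-User"]))
```

This kind of rule only helps when the scraper sends an identifiable User-Agent. A tool that presents itself as an ordinary browser will pass straight through, which is why a different signal may be needed.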

Was the site working with SSL prior to adding it to Cloudflare?

Yes

What is the current SSL/TLS setting?

Full

Screenshot of the error

Have you tried this?

Yes, but only a few scrapers are blocked there. I have already blocked all of them and about 30 others. I think net::ERR_HTTP2_PROTOCOL_ERROR is the solution; I need to somehow get this as the result :))

It looks like I found a solution for now: a honeypot HTML link ("Trap Link") plus a WAF rule that blocks whatever tries to access it.

Then I directly asked the scrapers to try to scrape this link. The WAF showed the blocked requests, and I blocked those IPs…
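
The honeypot approach above can be sketched as follows. This is a rough illustration, not the exact setup used here: it assumes a hypothetical trap URL `/trap-link/`, access-log lines in Common Log Format, and a first log field that holds the real visitor IP (behind Cloudflare, the origin sees Cloudflare's addresses unless visitor IP restoration is configured). It collects every IP that requested the trap and builds an `ip.src in {...}` expression for a WAF custom rule.

```python
import re

TRAP_PATH = "/trap-link/"  # hypothetical honeypot URL, not from the original post

# Start of a Common/Combined Log Format line: IP, two fields, [time], "METHOD path PROTO"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[A-Z]+ (\S+) [^"]*"')


def trap_hits(lines, trap_path=TRAP_PATH):
    """Return the sorted, de-duplicated IPs that requested the trap path."""
    ips = set()
    for line in lines:
        match = LOG_LINE.match(line)
        # Ignore any query string when comparing against the trap path.
        if match and match.group(2).split("?", 1)[0] == trap_path:
            ips.add(match.group(1))
    return sorted(ips)


def build_ip_block_expression(ips):
    """Build a Cloudflare Rules expression matching the given source IPs."""
    if not ips:
        return ""
    return "(ip.src in {" + " ".join(ips) + "})"


if __name__ == "__main__":
    # Made-up sample lines using documentation-reserved IP ranges.
    sample = [
        '203.0.113.9 - - [10/Oct/2023:13:55:36 +0000] "GET /trap-link/ HTTP/1.1" 403 0',
        '198.51.100.4 - - [10/Oct/2023:13:55:40 +0000] "GET /blog/post-1/ HTTP/1.1" 200 5120',
        '203.0.113.9 - - [10/Oct/2023:13:56:01 +0000] "GET /trap-link/?x=1 HTTP/1.1" 403 0',
    ]
    print(build_ip_block_expression(trap_hits(sample)))
```

The trap link should be hidden from human visitors and disallowed in robots.txt, so that well-behaved crawlers never follow it and only tools that ignore those signals end up on the block list.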
