403 error on ads.txt with custom user agent

We’re having an issue with our ads.txt file: one of our providers tries to crawl it with the user agent “AppNexusAdsTxtCrawler/1.0” and gets a 403 error. It works if the user agent is set to Googlebot, however.

A curl example that reproduces the error:

$ curl -L -A "AppNexusAdsTxtCrawler/1.0" --ssl-no-revoke https://soapsspoilers.com/ads.txt

I’ve added a firewall rule that matches the URI path OR the user agent and set it to bypass User Agent Blocking, Browser Integrity Check, Hotlink Protection, Security Level, Rate Limiting, Zone Lockdown, and WAF Managed rules, but it still doesn’t work. The expression is: (http.request.uri.path eq "/ads.txt") or (http.user_agent eq "AppNexusAdsTxtCrawler/1.0")

I’ve tried disabling all WAF rules, setting Super Bot Fight Mode to “Allow”, setting the DDoS ruleset sensitivity to “Essentially Off”, and basically turning off anything security-related. I have no User Agent Blocking or Zone Lockdown rules configured…

How can I get this to not return a 403 error?

The full error is:

$ curl -L -A "AppNexusAdsTxtCrawler/1.0" --ssl-no-revoke https://soapsspoilers.com/ads.txt
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access this resource.</p>
<script defer src="https://static.cloudflareinsights.com/beacon.min.js/v652eace1692a40cfa3763df669d7439c1639079717194" integrity="sha512-Gi7xpJR8tSkrpF7aordPZQlW2DLtzUlZcumS8dMQjwDHEnw9I7ZLyiOj/6tZStRBGtGgN6ceN6cMH8z7etPGlw==" data-cf-beacon='{"rayId":"70cdd1543a5a542b","token":"16916d7d14534c3f8b0e18bda0c05511","version":"2021.12.0","si":100}' crossorigin="anonymous"></script>
</body></html>

This isn’t a Cloudflare 403 page, which implies it’s being served by your origin server itself.

Especially since absolutely any other User-Agent works, including fjwaoigjag.
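
A quick way to confirm where the 403 originates (a sketch using only standard curl flags; the second user agent string is arbitrary) is to compare response headers for the blocked crawler and any other user agent: the cf-ray header shows the request passed through Cloudflare, while the status line and Server header point at whatever actually produced the error.

# Dump only the response headers for each user agent and compare them
$ curl -s -L -o /dev/null -D - -A "AppNexusAdsTxtCrawler/1.0" --ssl-no-revoke https://soapsspoilers.com/ads.txt
$ curl -s -L -o /dev/null -D - -A "Mozilla/5.0" --ssl-no-revoke https://soapsspoilers.com/ads.txt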


Thanks KianNH, I’m checking that now. I didn’t see the error in the server logs, but I’m double-checking!

You’re absolutely right, there was a misconfiguration in the .htaccess file which was blocking “bad” user agents. Thanks!!!
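
For reference, the usual culprit in cases like this is a mod_rewrite user-agent blocklist in .htaccess. The sketch below is illustrative only (the blocked patterns are made up, not the actual file), but rules like these return an origin 403 for any matching crawler unless it is exempted first:

# Hypothetical "bad bot" blocklist of the kind that can catch legitimate crawlers
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Exempt the ads.txt crawler before the blocklist runs
    RewriteCond %{HTTP_USER_AGENT} AppNexusAdsTxtCrawler [NC]
    RewriteRule ^ - [L]
    # Generic blocklist; "crawler" also matches AppNexusAdsTxtCrawler/1.0
    RewriteCond %{HTTP_USER_AGENT} (crawler|spider|scraper) [NC]
    RewriteRule ^ - [F,L]
</IfModule>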

