Is it possible to block only cache misses? My use case is Bingbot's overzealous crawling (the IPs are verified, so this is not a spoofed or bad bot). Bing doesn't bring my site enough search value to justify the AWS data transfer charges that its cache misses generate, so I don't mind it crawling and indexing content served from cache hits, but I want to block it outright on cache misses. I figure that is better than blocking it entirely, which is how I have it set now with a hard block on its user agent.
FWIW, I ended up putting a WAF rule on user agent AND hostname (the CDN hostname) to block any Bingbot requests going to the CDN directly. It's not optimal, but Bing can still crawl and index the site, just without the media. Traffic from Bing is minimal, so even if the site doesn't index well, the value lost is trivial IMHO.
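For anyone wanting to replicate this, here is a minimal sketch of the matching logic that rule expresses: block only when BOTH conditions hold, a Bingbot user agent AND the CDN hostname. It is plain Python illustrating the logic, not any WAF's actual rule syntax; the hostname `cdn.example.com` and the `should_block` helper are hypothetical, and matching on the `bingbot` substring is an assumption about how the user agent is checked.

```python
# Hypothetical CDN hostname; substitute the hostname that actually serves your media.
CDN_HOST = "cdn.example.com"


def should_block(user_agent: str, host: str) -> bool:
    """Return True only when the request has a Bingbot user agent AND
    targets the CDN hostname (both conditions of the rule)."""
    is_bingbot = "bingbot" in user_agent.lower()
    # Hostnames are case-insensitive; drop an explicit port such as ":443".
    hostname = host.lower().split(":", 1)[0]
    return is_bingbot and hostname == CDN_HOST


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
    print(should_block(ua, "cdn.example.com"))    # True: Bingbot hitting the CDN directly
    print(should_block(ua, "www.example.com"))    # False: Bingbot can still crawl the site
    print(should_block("Mozilla/5.0", "cdn.example.com"))  # False: regular visitors unaffected
```

Because the conditions are ANDed, Bingbot still reaches the main site and ordinary visitors still reach the CDN; only the one combination is refused.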