When I go into the Bing Webmaster Tools UI and attempt to submit our sitemap, I get a 403 error. I went through several troubleshooting steps, including modifying our robots.txt file and creating a Cloudflare firewall rule to allow known bots. For Super Bot Fight Mode, I have the settings set to “allow” and have not enabled static resource protection or JavaScript detections. It wasn’t until I disabled our WAF geo rule that Bing was able to access the sitemap. This is our rule blocking IPs outside of North America: (ip.geoip.country ne "US" and ip.geoip.country ne "CA" and ip.geoip.country ne "MX" and ip.geoip.country ne "PR"). This is a recent problem (within the past 8 or 9 days).
Once I allowed all IPs regardless of geo, Bingbot was able to access our sitemap. I would like to allow only North American IP addresses, but I’m not sure how to do that while still allowing Bingbot to access my sitemaps. I did try the “known bots” rule; however (a bit comically), it blocked the Cloudflare site health check, which looks for any page status not equal to 2xx or 3xx.
The Bingbot user agent I’m used to seeing is Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm), but you might want to just match on “contains bingbot”.
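For reference, a “contains bingbot” match can be written in the Cloudflare Rules language like this (a sketch to adapt to your rule set, not a drop-in replacement for your geo rule):

```
(http.user_agent contains "bingbot")
```

You would typically combine this with an allow action, or negate it inside a block rule, depending on how your existing rules are ordered.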
Do keep in mind that user agents can be easily spoofed so foreign IPs could potentially sneak in this way, I see people impersonating Googlebot occasionally (not too often) but I’ve never noticed anyone impersonating Bing.
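If spoofing is a concern, a more robust check than the user agent is the reverse-DNS verification Microsoft documents for Bingbot: the PTR record for the IP must end in search.msn.com, and that hostname must forward-resolve back to the same IP. A minimal sketch in Python (the function names are my own, and this runs outside Cloudflare, e.g. when auditing your logs):

```python
import socket

def hostname_is_bing(hostname: str) -> bool:
    """Check that a reverse-DNS hostname belongs to Bing's crawler domain."""
    return hostname == "search.msn.com" or hostname.endswith(".search.msn.com")

def verify_bingbot(ip: str) -> bool:
    """Reverse-DNS the IP, check the domain, then forward-confirm it."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)            # PTR lookup
        if not hostname_is_bing(hostname):
            return False
        forward_ips = socket.gethostbyname_ex(hostname)[2]   # A records
        return ip in forward_ips                             # forward-confirmed
    except (socket.herror, socket.gaierror):
        return False                                         # no PTR / no A record
```

The forward-confirmation step matters: anyone can set a PTR record claiming to be search.msn.com, but only Microsoft controls the A records under that domain.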
Well, you specifically told Cloudflare to block everything outside of these countries, and that request will likely have come from outside those regions.
For starters, you will want to use not in instead, and you should use known bots, as you already mentioned. The following expression should work just fine:
(not ip.geoip.country in {"CA" "MX" "PR" "US"} and not cf.client.bot)
This will block all requests that are not from these countries and are not bots known to Cloudflare, so that particular request should no longer be blocked.