I think what we are seeing is the result of several things happening, some of which should not be happening:
Issue 1 - .well-known & captcha
- Cloudflare IndexNow is sending https://www..co.uk/.well-known/captcha/ to BingBot
- BingBot ignores robots.txt and sitemap.xml
- BingBot then tries to crawl .well-known & .well-known/captcha and gets blocked by rules in Cloudflare and/or rules in .htaccess
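
To rule out a mistake in robots.txt itself, it may help to check what the file actually tells a compliant crawler about these paths. This is only a rough sketch using Python's standard urllib.robotparser: the robots.txt rules and the www.example.co.uk domain are placeholders (assumptions, not the real site or its real rules), and is_allowed is just a helper name made up for illustration. It cannot show what BingBot really does, only what the rules say it should do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; substitute the site's real file contents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /.well-known/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# www.example.co.uk is a placeholder domain, not the real site.
captcha_allowed = is_allowed(
    ROBOTS_TXT, "bingbot", "https://www.example.co.uk/.well-known/captcha/"
)
home_allowed = is_allowed(ROBOTS_TXT, "bingbot", "https://www.example.co.uk/")
print("captcha path allowed:", captcha_allowed)
print("home page allowed:", home_allowed)
```

If the captcha path comes back as disallowed here and BingBot still requests it, the problem is on the crawler's side rather than in the rules.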
Issue 2 - non-existent files & folders
- Where is BingBot getting these paths from? Not from the site, not from sitemap.xml, not from robots.txt - so why is it trying to crawl them? They are paths that have never existed
Microsoft support either don't know or refuse to say what the bot is up to or why it is trying to crawl these paths - their response is that we need to fix the site so that the paths exist - but they can't tell us which paths
A side issue of this - other bots, such as the permanently blocked Yandex, are trying to crawl the same non-existent paths - so it appears that Microsoft may be passing the non-existent paths on to them as well?
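
One way to test the "same paths" suspicion is to pull the blocked and missing requests out of the access log per bot and intersect them. The sketch below is a rough illustration only: it assumes the Apache "combined" log format, matches bots by user-agent substring, and uses made-up sample lines (the /old-folder/page.html path and the IP addresses are hypothetical). The function name missing_paths_by_bot is also just for illustration.

```python
import re
from collections import defaultdict

# Assumes the Apache "combined" log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Lowercase user-agent substrings used to recognise each bot.
BOT_MARKERS = {"bingbot": "bingbot", "yandex": "yandexbot"}


def missing_paths_by_bot(lines, statuses=("403", "404")):
    """Map each bot to the set of paths it requested that returned 403 or 404."""
    found = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group("status") not in statuses:
            continue
        agent = match.group("agent").lower()
        for bot, marker in BOT_MARKERS.items():
            if marker in agent:
                found[bot].add(match.group("path"))
    return found


# Hypothetical sample lines; in practice, iterate over the real access log file.
sample = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /old-folder/page.html HTTP/1.1" '
    '404 512 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '203.0.113.9 - - [10/Oct/2023:14:01:02 +0000] "GET /old-folder/page.html HTTP/1.1" '
    '404 512 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"',
    '203.0.113.5 - - [10/Oct/2023:14:05:00 +0000] "GET /.well-known/captcha/ HTTP/1.1" '
    '403 199 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
]

paths = missing_paths_by_bot(sample)
shared = paths["bingbot"] & paths["yandex"]
print(sorted(shared))
```

A large overlap would support the idea that the bots share a source for these paths, though it would not prove that Microsoft is the one passing them on.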
We just blocked Bing altogether - the number of hits we get where Bing is the referrer is minute