Cloudflare is submitting URLs to Bing IndexNow that should not be submitted

We've recently had issues with Bing indexing due to automated pages that Cloudflare submitted to the search engine via Bing IndexNow.

Is anyone here experiencing the same issue? How did you solve it? Is there any way we can stop Cloudflare from submitting page URLs daily through Bing IndexNow?

We had the same issue - Bing was trying to crawl the contents of .well-known & looking for lots of non-existent folders and files.
We had many email exchanges with Bingbot support; they were no help at all and just kept repeating the silly "check your site and ensure that the files exist, fix the issue at your end, it's not us".
We ended up blocking Bingbot entirely; all the rest of the Microsoft ASNs and IPs were already permanently blocked, so no loss.
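For anyone wanting to do the same, a Cloudflare WAF custom rule with a Block action can match on the user agent. This is only a sketch: the expression below uses Cloudflare's Rules language, and matching on the "bingbot" substring is an assumption about how the bot identifies itself, not necessarily what was used here.

```
# Cloudflare WAF custom rule expression (action: Block)
# Matches any request whose User-Agent claims to be bingbot
(http.user_agent contains "bingbot")
```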

Thanks for the reply @paul32

I see. We sent a few support tickets but none of them got any response. If they had replied, we'd likely have received the same message you did.

Well, if we block Bingbot entirely, we'll lose our site's visibility on Bing for the users who still use it as their search engine. But if Bing keeps causing issues with URLs that should not be indexed or crawled, we might as well do that in the future if it gets worse.

The last option to try before doing that is to have Cloudflare check our site on their end, since it was stated that Cloudflare is the source of the URLs being submitted.

I think what we are seeing is the result of several things happening, some of which should not be happening:

Issue 1 - .well-known & capta

  1. Cloudflare IndexNow sends the URLs to BingBot
  2. BingBot ignores robots.txt and sitemap.xml
  3. BingBot then tries to crawl .well-known & .well-known/capta and gets blocked by rules in Cloudflare and/or rules in .htaccess
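For the .htaccess side of step 3, a rule like the following would produce the observed behaviour. This is a sketch for Apache's mod_alias; the path comes from the thread, but the choice of a 403 response is an assumption, not necessarily what these sites actually configured.

```
# Apache .htaccess: refuse probe requests under .well-known/capta
RedirectMatch 403 ^/\.well-known/capta
```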

Issue 2 - non-existent files & folders

  1. Where is BingBot getting these paths from? Not from the site, not from sitemap.xml, not from robots.txt - so why is it trying to crawl them? They are paths that have never existed.
  2. Microsoft support either don't know or refuse to say what the bot is up to or why it is trying to crawl these paths. Their response is that we need to fix the site so that the paths exist - but they can't tell us which paths.
  3. A side issue: other bots, such as the permanently blocked Yandex, are trying to crawl the same non-existent paths - so it appears that Microsoft may be passing the non-existent paths on to them as well?

We just blocked Bing altogether - the number of hits we get where Bing is the referrer is minute.

The IndexNow integration from Cloudflare is pretty bad. It submits every entered URL not only to Bing but to Yandex as well. Even if you mistype an address, they are on it in seconds and will check it forever, given that Cloudflare just auto-submits all URLs without even checking whether they're locked or behind firewall rules.

The best bet is to use IndexNow from your own side, as the principle of IndexNow itself is pretty good. Even WordPress has plugins that support it. I did this and it helps a lot for new URL submissions, but the old URLs Cloudflare submitted will keep getting crawled unless you block them via robots.txt.
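Submitting from your own side can be as simple as a POST to the IndexNow endpoint. Here is a minimal sketch using only the Python standard library; the host, key, and URLs are placeholders, and per the IndexNow protocol the key must also be served as a plain-text file on your site so the search engines can verify ownership.

```python
import json
import urllib.request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def build_payload(host, key, urls):
    """Build the JSON body the IndexNow endpoint expects.

    The key must also be reachable as plain text at
    https://<host>/<key>.txt for ownership verification.
    """
    return {
        "host": host,
        "key": key,
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }

def submit(host, key, urls):
    """POST the URL list to IndexNow and return the HTTP status code."""
    payload = build_payload(host, key, urls)
    req = urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    # A 200 or 202 response means the submission was accepted.
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Example usage (performs a real network request):
# submit("www.example.com", "a1b2c3d4", ["https://www.example.com/new-page"])
```

This way you decide exactly which URLs get submitted, instead of Cloudflare submitting everything it sees.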

I'd do this instead of blocking Bing, Seznam, and Yandex, as otherwise you are relying on Google only. Google, in the past three months, has really turned the SERPs upside down. If you get hit, you will need as many alternatives as you can get.


Thank you for the reply @JoshJ

Agreed. IndexNow from Cloudflare is submitting random URLs: backup paths, 404 pages that were already 301-redirected to another page, and http:// domain URLs even though the site was updated to redirect to https://.

We'll definitely use IndexNow on our side for now instead of relying on Cloudflare's. We don't want to block Bingbot from crawling our site either.
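If blocking Bingbot entirely is off the table, a narrower robots.txt rule can at least tell compliant crawlers to skip the junk paths already submitted. A sketch, with placeholder paths based on the examples in this thread (and note that, per the reports above, Bingbot has not always honoured robots.txt):

```
# robots.txt: disallow the junk paths without blocking the whole site
User-agent: *
Disallow: /.well-known/capta
Disallow: /backup/
```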

Hey, I have the same issue which led to my well-performing site being delisted. Crawler Hints at Cloudflare was sending exploit attempts (looking for compromised files and directories that do not exist) to Bing via IndexNow. Bing included them in its index of my site and then delisted it.

Does anyone know at what point Cloudflare sends requests using Crawler Hints? Is it before or after bot management/WAF? Because if it’s after WAF it shouldn’t be sending so much junk.

This topic was automatically closed 15 days after the last reply. New replies are no longer allowed.