Does lead to this tool detecting GPTBot as blocked. It would be great if it followed the standard and correctly handled multiple User-agent lines.
Now I use a blocking rule in the WAF - (cf.verified_bot_category in {"AI Crawler" "AI Assistant" "AI Search"}) - but even without this rule, "AI Audit" was always empty.
The issue I reported here is that AI Audit wouldn't detect that Ivan had blocked the AI bots in robots.txt, because it doesn't handle User-agent grouping; Ivan and I had both grouped many bots under a single rule, namely Disallow.
So you'd have to list each and every bot separately, with its own rule, for the AI Audit interface to detect that you'd blocked it, which goes against how the RFC 9309 spec is defined.
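To make the difference concrete, here is what I mean (GPTBot is the bot from my report; the other bot names are only examples):

```
# Grouped form (valid per RFC 9309): several user agents share one rule set
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /

# Expanded form the AI Audit interface currently seems to require
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
```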
I can kind of see the argument that perhaps these bots don't interpret robots.txt correctly either. I have no idea whether they all do or should respect User-agent grouping, but the idea of the tool seems to be to offer oversight and control over bots that don't respect correctly formatted robots.txt files. So, in my opinion, it would be a valuable addition to the rather handy AI Audit tool if it respected the spec and allowed you to monitor and/or block crawlers that don't.
Thanks for clarifying. I was not aware of this limitation previously.
Generally, I've found two trains of thought:

- Best to cater to the lowest level of crawler sophistication.
- Support what major crawlers do (which generally does include support for "groups" of user agents); a rough sketch of what grouping-aware parsing looks like follows below.
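As an illustration only (not our actual parser), here is a minimal sketch of grouping-aware matching in the RFC 9309 style, where consecutive User-agent lines accumulate into one group that shares the rules that follow:

```ts
// Minimal sketch of grouping-aware robots.txt matching (RFC 9309 style).
// Simplified for brevity: no "*" fallback group, no path matching.
type Group = { agents: string[]; disallow: string[] };

function parseGroups(robotsTxt: string): Group[] {
  const groups: Group[] = [];
  let current: Group | null = null;
  let lastWasAgent = false;

  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // strip comments and whitespace
    if (!line) continue;
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const key = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (key === "user-agent") {
      // Consecutive User-agent lines belong to the same group.
      if (!lastWasAgent || current === null) {
        current = { agents: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      if (key === "disallow" && current !== null) {
        current.disallow.push(value);
      }
      lastWasAgent = false;
    }
  }
  return groups;
}

// Very rough check: is this product token disallowed from the whole site?
function isBlocked(robotsTxt: string, productToken: string): boolean {
  const ua = productToken.toLowerCase();
  return parseGroups(robotsTxt).some(
    (g) => g.agents.includes(ua) && g.disallow.includes("/")
  );
}
```

With the grouped robots.txt example earlier in the thread, isBlocked(robotsTxt, "GPTBot") would return true, which is the kind of result a grouping-aware audit would report.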
If you want compatibility today, you can use the more primitive format, with a directive for each user-agent. Various software can help generate a robots.txt and/or keep it updated. Many options are platform-dependent, or you could serve a dynamic robots.txt file using Workers.
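For instance, a small Worker along these lines could generate the expanded, one-directive-per-bot format on the fly (a sketch only; the bot list and route are assumptions you would adapt to your setup):

```ts
// Sketch of a Worker that serves robots.txt dynamically, expanding a
// maintained bot list into one User-agent/Disallow pair per bot.
// The bot names are examples; keep the list wherever suits you.
const BLOCKED_BOTS = ["GPTBot", "ClaudeBot", "CCBot"];

export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    if (url.pathname !== "/robots.txt") {
      return new Response("Not found", { status: 404 });
    }
    const body =
      BLOCKED_BOTS.map((bot) => `User-agent: ${bot}\nDisallow: /`).join("\n\n") +
      "\n";
    return new Response(body, {
      headers: { "content-type": "text/plain; charset=utf-8" },
    });
  },
};
```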
I am not sure about our long-term plans for the AI Audit parsing implementation. I will follow up internally and circle back when/if I get more clarity.