Googlebot-Image/1.0 isn't treated as GoogleBot?

What is the name of the domain?

cloud.daporkchop.net

What is the issue you’re encountering

In analytics and firewall logs, requests with the User-Agent: “Googlebot-Image/1.0” aren’t considered to be bot requests, instead they’re categorized as non-bot requests under “Unknown/Other”. It would be nice if this user-agent were also added to the list of known bots, and merged with GoogleBot results if possible (as they’re the same bot).

What is the current SSL/TLS setting?

Full

I should note that I’ve received 1.2M requests from Googlebot-Image/1.0 in the past 24h, yet the Security->Analytics tab only shows the requests sent with the standard GoogleBot user-agent as being GoogleBot requests. The Googlebot-Image/1.0 requests don’t even show up in the crawler statistics on Analytics&Logs->Security.

It took me an unnecessary amount of work to figure out that yes, the huge spike in traffic to my site is actually coming from Google, but images are treated specially.

Setting the User-Agent string aside entirely:

Can you clarify what other details, if any, you used to identify this as a legitimate Google bot, rather than one pretending to be a Google bot with a fake User-Agent?


The requests are coming from a number of IPs registered to AS15169, mainly under the 66.249.79.0/27 prefix which is one of the verified GoogleBot IP ranges.

Selecting one of these at random, for instance 66.249.79.8, and attempting a reverse DNS lookup shows that it’s crawl-66-249-79-8.googlebot.com, as documented by Google. A forward lookup of that hostname resolves back to the same IP:

:~$ dig crawl-66-249-79-8.googlebot.com @1.1.1.1

; <<>> DiG 9.18.28-1~deb12u2-Debian <<>> crawl-66-249-79-8.googlebot.com @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49250
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;crawl-66-249-79-8.googlebot.com. IN    A

;; ANSWER SECTION:
crawl-66-249-79-8.googlebot.com. 86400 IN A     66.249.79.8

;; Query time: 39 msec
;; SERVER: 1.1.1.1#53(1.1.1.1) (UDP)
;; WHEN: Sat Jan 04 21:34:32 CET 2025
;; MSG SIZE  rcvd: 76
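For anyone wanting to script the same check, here is a minimal Python sketch of a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, check the domain, then resolve the hostname back and compare. The function name `is_google_crawler` and the suffix list are my own assumptions for illustration, not anything Cloudflare or Google ships, and `socket.gethostbyname_ex` only handles IPv4.

```python
import socket

# Assumption: hostname suffixes accepted as Google crawler domains.
# Adjust this tuple to match Google's current documentation.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_crawler(ip,
                      reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname_ex):
    """Return True if `ip` passes a forward-confirmed reverse DNS check.

    The lookup functions are injectable so the logic can be exercised
    without network access.
    """
    try:
        # Step 1: reverse lookup (PTR record) -> hostname
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False

    # Step 2: the hostname must sit under an expected domain
    if not hostname.lower().rstrip(".").endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False

    try:
        # Step 3: forward lookup (A records) must contain the original IP
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    return ip in addresses
```

Calling `is_google_crawler("66.249.79.8")` with the default resolvers performs the same two lookups as the manual check above, so the result depends on live DNS.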

Additionally, Google’s own documentation mentions this alternative User-Agent, and my Google Search Console page shows the same URLs being fetched by these Googlebot-Image/1.0 requests showing up in the sitemap with matching retrieval times.


When I looked at the googlebot.json file, I initially saw that it was last updated on 2024-12-31, or, roughly 4 days before your thread.

Looking right now, they claim that they last updated it on 2025-01-07.

Unfortunately, I do not have any verified bot myself, but when I go through the Add a bot link on the Verified Bots page, I see multiple verification methods, e.g. Reverse DNS and IP List, and only one of them is selectable at a time.

The only documentation I can find for “IP List” is this:

Provide extra information for your selected Verification Method (e.g. IP list URL(s))

Whether it will accept only a plain text file, e.g. with one IP address or IP subnet per line, or it will be able to parse the JSON that Google provides, remains unknown to me.
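In case only plain text is accepted, converting the JSON is not much work. Below is a rough Python sketch that turns a googlebot.json-style document into one CIDR per line and checks an IP against it. The layout (a `prefixes` array of objects with `ipv4Prefix` or `ipv6Prefix` keys) is my assumption about the file format, the helper names are made up, and the sample data is illustrative only: the IPv4 prefix is the one mentioned in this thread and the IPv6 prefix is a documentation placeholder.

```python
import ipaddress
import json

# Illustrative sample, NOT the real file contents.
SAMPLE = """
{
  "creationTime": "2025-01-07T15:46:02.000000",
  "prefixes": [
    {"ipv4Prefix": "66.249.79.0/27"},
    {"ipv6Prefix": "2001:db8::/64"}
  ]
}
"""


def prefixes_from_json(text):
    """Parse the assumed JSON layout into a list of ip_network objects."""
    data = json.loads(text)
    networks = []
    for entry in data.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def to_plain_text(networks):
    """Render the networks as a plain-text list, one CIDR per line."""
    return "\n".join(str(net) for net in networks)


def ip_in_networks(ip, networks):
    """Return True if `ip` falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


networks = prefixes_from_json(SAMPLE)
print(to_plain_text(networks))
print(ip_in_networks("66.249.79.8", networks))
```

Whether Cloudflare's form would accept either format is still the open question; this only shows that the two are easy to convert between.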

Likewise, I don’t know whether bot owners can set or adjust a specific frequency at which Cloudflare will automatically refresh such an IP list.

That said, to my knowledge, it is the bot vendor (e.g. Google, in this case) that needs to keep their bot registrations up-to-date, including submitting requests for updating IP ranges.

And again, to my knowledge, many organisations often neglect this, apparently because they treat it as a “fire-and-forget” kind of registration.

Fortunately, though, Google documents both their crawlers and the IP ranges those crawlers use well, so I’m wondering whether all that is needed is for Cloudflare to fetch (and update) the list more frequently than they may already be doing.

I do not see that Google is documenting their expected update frequency though.

I didn’t keep a note of the timestamp for 2024-12-31. However, given 2025-01-07 15:46:02, which I assume is UTC, it looks a bit like googlebot.json might be updated weekly, on Tuesdays. If it isn’t already being done, I suspect a weekly refresh, for example on Tuesdays at 18:00:00 UTC, might limit the potential issues at least somewhat.
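As a quick sanity check on the weekly guess, the two update dates I observed can be compared in Python. Keep in mind that two data points only make the pattern plausible, they don't prove it.

```python
from datetime import date

# The two googlebot.json update dates observed in this thread
first_seen = date(2024, 12, 31)
second_seen = date(2025, 1, 7)

# Days between the two observed updates
gap_days = (second_seen - first_seen).days

# date.weekday() numbers Monday as 0, so Tuesday is 1
both_tuesdays = first_seen.weekday() == 1 and second_seen.weekday() == 1

print(gap_days, both_tuesdays)
```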

I’ll poke around, and try to see if I can get someone to look into this.


Thanks for the response! I hadn’t considered that the IP ranges could be new, but that makes a lot of sense. Glad to hear that this is being looked into!

