Cloudflare blocks Googlebot and Ahrefsbot

curl -v -L -A AhrefsBot https://doxzoo.com/photoofthemonth

  * Trying 104.26.15.236:443...
  * TCP_NODELAY set
  * Connected to doxzoo.com (104.26.15.236) port 443 (#0)
  * ALPN, offering h2
  * ALPN, offering http/1.1
  * successfully set certificate verify locations:
  * CAfile: /etc/ssl/certs/ca-certificates.crt
    CApath: /etc/ssl/certs
  * TLSv1.3 (OUT), TLS handshake, Client hello (1):
  * TLSv1.3 (IN), TLS handshake, Server hello (2):
  * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
  * TLSv1.3 (IN), TLS handshake, Certificate (11):
  * TLSv1.3 (IN), TLS handshake, CERT verify (15):
  * TLSv1.3 (IN), TLS handshake, Finished (20):
  * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
  * TLSv1.3 (OUT), TLS handshake, Finished (20):
  * SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
  * ALPN, server accepted to use h2
  * Server certificate:
  * subject: C=US; ST=California; L=San Francisco; O=Cloudflare, Inc.; CN=doxzoo.com
  * start date: May 26 00:00:00 2021 GMT
  * expire date: May 25 23:59:59 2022 GMT
  * subjectAltName: host "doxzoo.com" matched cert's "doxzoo.com"
  * issuer: C=US; O=Cloudflare, Inc.; CN=Cloudflare Inc ECC CA-3
  * SSL certificate verify ok.
  * Using HTTP2, server supports multi-use
  * Connection state changed (HTTP/2 confirmed)
  * Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
  * Using Stream ID: 1 (easy handle 0x5584cccdde30)

  > GET /photoofthemonth HTTP/2
  > Host: doxzoo.com
  > user-agent: AhrefsBot
  > accept: */*

  * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
  * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
  * old SSL session ID is stale, removing
  * Connection state changed (MAX_CONCURRENT_STREAMS == 256)!
    < HTTP/2 404
    < date: Wed, 20 Apr 2022 15:20:57 GMT
    < content-type: text/html
    < vary: Accept-Encoding
    < vary: Origin
    < x-frame-options: SAMEORIGIN
    < x-xss-protection: 1; mode=block
    < x-content-type-options: nosniff
    < x-download-options: noopen
    < x-permitted-cross-domain-policies: none
    < referrer-policy: strict-origin-when-cross-origin
    < cache-control: no-cache
    < set-cookie: _doxzoo_session=VEp5L2NQZEtLdDEzMU5NdUMydWdrZ243bEt0VkRDMWoyZlFWNWQwaXY2T3JhcmxSdVAyVEQzOThOR1VlZHE1MHBra0xGQVhaSVdUY0ozbHBoelpud2tkTTRBKzZYa1hRMGNNVGk2VVh2eVp1NHBIc2ZndHkyR3k2VnBnSnN1c095Sjl0ckdJUUJsblg5L1ZmdThhaW1FSkhVNTl6R29MTzgwamxnWTNFeDZBPS0tazFRY3pjU1J2TGY0YlJNUVN2Ky9sZz09--fd9b54ce624d20982e3d5fd8c57cf85b405c4bd9; path=/; HttpOnly
    < x-request-id: ca72a06d-52eb-4ce4-8173-e789a8

I allowed Known Bots and whitelisted the Ahrefs IPs. The issue is not only with Ahrefs; Googlebot, Moz and Semrush bots are affected too.

If you are running those commands from your own computer, Cloudflare will recognise that you are trying to masquerade as Googlebot etc. It does not mean that Googlebot etc. are normally being blocked.

On the Firewall tab of your dashboard you will see the blocked traffic. Can you share the detail of one of the blocked requests (one that is not your own testing with cURL)?
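
On the masquerading point: a User-Agent string on its own proves nothing, which is why a cURL test from a home connection is not representative. How Cloudflare decides is not shown in this thread, but as an illustration, Google documents forward-confirmed reverse DNS as one way to check whether an IP really belongs to Googlebot. Below is a rough sketch of that check. The function names `is_google_host` and `verify_googlebot` are made up for this example, it assumes the `host` utility is installed, and it only handles IPv4.

```shell
# is_google_host NAME: succeed if NAME is under one of the domains Google
# documents for its crawlers (a trailing dot from DNS output is ignored).
is_google_host() {
  case ${1%.} in
    *.googlebot.com|*.google.com|*.googleusercontent.com) return 0 ;;
    *) return 1 ;;
  esac
}

# verify_googlebot IP: forward-confirmed reverse DNS (IPv4 only).
# Runs in a subshell so its variables do not leak into the caller.
verify_googlebot() (
  ip=$1
  # Reverse lookup: the last field of the PTR answer is the hostname.
  name=$(host "$ip" | awk '/domain name pointer/ {print $NF; exit}')
  is_google_host "$name" || exit 1
  # Forward lookup: the hostname must resolve back to the same IP.
  host "$name" | awk '/has address/ {print $NF}' | grep -qxF "$ip"
)
```

To try it, take the client IP from one of the Firewall events (not your own IP) and run `verify_googlebot <IP FROM FIREWALL EVENT> && echo genuine || echo spoofed`.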


Yes here

That’s not a good bot being blocked though.


Yes, but it does not explain the issue. Ahrefs, Moz and Semrush get 404 errors from Cloudflare. This is a very strange situation.

You are doing something funny with Content Negotiation on your Origin server:

% curl 'https://doxzoo.com/photoofthemonth' -H 'Accept: text/plain' -o /dev/null --dump-header - --silent  | grep -i 'HTTP/'
HTTP/2 404
% curl 'https://doxzoo.com/photoofthemonth' -H 'Accept: text/html' -o /dev/null --dump-header - --silent  | grep -i 'HTTP/'
HTTP/2 200
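
To see the pattern at a glance, the two checks above can be wrapped in a small loop. This is only a sketch: `probe_accept` is a name made up for illustration, and it uses nothing beyond standard curl options (`-s`, `-o`, `-H`, and `-w '%{http_code}'` to print just the status code).

```shell
# probe_accept URL ACCEPT...
# Print the HTTP status code the URL returns for each Accept header value.
# Runs in a subshell so its variables do not leak into the caller.
probe_accept() (
  url=$1
  shift
  for accept in "$@"; do
    # -s: no progress meter, -o /dev/null: discard the body,
    # -w '%{http_code}': print only the final status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' -H "Accept: ${accept}" "$url")
    printf '%-12s %s\n' "$accept" "$code"
  done
)
```

For example: `probe_accept 'https://doxzoo.com/photoofthemonth' 'text/html' 'text/plain' '*/*'`. The `*/*` value is worth including because that is what cURL sends by default, as in the AhrefsBot test at the top of the thread.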

Can you run the following command, putting your Origin server's IP address in the indicated place:
curl 'https://doxzoo.com/photoofthemonth' -H 'Accept: text/plain' -o /dev/null --dump-header - --silent --connect-to ::<ORIGIN IP HERE> | grep -i 'HTTP/'
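
If it is easier, the same request can be sent both ways and the results printed side by side. Again just a sketch under assumptions: `status_via` and `compare_edge_origin` are hypothetical helper names, and the origin IP is whatever your server's real address is. `--connect-to ::IP:` tells curl to connect to that IP while keeping the original hostname for TLS and the Host header, so the request skips Cloudflare.

```shell
# status_via URL ACCEPT [ORIGIN_IP]
# Print the status code for URL with the given Accept header. If ORIGIN_IP
# is given, connect straight to it instead of the address DNS returns.
status_via() (
  url=$1
  accept=$2
  origin=${3:-}
  if [ -n "$origin" ]; then
    curl -s -o /dev/null -w '%{http_code}' -H "Accept: $accept" \
      --connect-to "::$origin:" "$url"
  else
    curl -s -o /dev/null -w '%{http_code}' -H "Accept: $accept" "$url"
  fi
)

# compare_edge_origin URL ACCEPT ORIGIN_IP
# Print both status codes; succeed only if they are the same.
compare_edge_origin() (
  edge=$(status_via "$1" "$2")
  origin=$(status_via "$1" "$2" "$3")
  echo "edge=$edge origin=$origin"
  [ "$edge" = "$origin" ]
)
```

Usage would be `compare_edge_origin 'https://doxzoo.com/photoofthemonth' 'text/plain' <ORIGIN IP HERE>`. If both sides report 404, the 404 is most likely generated by the Origin rather than by Cloudflare.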

Thank you indeed.
