Increase in random SSL Handshake failures

On 2023-06-15 we started seeing an increase in SSL handshake failures in our HTTP client request logs.
Both PHP Guzzle (cURL) and Node.js Axios applications report these errors.
We run almost all versions of PHP 8.0.x and 8.1.x, and even some 8.2.x; they all report these errors at random, and so does Node.js.

I suspected dropped network packets, so I had a look at our network interface metrics, but to my surprise I didn’t find any noticeable dropped packets on any network interface on any production nodes, firewalls or switches. We have had dropped packet issues in the past so I knew to look :smile:

I then compiled a little application that calls openssl s_client -connect <example.com> -tls1_2/-tls1_3 every 100 ms and deployed it across a variety of locations (our own servers, DigitalOcean droplets and Google VMs), some located in London, others in Frankfurt and some in Denmark.
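
For anyone who wants to run a similar probe, here is a minimal sketch in Python. It is not the application described above: it uses Python's `ssl` module instead of shelling out to openssl, and the names `probe_once` and `run_probes` are made up for illustration. It pins each handshake to a single TLS version and records the time and error text of every failure.

```python
import socket
import ssl
import time
from typing import Callable, List, Optional, Tuple


def probe_once(host: str, version: ssl.TLSVersion, port: int = 443,
               timeout: float = 5.0) -> Optional[str]:
    """Attempt one TLS handshake pinned to `version`.

    Returns None on success, otherwise a short description of the error.
    """
    ctx = ssl.create_default_context()
    # Pin both ends of the allowed range so only one TLS version is offered.
    ctx.minimum_version = version
    ctx.maximum_version = version
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # wrap_socket performs the handshake before returning.
            with ctx.wrap_socket(sock, server_hostname=host):
                return None
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return f"{type(exc).__name__}: {exc}"


def run_probes(probe: Callable[[], Optional[str]], count: int,
               interval: float = 0.1) -> List[Tuple[float, str]]:
    """Call `probe` `count` times, sleeping `interval` seconds in between.

    Returns a list of (unix_timestamp, error_text) for the failed attempts.
    """
    failures = []
    for _ in range(count):
        error = probe()
        if error is not None:
            failures.append((time.time(), error))
        time.sleep(interval)
    return failures
```

A call such as `run_probes(lambda: probe_once("example.com", ssl.TLSVersion.TLSv1_2), count=36000)` would cover roughly one hour at the 100 ms interval, plus however long the handshakes themselves take.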

I directed these applications to several of our Cloudflare-proxied zones, both Enterprise and Free.
Some origin servers get traffic from Cloudflare the “old” way, but most get traffic via a cloudflared tunnel.

About 0.1% of the time I get this from openssl, randomly spread across deploy locations and origin targets:

CONNECTED(00000003)
48DBB9BB7A7F0000:error:0A000410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:ssl/record/rec_layer_s3.c:1586:SSL alert number 40
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 7 bytes and written 212 bytes
Verification: OK
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : 0000
    Session-ID: 
    Session-ID-ctx: 
    Master-Key: 
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1689080480
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
    Extended master secret: no
---

Or, for TLS 1.3:

CONNECTED(00000003)
480B4F5C237F0000:error:0A000410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:ssl/record/rec_layer_s3.c:1586:SSL alert number 40
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 7 bytes and written 249 bytes
Verification: OK
---
New, (NONE), Cipher is (NONE)
This TLS version forbids renegotiation.
No ALPN negotiated
Early data was not sent
Verify return code: 0 (ok)
---
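
A note on the "read 7 bytes" in both logs: that is exactly the size of a single TLS alert record, a 5-byte record header followed by a 2-byte alert (level and description). The sketch below decodes such a record in Python. The example bytes are illustrative and not taken from a capture; in particular the record version and the fatal level are assumptions, and `parse_alert_record` is a made-up helper name.

```python
import struct

ALERT_CONTENT_TYPE = 21  # TLS record content type for alerts
ALERT_LEVELS = {1: "warning", 2: "fatal"}
# A few alert descriptions from the TLS specifications; not a complete list.
ALERT_DESCRIPTIONS = {
    40: "handshake_failure",
    70: "protocol_version",
    80: "internal_error",
    112: "unrecognized_name",
}


def parse_alert_record(data: bytes) -> dict:
    """Decode a 7-byte TLS alert record (5-byte header + 2-byte alert)."""
    if len(data) != 7:
        raise ValueError("expected exactly 7 bytes")
    content_type, version, length = struct.unpack("!BHH", data[:5])
    if content_type != ALERT_CONTENT_TYPE or length != 2:
        raise ValueError("not a plaintext alert record")
    level, description = data[5], data[6]
    return {
        "record_version": hex(version),
        "level": ALERT_LEVELS.get(level, str(level)),
        "description": ALERT_DESCRIPTIONS.get(description, str(description)),
        "alert_number": description,
    }


# Illustrative bytes: alert record, version 0x0303, length 2, fatal, alert 40.
example = bytes.fromhex("15030300020228")
print(parse_alert_record(example))
```

So the server is not sending a certificate or a ServerHello at all; the only thing it returns is the alert itself.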

I’ve opened a Cloudflare support ticket, and am in the process of punching through the initial first-level support questions :smile:

In the meantime I could use some help figuring out what is going on here.

I’m able to set up just about any test if someone has a good idea for how to get more debug information, and we have just about all the metrics one could wish for.

Thanks in advance
Mads


Hello @mjn

Based on the debug information you posted, it seems like the SSL handshake is failing intermittently. This issue may arise from a problem with the TLS settings on your server. Here are some steps to get more debug information:

  1. Try running a test using the Qualys SSL Labs SSL Server Test. Input your site’s URL to analyze your server.
  2. Check if your server’s cipher suites are correctly configured and up-to-date.
  3. Test your site with different TLS versions (1.1, 1.2, 1.3) to see if a specific version is causing the issue.
  4. Use a network monitoring tool to trace what is happening during the handshake process. Wireshark can be a good option for this.
  5. Maintain close communication with Cloudflare support through your existing ticket. They can help you analyze your Ray ID information and server logs to pinpoint the problem.

Additionally, refer to this Cloudflare Community post for potential solutions: Cloudflare Post

  1. I cannot run Qualys SSL Labs on most of my servers, because they are behind a cloudflared tunnel.

  2. My servers are mostly plain HTTP, getting traffic from the cloudflared binary, so I cannot configure any cipher suites for them.

  3. I attached two openssl logs, one for TLS 1.2 and one for TLS 1.3; they both show the problem.

  4. We just managed to get a TCP dump of the TLS handshake, and I have attached it to my open ticket.

  5. Will do :smile:

The linked post, “Help! Error 525 SSL handshake failed - #3 by erictung”, cannot possibly be related to this problem.

Hello @mjn,

We face the same issue on our Apps in Ruby 3.2.2 and the latest NodeJS 16. We have estimated that 0.0015% of the requests fail with this SSL error.

We also opened a support ticket, but we are still in the process and don’t have a solution yet.

If you resolve your case, could you share information about the resolution?

Regards,

Pierre-Barthelemy

Hey @pierre-bart

I have a feeling that this will end up with a cloudflare post mortem article.

I’ve spent the last four days collecting tcpdumps from our servers around the world, and I’ve found that our HTTP clients send a perfectly valid TLSv1 “Client Hello” packet, and a server owned by Cloudflare, probably an edge node, immediately responds with Handshake Failure (Alert 40).

I’ve sent my tcpdumps (pcap files) to my ticket, and hopefully cloudflare developers will find the bug any day now :smile:

I will not disclose the full pcap files, but here is a screenshot of one of them.

  • Our application initiates a TCP connection
  • Our application sends a “Client Hello” packet
  • Cloudflare IP (104.18.24.4) responds with a “Handshake Failure” packet

The TCP connection succeeds, and both the request and the response packets are delivered, so it cannot be the wonky internet (dropped packets).
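
The three steps above can be reproduced without a packet capture. The following is a rough sketch in Python, not the tooling used for the pcaps: it lets the `ssl` module build a ClientHello in memory, sends it over a plain TCP socket, and reads only the first TLS record the server returns. All function names are hypothetical.

```python
import socket
import ssl
from typing import Tuple


def build_client_hello(server_name: str) -> bytes:
    """Let the ssl module produce a ClientHello without touching the network."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=server_name)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the ClientHello is now queued and we have no reply to feed in.
        pass
    return outgoing.read()


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full record arrived")
        buf += chunk
    return buf


def first_server_record(host: str, port: int = 443,
                        timeout: float = 5.0) -> Tuple[int, bytes]:
    """TCP connect, send a ClientHello, return (content_type, body) of the first reply."""
    hello = build_client_hello(host)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(hello)
        header = recv_exact(sock, 5)
        length = int.from_bytes(header[3:5], "big")
        body = recv_exact(sock, length)
    return header[0], body


def is_handshake_failure(content_type: int, body: bytes) -> bool:
    """True if the record is an alert (type 21) with description 40."""
    return content_type == 21 and len(body) == 2 and body[1] == 40
```

On a healthy connection `first_server_record("example.com")` should return content type 22 (a handshake record carrying the ServerHello); the failure described above would show up as `is_handshake_failure(...)` returning true.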

I’ve only been able to reproduce this on our Cloudflare accounts utilizing Certificate Management.

Are you guys using Certificate Management to customize the cipher suites or the like?

It smells like a cloudflare bug to me, but I’ve been wrong before :smile:


I can confirm we have found the same kind of pattern in our attempts to capture network packets.

Yes, we use this service. We have enabled the modern ciphers listed here: https://developers.cloudflare.com/ssl/reference/cipher-suites/recommendations/, and we enforce TLS version >= 1.2.

It must be a bug specific to their Certificate Management system then.

I’ve tried zones in two other accounts not utilizing Certificate Management, and cannot reproduce on any of those zones.

Wish I had a direct number to a Cloudflare engineer; I’ve been spending endless hours explaining how basic TLS and TCP work in my current ticket :smile:

Just commenting that we also seem to be affected by this and have created a support ticket.

@mjn @pierre-bart Did you notice any location based or time of day pattern with the errors?

While we can replicate the errors, in our tests the rate seems much lower than what is reported by our client, and it also varies greatly (4 in a one-hour period, then 24 hours without errors).

Is there also any metric that can track this error in our dashboards?
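
One way to look for a location or time-of-day pattern, assuming each failure is logged with a Unix timestamp and the name of the probe location, is to bucket the failures by UTC hour and by location. This is only a sketch; `bucket_failures` and the input format are assumptions, not something any of the tools in this thread produce directly.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, Tuple


def bucket_failures(
    failures: Iterable[Tuple[float, str]],
) -> Tuple[Counter, Counter]:
    """Count failures per UTC hour of day and per probe location.

    `failures` is an iterable of (unix_timestamp, location) pairs.
    """
    by_hour: Counter = Counter()
    by_location: Counter = Counter()
    for timestamp, location in failures:
        hour = datetime.fromtimestamp(timestamp, tz=timezone.utc).hour
        by_hour[hour] += 1
        by_location[location] += 1
    return by_hour, by_location


# Hypothetical sample data; the first timestamp is the "Start Time" from the
# TLS 1.2 log earlier in the thread, the others are made up.
sample = [
    (1689080480, "London"),
    (1689080540, "Frankfurt"),
    (1689084080, "London"),
]
hours, locations = bucket_failures(sample)
print(sorted(hours.items()))
print(sorted(locations.items()))
```

If the counts are roughly flat across hours and locations, that points away from a regional or load-related cause; to compare rates rather than raw counts, divide by the number of probes sent from each location.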

It’s been fixed for us, after cloudflare developers had a look at it.


Indeed, last we observed the issue was on July 25.