Intermittent 525 SSL

I’ve managed to find some Windows Event Viewer logs that represent 525 errors. From the event log below:

An TLS 1.2 connection request was received from a remote client application, but none of the cipher suites supported by the client application are supported by the server. The TLS connection request has failed

The SCHANNEL event log entries come in pairs: the first, Event ID 36874, is logged at SCHANNEL logging level 2 (error), and the second, Event ID 36888, at level 4 (informational):

Event ID 36874 (docs)

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
    <System>
        <Provider Name='Schannel' Guid='{1F678132-5938-4686-9FDC-C8FF68F15C85}'/>
        <EventID>36874</EventID>
        <Version>0</Version>
        <Level>2</Level>
        <Task>0</Task>
        <Opcode>0</Opcode>
        <Keywords>0x8000000000000000</Keywords>
        <TimeCreated SystemTime='2021-06-09T07:37:20.851447000Z'/>
        <EventRecordID>375477</EventRecordID>
        <Correlation ActivityID='{4D165D95-533C-0001-B05D-164D3C53D701}'/>
        <Execution ProcessID='840' ThreadID='5588'/>
        <Channel>System</Channel>
        <Computer>COMPUTERNAME</Computer>
        <Security UserID='S-1-5-18'/>
    </System>
    <EventData>
        <Data Name='Protocol'>TLS 1.2</Data>
    </EventData>
    <RenderingInfo Culture='en-US'>
        <Message>An TLS 1.2 connection request was received from a remote client application, but none of the cipher suites supported by the client application are supported by the server. The TLS connection request has failed.</Message>
        <Level>Error</Level>
        <Task></Task>
        <Opcode>Info</Opcode>
        <Channel>System</Channel>
        <Provider></Provider>
        <Keywords></Keywords>
    </RenderingInfo>
</Event>

The second Event Viewer log (level 4) is the informational entry that explains what SCHANNEL then did, which was to send fatal alert code 40 (handshake_failure); a small lookup sketch for these alert codes follows the event XML below. Event ID 36888 (docs)

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
    <System>
        <Provider Name='Schannel' Guid='{1F678132-5938-4686-9FDC-C8FF68F15C85}'/>
        <EventID>36888</EventID>
        <Version>0</Version>
        <Level>4</Level>
        <Task>0</Task>
        <Opcode>0</Opcode>
        <Keywords>0x8000000000000000</Keywords>
        <TimeCreated SystemTime='2021-06-09T07:37:20.851448600Z'/>
        <EventRecordID>375478</EventRecordID>
        <Correlation ActivityID='{4D165D95-533C-0001-B05D-164D3C53D701}'/>
        <Execution ProcessID='840' ThreadID='5588'/>
        <Channel>System</Channel>
        <Computer>COMPUTERNAME</Computer>
        <Security UserID='S-1-5-18'/>
    </System>
    <UserData>
        <EventXML xmlns='LSA_NS'>
            <AlertDesc>40</AlertDesc>
            <ErrorState>1205</ErrorState>
            <TargetName></TargetName>
        </EventXML>
    </UserData>
    <RenderingInfo Culture='en-US'>
        <Message>A fatal alert was generated and sent to the remote endpoint. This may result in termination of the connection. The TLS protocol defined fatal alert code is 40.

Target name:

The TLS alert registry can be found at http://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-6</Message>
        <Level>Information</Level>
        <Task></Task>
        <Opcode>Info</Opcode>
        <Channel>System</Channel>
        <Provider></Provider>
        <Keywords></Keywords>
    </RenderingInfo>
</Event>
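
For reference, the AlertDesc value in the 36888 event maps straight onto the IANA TLS alert registry linked in the message. A minimal TypeScript sketch, covering only a handful of common codes rather than the full registry, to translate the value:

const TLS_ALERTS: Record<number, string> = {
  0: "close_notify",
  10: "unexpected_message",
  20: "bad_record_mac",
  40: "handshake_failure",
  42: "bad_certificate",
  46: "certificate_unknown",
  48: "unknown_ca",
  70: "protocol_version",
  80: "internal_error",
  112: "unrecognized_name",
};

// The 36888 event above carries AlertDesc 40.
console.log(TLS_ALERTS[40] ?? "see the IANA registry"); // "handshake_failure"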

I have not changed any crypto settings on the server (except turning off SSL 3.0, TLS 1.0 and TLS 1.1), so I am somewhat sceptical. I first compared the Cloudflare list of ciphers against the Windows Server 2016 list and there is plenty of crossover (green on the right means there’s a match).

I also tried SSL Labs’ excellent tool against my test-tenant-but-live-server unproxied endpoint so that I could see what ciphers it is offering. They all look good. The test handshakes also looked fine.
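
For anyone wanting to run the same check from a script rather than SSL Labs, here is a minimal Node/TypeScript sketch (the hostname is a placeholder, not my real endpoint) that reports what the origin actually negotiates for a single handshake:

import * as tls from "node:tls";

// Placeholder hostname; substitute your own unproxied origin endpoint.
const host = "origin.example.com";

const socket = tls.connect({ host, port: 443, servername: host }, () => {
  // Report what the origin negotiated for this handshake.
  console.log("protocol:", socket.getProtocol()); // e.g. "TLSv1.2"
  console.log("cipher:", socket.getCipher());     // e.g. { name: "ECDHE-RSA-AES256-GCM-SHA384", ... }
  socket.end();
});

socket.on("error", (err) => console.error("handshake failed:", err.message));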

The test tenant, running every 5 minutes, hasn’t reported any errors for the last ~48 hours.

We’ve also improved our logging from the React front end to Airbrake, our error provider, so I am seeing a lot more failures now: nearer 0.5%. We’ve also added retry middleware to the React front end to smooth over the experience for users.
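
The retry middleware is nothing exotic. A minimal sketch of the idea in TypeScript (not our actual code, and the names are made up): retry 5xx responses such as an intermittent 525 a few times with a short backoff.

async function fetchWithRetry(
  url: string,
  init: RequestInit = {},
  retries = 3,
  backoffMs = 500
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    try {
      const res = await fetch(url, init);
      // Return anything that is not a server error, or give up after the last attempt.
      if (res.status < 500 || attempt >= retries) return res;
    } catch (err) {
      // Network-level failure: rethrow once we are out of attempts.
      if (attempt >= retries) throw err;
    }
    // Wait a little longer before each retry (500 ms, 1 s, 2 s, ...).
    await new Promise((r) => setTimeout(r, backoffMs * 2 ** attempt));
  }
}

// Usage: a GET that quietly retries intermittent 525s.
// const res = await fetchWithRetry("https://example.com/api/health");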

From the linked article I was wondering if you needed to enable CAPI logging?

Not yet, that’s the next thing to read through. Many thanks. I think SCHANNEL in the Event Viewer is telling me something but perhaps not enough.

Thanks all for your feedback, with this new information where do I look next?

Did you get this message on your proxied or on the unproxied setup?

Proxied. I’ve not received any errors on the unproxied setup.

In that case I’d post that information in the ticket you already opened and ask whether support can check if the proxies might occasionally omit certain ciphers that your setup requires.

It seems the issue is that your server and the proxies occasionally cannot agree on a cipher, which is why the entire handshake fails. I’m afraid, though, that only support can look into this and check what the difference is between the two types of requests (when it works and when it does not).

Thank you Sandro, I’ll reopen the ticket. Many thanks.

Considering that you restricted your setup to TLS 1.2, I’d assume you also tweaked the ciphers; maybe you limited them a bit too much. However, considering that the connection generally works, I’d say Cloudflare should still be happy with it. So either Cloudflare occasionally does not accept your ciphers or your server occasionally changes its cipher list.

But yeah, support should hopefully be able to provide more insight here.

Thank you, I completely agree with you. I’m hoping that the solution is anything but “upgrade to Windows Server 2019”, which has a better cipher suite list but would be a fair amount of work to land on my desk! :smiley:

I don’t think it should be version-related. Either your server occasionally drops some of the ciphers it usually accepts, or Cloudflare does that on its side. You’d really need to compare two handshakes to say for sure. For the time being, could you try to make your SSL setup a bit more lenient?

I see a handful of back-and-forth emails with support on 2175902 a few hours before your post. If you have a ticket open without a reply, can you share the ticket number? I will check it out.

Hi Cloonan,

The support team finally contacted me and the issue is now resolved.

Thank you,

Maxim

Would you be able to share what solved the problem for you? I have a similar thing happening: two different servers, a very low percentage of 525 errors, and for the one server I have full control over (the other is a white-label service) I am sure the cipher list matches (and yes, Full (Strict) setup).

Solution TL;DR:
We use ESET File Security 7 on our servers and it was building a temporary blacklist of IPs, which sometimes included Cloudflare’s. I added the Cloudflare IPs to the IDS exception list.

Detail
The support engineer (Andronicus - thank you!) found the problem to be intermittent from within Cloudflare too and suggested looking for anything that dynamically blocks IPs, especially Cloudflare’s. Because all our traffic comes through Cloudflare, it can look like an attack, and attackers hitting the domain endpoint would also arrive from a Cloudflare IP.

I took that and did a lot more reading. We’re on AWS EC2, so the relevant technology there is AWS Shield, which is on by default but deals with flood-level traffic rather than our low volumes.

I then did a full inventory of the server and went through every single app and Windows Firewall in detail. I have plenty of experience with Windows Firewall, so I could see that nothing was misconfigured.

ESET File Security for Windows Server is a pretty good anti-virus, particularly for botnet intrusion detection. It also has a feature called Network Attack Protection (IDS) that scans for suspicious network traffic. Part of that is a “temporary IP address blacklist”. From ESET:

View a list of IP addresses that have been detected as the source of attacks and added to the blacklist to block connections for a certain period of time. Shows IP address that have been locked.

IPs get added to the blacklist for a short period of time. By watching the blacklist (it doesn’t keep logs) I spotted some familiar IPs popping up.

I’ve added the Cloudflare IPs to the IDS exception list (it accepts address ranges) and raised a ticket with ESET to ask how I can keep that list up to date, as Cloudflare can’t be expected to keep its full list of IPs static.
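
For anyone wanting to automate that comparison, a minimal sketch in TypeScript, assuming Cloudflare’s published plain-text range lists at https://www.cloudflare.com/ips-v4 and https://www.cloudflare.com/ips-v6 and Node 18+ for the global fetch; it just pulls the current CIDR list so it can be checked against the IDS exceptions:

async function fetchCloudflareRanges(): Promise<string[]> {
  const urls = [
    "https://www.cloudflare.com/ips-v4",
    "https://www.cloudflare.com/ips-v6",
  ];
  const ranges: string[] = [];
  for (const url of urls) {
    const res = await fetch(url);
    if (!res.ok) throw new Error(`Failed to fetch ${url}: ${res.status}`);
    // Each endpoint returns one CIDR range per line.
    ranges.push(...(await res.text()).trim().split("\n"));
  }
  return ranges;
}

// Usage: print the ranges for comparison with the IDS exception list.
fetchCloudflareRanges().then((r) => console.log(r.join("\n")));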

I’m pretty sure that’s the problem but I won’t know for sure until I have a few days of data (a few days of no 525 errors).

Many thanks again for all your help - I hope this writeup goes some way to help others in the future.

Glad you eventually found what was blocking it. Something blocking and rejecting the connection on the server is typically the reason for a 525. Did you find out why it did not get blocked in an unproxied context?

That list is quite static. There’s only one time in my memory where the list changed, and that was quite recent. A few small ranges were reallocated and removed from the list.

UPDATE: p.s. We all got notification of that change…at least I think everybody did. It was a hot topic here for a week or two.

Did you find out why it did not get blocked in an unproxied context?

I think it’s because when you’re unproxied, the IP the server (and therefore ESET) sees is the IP of the user; each user would have their own IP from the server’s point of view. As soon as you are proxied, our server only sees Cloudflare’s IP ranges.

If a hacker is trying to gain access, they will have the same Cloudflare IP as a user who is just using the system normally.

That list is quite static. There’s only one time in my memory where the list changed, and that was quite recent.

That’s good to know, thank you.

So it is a sort of rate limiting then? Yeah, sure, that would explain why it behaved differently. That was actually the reason why I asked :slight_smile:

The other thing is, I am surprised by the cipher error, as that would suggest the connection was dropped for another reason.

I agree - that is odd. What might be happening is that ESET refuses the connection during the handshake, after the “Hello”, and that gets interpreted as a cipher mismatch. Andronicus (Cloudflare tech) said that if it were a cipher mismatch then the connection would be impossible rather than intermittent, which makes sense.

I would be delighted to get into the details of why it failed at that specific point, but I doubt ESET are going to divulge much.

Yeah, that could be, though then it would still not be a cipher mismatch and the error message would be rather misleading.

Not if the mismatch only occurs occasionally for whatever reason. I addressed that at Intermittent 525 SSL - #29 by sandro

I guess the most likely explanation is the misinterpreted connection drop.

Thanks for the explanation. And it’s a good thing I asked, because we also use ESET File Security 7 on all our servers. The 525 error rate in the last 24 hours was 0.02%, but just in case I have added the Cloudflare IPs to the IDS exception list as well.

Also, just looked at the Web Analytics tab and filtered by 525 codes… This looks very strange - I wonder if Cloudflare changed anything over the last week or so.

[EDIT] I had previously raised the possibility of a Windows Update or a memory upgrade influencing this, but that is not possible, as the same drop is reflected on a different subdomain that is not hosted on my own server and therefore not affected by either. It points to some change on Cloudflare’s side.