Public RTR Server Instability

Hi everybody!

I’ve been using the public RTR server, rtr.rpki.cloudflare.com, as an RPKI server for BGP route validation for the past few months. Starting this Monday, 4/4/2022, around 7AM EST, I started seeing the TCP connection to this server flap regularly, every 4 or 5 minutes, rendering the service unusable.

Has anyone else been using the rtr.rpki.cloudflare.com service recently? If so, have you noticed this same instability?

Thanks,

Evan

What error are you getting?

I’m getting “ERR_CONNECTION_REFUSED”

I’m connecting via BGP RPKI on a Cisco IOS-XR device, and the error I’m seeing looks like this:

RP/0/RP0/CPU0:Apr 8 15:06:06.531 EDT: bgp[1077]: %ROUTING-BGP-5-RPKI_ADJCHANGE : 172.65.0.2 UP
RP/0/RP0/CPU0:Apr 8 15:06:06.646 EDT: bgp[1077]: %ROUTING-BGP-5-RPKI_ADJCHANGE : 172.65.0.2 DOWN (read error)
RP/0/RP0/CPU0:Apr 8 15:06:11.532 EDT: bgp[1077]: %ROUTING-BGP-5-RPKI_ADJCHANGE : 172.65.0.2 UP
RP/0/RP0/CPU0:Apr 8 15:06:11.647 EDT: bgp[1077]: %ROUTING-BGP-5-RPKI_ADJCHANGE : 172.65.0.2 DOWN (read error)

Could you provide a screenshot of the error, if applicable?

Sorry, I’m personally unfamiliar with this endpoint… what is it supposed to do? Was there a promise of panacea from Cloudflare in a blog post about what it does? If so, totally understandable it is now broken. Reference to a William Shakespeare play first published in 1599…

To consume RPKI data you need to run a process to generate a list of all the signed prefixes. You need a mechanism to transfer that data to your routers so that they can compare the received routes and ideally drop any routes that are not signed but should be (Drop Invalid in the RPKI lingo). The protocol to transfer the data is RTR.
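For anyone unfamiliar with what the router does with that data, here is a rough Python sketch of the comparison step (origin validation in the style of RFC 6811). The `VRP` type, the `validate_origin` function, and the prefixes and AS numbers are made up for illustration; this is not how any particular router implements it.

```python
import ipaddress
from typing import Iterable, NamedTuple


class VRP(NamedTuple):
    """One validated ROA payload entry, as an RTR server would deliver it (illustrative type)."""
    prefix: str       # covering prefix, e.g. "192.0.2.0/24"
    max_length: int   # longest prefix length the ROA authorizes
    asn: int          # AS number allowed to originate the prefix


def validate_origin(route_prefix: str, origin_asn: int, vrps: Iterable[VRP]) -> str:
    """Return "valid", "invalid", or "not-found" for a received route."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        roa_net = ipaddress.ip_network(vrp.prefix)
        # Skip entries of the other address family or that do not cover the route.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # A match needs the right origin AS and a prefix no longer than max_length.
        # AS 0 never matches a real origin.
        if vrp.asn != 0 and vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "valid"
    # Covered by at least one entry but matched by none: this is what "Drop Invalid" rejects.
    return "invalid" if covered else "not-found"


# Made-up example data using documentation prefixes and AS numbers.
EXAMPLE_VRPS = [
    VRP("192.0.2.0/24", 24, 64496),
    VRP("203.0.113.0/24", 24, 64497),
]
```

With the example data, `validate_origin("192.0.2.0/24", 64496, EXAMPLE_VRPS)` is valid, the same prefix from a different origin AS is invalid, and a prefix with no covering entry is not-found.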

To save yourself having to run the validator (such as OctoRPKI) and RPKI-RTR servers (such as GoRTR) you can just use the endpoint that the OP is referring to.

I know there used to be an issue with TCP connections on XR. Check the config snippets on GitHub - cloudflare/gortr: The RPKI-to-Router server used at Cloudflare

Your guess is correct, cscharffl. This endpoint was announced in a blog post, and is also “documented” on the GitHub link that michael shared here: GitHub - cloudflare/gortr: The RPKI-to-Router server used at Cloudflare, in the following line: “GoRTR also powers the public RTR server available on rtr.rpki.cloudflare.com on port 8282 and 8283 for SSH (rpki/rpki)”.

I have attempted connecting from a Cisco IOS-XR machine to this endpoint via SSH, and via unencrypted TCP. Here is the config I’ve used for SSH - I’ve also followed the documentation to configure the username/password first before connecting to the server:

router bgp 27446
 rpki server 172.65.0.2
  username rpki
  password clear rpki
  transport ssh port 8283

ssh client tcp-window-scale 14
ssh timeout 120

With the above configuration, I see constant flapping to the endpoint with the following logs:

RP/0/RP0/CPU0:Apr 11 11:13:40.373 EDT: ssh_xr[68206]: %SECURITY-SSHD-3-ERR_DETAILS : (null) Connection reset by peer Client closes socket connection
RP/0/RP0/CPU0:Apr 11 11:13:40.373 EDT: ssh_xr[68206]: %SECURITY-SSHD-6-INFO_GENERAL : Error in receiving remote SSH version
RP/0/RP0/CPU0:Apr 11 11:13:45.414 EDT: ssh_xr[68216]: %SECURITY-SSHD-3-ERR_DETAILS : (null) Connection reset by peer Client closes socket connection
RP/0/RP0/CPU0:Apr 11 11:13:45.414 EDT: ssh_xr[68216]: %SECURITY-SSHD-6-INFO_GENERAL : Error in receiving remote SSH version

Here is the configuration I have tried with just plaintext TCP:

router bgp 27446
 rpki server 172.65.0.2
  transport tcp port 8282

With the above configuration, I also see the connection flap, this time with the following error logs:

RP/0/RP0/CPU0:Apr 11 12:59:56.576 EDT: bgp[1077]: %ROUTING-BGP-5-RPKI_ADJCHANGE : 172.65.0.2 UP
RP/0/RP0/CPU0:Apr 11 12:59:56.662 EDT: bgp[1077]: %ROUTING-BGP-5-RPKI_ADJCHANGE : 172.65.0.2 DOWN (read error)
RP/0/RP0/CPU0:Apr 11 13:00:01.576 EDT: bgp[1077]: %ROUTING-BGP-5-RPKI_ADJCHANGE : 172.65.0.2 UP
RP/0/RP0/CPU0:Apr 11 13:00:01.689 EDT: bgp[1077]: %ROUTING-BGP-5-RPKI_ADJCHANGE : 172.65.0.2 DOWN (read error)
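One way to take the router out of the picture would be to speak a little RTR to the plaintext port from an ordinary host. The following is a minimal sketch, assuming the endpoint accepts a standard Reset Query PDU (8-byte header: version, PDU type 2, a zero field, length 8, per RFC 6810/8210). The function names are my own, and defaulting to protocol version 0 is an assumption about what the router negotiates.

```python
import socket
import struct

# RTR PDU header: version (1 byte), PDU type (1 byte),
# session ID / zero / error code (2 bytes), total length (4 bytes).
RTR_HEADER = struct.Struct("!BBHI")

PDU_NAMES = {
    0: "Serial Notify", 1: "Serial Query", 2: "Reset Query",
    3: "Cache Response", 4: "IPv4 Prefix", 6: "IPv6 Prefix",
    7: "End of Data", 8: "Cache Reset", 9: "Router Key", 10: "Error Report",
}


def build_reset_query(version: int = 0) -> bytes:
    """Build a Reset Query PDU, which asks the cache for its full data set."""
    return RTR_HEADER.pack(version, 2, 0, 8)


def parse_header(data: bytes) -> dict:
    """Decode the first 8 bytes of a PDU into its header fields."""
    version, pdu_type, field, length = RTR_HEADER.unpack(data[:8])
    return {
        "version": version,
        "type": pdu_type,
        "name": PDU_NAMES.get(pdu_type, "unknown"),
        "field": field,  # session ID or error code, depending on PDU type
        "length": length,
    }


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes, or raise if the peer closes first."""
    buf = b""
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before a full PDU header arrived")
        buf += chunk
    return buf


def probe(host: str = "rtr.rpki.cloudflare.com", port: int = 8282,
          version: int = 0, timeout: float = 10.0) -> dict:
    """Connect, send a Reset Query, and return the header of the first reply PDU."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_reset_query(version))
        return parse_header(recv_exact(sock, RTR_HEADER.size))
```

Calling `probe()` every few seconds from a host near the router should show whether the server answers with a Cache Response, answers with an Error Report, or drops the connection, which would help separate a server-side problem from an IOS-XR one.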

I am wondering if anyone has successfully used this endpoint, or if anyone else is seeing this same flapping behavior.

Thanks,

Evan
