CF Trace + Aioquic succeeds but Workers + Aioquic fails after GREASE

Hello!

I’m very interested in using Workers to replace my backend. I need the speed of QUIC to power my software, and since Workers supports HTTP/3 it seems like an excellent pairing. However, although I can reach my worker over H3 in Chrome, I can’t reach it with aioquic (Python’s only async QUIC+H3 library). This is puzzling because I can reach Cloudflare’s /cdn-cgi/trace with aioquic with no issues, but a request to my worker hangs the program. I don’t have a qlog, but I do have output from aioquic.

Below is from my application requesting /cdn-cgi/trace.

ProtocolNegotiated(alpn_protocol='h3')
HandshakeCompleted(alpn_protocol='h3', early_data_accepted=False, session_resumed=False)
ConnectionIdIssued(connection_id=b'gA\x1b\xc9\x83\xfbA\x02')
StreamDataReceived(data=b'\x00\x04\x15\x06\x80\x01\x00\x00\xc2\xe0\xf25\xb8\xee y\xef)\xc7\x89\x08P\x8a\x8a', end_stream=False, stream_id=3)
StreamDataReceived(data=b'\x02', end_stream=False, stream_id=7)
StreamDataReceived(data=b'\x03', end_stream=False, stream_id=11)
StreamDataReceived(data=b'\xea=\x0bvMW\xa5AGREASE is the word', end_stream=True, stream_id=15)
StreamDataReceived(data=b'\xe4:\xf41\xc7\x8fb\x0b\x00\xc8Z\xf1\x1a\xea_\x0f\x89\x12GREASE is the word\[email protected]\x00\x00\xd9V\x96\xdf=\xbfJ\x05\xa55\x11*\x08\x02\n\x81z\xe3O\xdc\x10\x14\xc5\xa3\x7f\xf5\xe3_M\x87%\x07\xb6Ih\x1d\x85-$\xabX?_\x8fq\xf2\x8aRF\x02>\xf0\x80A:\xb5\xfc<\xbf_S\x84\xbf\x83O?\xfd-/\x9a\xcdaQ\x96\xdf=\xbfJ\x00e\x1dJ\x05\[email protected]\xa0\x01p\x00\xb8\x00\xa9\x8bF\xff\xe7\[email protected]\xb4fl=\nh=www.cloudflare.com\nip=...\nts=1634237360.769\nvisit_scheme=https\nuag=aioquic/0.9.15\ncolo=DFW\nhttp=http/3\nloc=US\ntls=TLSv1.3\nsni=plaintext\nwarp=off\ngateway=off\n\x00\x00', end_stream=True, stream_id=0)
Response received for GET /cdn-cgi/trace : 0 bytes in 0.0 s (0.000 Mbps)
ConnectionTerminated(error_code=<QuicErrorCode.NO_ERROR: 0>, frame_type=None, reason_phrase='')

Below is from my application trying to reach my worker.

ProtocolNegotiated(alpn_protocol='h3')
HandshakeCompleted(alpn_protocol='h3', early_data_accepted=False, session_resumed=False)
ConnectionIdIssued(connection_id=b';\xc9\xdb\xb4%\x1ee\xd2')
StreamDataReceived(data=b'\x00\x04\x15\x06\x80\x01\x00\x00\xdf\x89\x1e\x11\x06|\xea!\xcd\x05\t\x8f$\x17\x1f\xbe', end_stream=False, stream_id=3)
StreamDataReceived(data=b'\x02', end_stream=False, stream_id=7)
StreamDataReceived(data=b'\x03', end_stream=False, stream_id=11)
StreamDataReceived(data=b'\xc3\x91\x06|k\x9f\xe6KGREASE is the word', end_stream=True, stream_id=15)

I am using aioquic 0.9.15, and this test uses the standard http3_client example.
Thank you for the read!
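
For anyone reading the aioquic output above, here is a minimal Python sketch (standard library only) that decodes the first bytes of each server-initiated unidirectional stream. It assumes the variable-length integer encoding from RFC 9000 and the unidirectional stream types from RFC 9114 and RFC 9204. The helper names `decode_varint` and `classify_uni_stream` are made up for illustration and are not part of aioquic.

```python
from typing import Tuple

# Unidirectional stream types defined by HTTP/3 (RFC 9114) and QPACK (RFC 9204).
KNOWN_STREAM_TYPES = {
    0x00: "control",
    0x01: "push",
    0x02: "QPACK encoder",
    0x03: "QPACK decoder",
}


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one QUIC variable-length integer; return (value, next position)."""
    if pos >= len(buf):
        raise ValueError("buffer too short for a varint")
    first = buf[pos]
    length = 1 << (first >> 6)  # top two bits select a 1, 2, 4, or 8 byte encoding
    if pos + length > len(buf):
        raise ValueError("truncated varint")
    value = first & 0x3F  # the remaining six bits start the value
    for byte in buf[pos + 1 : pos + length]:
        value = (value << 8) | byte
    return value, pos + length


def classify_uni_stream(data: bytes) -> str:
    """Name the type of a unidirectional stream from its leading varint."""
    stream_type, _ = decode_varint(data)
    if stream_type in KNOWN_STREAM_TYPES:
        return KNOWN_STREAM_TYPES[stream_type]
    # Reserved types of the form 0x1f * N + 0x21 exist only to exercise
    # the "ignore unknown stream types" rule (GREASE).
    if stream_type >= 0x21 and (stream_type - 0x21) % 0x1F == 0:
        return "reserved (GREASE)"
    return "unknown"


if __name__ == "__main__":
    # First bytes of streams 3, 7, 11, and 15 from the worker log above.
    samples = {
        3: b"\x00\x04\x15\x06\x80\x01\x00\x00",
        7: b"\x02",
        11: b"\x03",
        15: b"\xc3\x91\x06|k\x9f\xe6KGREASE is the word",
    }
    for stream_id, data in samples.items():
        print(stream_id, classify_uni_stream(data))
```

Run against the worker log, this labels streams 3, 7, and 11 as the control stream and the two QPACK streams, and stream 15 as a reserved (GREASE) stream. That suggests the server's side of the HTTP/3 setup is ordinary in both captures, and the only thing missing in the worker case is the response on request stream 0.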

I’m sitting here with Wireshark, and I think the version of Quiche that Workers uses may differ from the rest of the Cloudflare stack. I’ve tried a few things to narrow it down. I copied, one for one, all of the headers Chrome sent to my worker in its successful request into my aioquic request, to no avail. Comparing packets from Chrome and aioquic, the only differences I notice are that Chrome never sends a SCID and that the PKN in its Initial packet is 1 versus aioquic’s 0. Since Cloudflare gets as far as responding to everything except the part the worker is responsible for, I suspect aioquic behaves slightly differently from, say, Chrome or Firefox, and that whatever build of Quiche Workers runs on doesn’t expect that behavior. Does anyone have any input or questions?

To check whether this was an issue with Workers or with aioquic’s implementation, I requested a Cloudflare-owned Workers site (https://silentspacemarine.com/) with both Firefox (so that I had a true first-time connection negotiation) and aioquic. The exchanges look identical; the only difference is that Workers never responds on the first request stream, stream 0. Since /cdn-cgi/trace has no cryptographic or protocol issues responding to aioquic, I’m assuming this is a regression in Quiche or some other in-house software. Could anyone else try talking to Workers with aioquic and see whether they hit the same issue? Maybe some Cloudflare employees, as I think this is an internal issue?

Hi @Ancillary,

Thanks for the report. There seems to be something up here that needs some further investigation.

Much appreciated! Let me know if there’s any more info I can provide.

An update on this:

We reached out to the author of aioquic. Thanks to the detailed information in your report, we were able to reproduce the issue and gain a deeper understanding of the problem. It is caused by a specific interaction between how the aioquic demonstration http3_client works and how we treat it.

The outcome on the aioquic side is that the http3_client example has been updated to align better with the expectations of HTTP semantics and content (née payload) handling. See [examples] don't sent HTTP/3 DATA if there is no request body · aiortc/[email protected] · GitHub

If you’re running your own client application, the fix should similarly apply.
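
For a custom client, the change described above could look roughly like the sketch below. It is an assumption-laden illustration, not the actual aioquic commit: `send_request` is a hypothetical helper, and `http` is assumed to be an object shaped like aioquic's `H3Connection`, whose `send_headers` and `send_data` methods take `stream_id`, `headers` or `data`, and `end_stream`. `RecordingH3` is a made-up stand-in that only records calls so the behavior can be checked without a network.

```python
from typing import List, Tuple

Headers = List[Tuple[bytes, bytes]]


def send_request(http, stream_id: int, headers: Headers, content: bytes = b"") -> None:
    """Send one request on `stream_id`; `http` is assumed to act like H3Connection."""
    # With no request body, close the stream on the HEADERS frame itself.
    http.send_headers(stream_id=stream_id, headers=headers, end_stream=not content)
    if content:
        # Emit a DATA frame only when there is a payload to carry.
        http.send_data(stream_id=stream_id, data=content, end_stream=True)


class RecordingH3:
    """Hypothetical stand-in for H3Connection that records calls (illustration only)."""

    def __init__(self) -> None:
        self.calls: list = []

    def send_headers(self, stream_id: int, headers: Headers, end_stream: bool = False) -> None:
        self.calls.append(("headers", stream_id, end_stream))

    def send_data(self, stream_id: int, data: bytes, end_stream: bool) -> None:
        self.calls.append(("data", stream_id, data, end_stream))


if __name__ == "__main__":
    get_headers = [
        (b":method", b"GET"),
        (b":scheme", b"https"),
        (b":authority", b"example.com"),  # placeholder host, not from the thread
        (b":path", b"/"),
    ]
    fake = RecordingH3()
    send_request(fake, 0, get_headers)
    print(fake.calls)
```

The point of the design is that a bodyless GET produces a single HEADERS frame with the stream already finished, rather than HEADERS followed by an empty DATA frame.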


Awesome! Thank you so much. I will update my code accordingly.

HTTP/2 is still a bit faster than HTTP/3 in real-world tests, sadly, even for the long tail (cases with high latency, dropped packets, or sub-1 Mbps speeds). HTTP/2 will serve you just fine for the next 2-4 years until HTTP/3 optimizes further and actually catches up. :)

I’ve read a few studies on the subject, and this is probably one of the better ones: https://arxiv.org/pdf/2102.12358.pdf. I like this study because you can use it as a guide to when HTTP/3 may actually be a better choice than HTTP/2.
