Then adds them to a Link HTTP header and removes them from the HTML.
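To make the header concrete, the collected links could be serialized into one Link value: comma-separated entries of the form <url>; rel=...; as=..., following RFC 8288 web linking. This is only a sketch. The function name formatLinkHeader and the { href, rel, as } hint shape are hypothetical and are not taken from the original script.

```javascript
// Hypothetical helper: serialize collected link hints into a single Link
// header value. Each hint is assumed to look like { href, rel, as }.
function formatLinkHeader(hints) {
  return hints
    .map(({ href, rel, as }) => {
      // RFC 8288 link-value: <URI-reference> followed by ;-separated params.
      let value = `<${href}>; rel=${rel}`;
      if (as) value += `; as=${as}`; // preload destination, when present
      return value;
    })
    .join(", "); // multiple link-values are comma-separated
}

console.log(formatLinkHeader([
  { href: "/app.css", rel: "preload", as: "style" },
  { href: "/app.js", rel: "preload", as: "script" },
]));
// </app.css>; rel=preload; as=style, </app.js>; rel=preload; as=script
```

Unquoted parameter values like preload or style are fine as tokens; values containing spaces or separators would need to be quoted.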
Therefore, I need to wait for the HTMLRewriter to finish scanning the <head> element and append the Link header before sending the response to the client.
Is there any way to make the HTMLRewriter non-streaming? Ideally, I’d like to tell the HTMLRewriter to begin streaming only after it has completely scanned <head> and I’ve appended the Link header. If that’s not possible, I’d like to wait for the HTMLRewriter to finish scanning the entire HTML document before responding.
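The fallback in the last sentence, buffering the whole document, is the simplest to write: reading the transformed body to the end drives the rewriter over every element, after which the collected header can be attached to a fresh Response. A minimal sketch, assuming transformed is the Response returned by HTMLRewriter’s transform() and getLinkHeader is a hypothetical callback that returns whatever the element handlers collected:

```javascript
// Hypothetical helper: fully buffer a transformed response, then respond.
async function bufferThenRespond(transformed, getLinkHeader) {
  // Consuming the whole body forces every rewriter handler to run first.
  const body = await transformed.arrayBuffer();

  const headers = new Headers(transformed.headers);
  // Links were removed from the HTML, so any original length is stale.
  headers.delete("Content-Length");

  const link = getLinkHeader();
  if (link) headers.set("Link", link);

  return new Response(body, {
    status: transformed.status,
    statusText: transformed.statusText,
    headers,
  });
}
```

The trade-off is latency and memory: nothing reaches the client until the whole page has been processed and held in memory.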
It wouldn’t be pretty, but you could manually read the transformed response body until the handler reports that it is done with the head element, then build a new response from the buffered, transformed head followed by the rest of the unread body, which is nominally still “transformed” but in practice just passed through.
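The partial read can be sketched with standard Streams API calls. The helper name readUntil and its isDone callback are hypothetical; in a Worker, isDone would check a flag set by an element handler, which works because the rewriter’s handlers run as the transformed body is read. This sketch builds the replay with the ReadableStream constructor; in a runtime that, as described in this thread, requires the final body to be a TransformStream, the same chunks would instead be written into the TransformStream’s writable side.

```javascript
// Hypothetical helper: read chunks from `body` until isDone() returns true
// (or the stream ends), then return a stream that replays the buffered
// chunks and afterwards continues with the unread remainder.
async function readUntil(body, isDone) {
  const reader = body.getReader();
  const buffered = [];
  let ended = false;

  while (!isDone()) {
    const { value, done } = await reader.read();
    if (done) {
      ended = true; // whole body consumed before isDone() flipped
      break;
    }
    buffered.push(value);
  }

  return new ReadableStream({
    start(controller) {
      // Emit the already-read (transformed) head first.
      for (const chunk of buffered) controller.enqueue(chunk);
      if (ended) controller.close();
    },
    async pull(controller) {
      // Then lazily pull the rest of the original stream on demand.
      const { value, done } = await reader.read();
      if (done) controller.close();
      else controller.enqueue(value);
    },
    cancel(reason) {
      return reader.cancel(reason);
    },
  });
}
```

Only the head is held in memory; everything after it is pulled from the original reader as the client consumes the replayed stream.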
A wrinkle is that the HTMLRewriter has no callback for closing tags and doesn’t support the :last-child pseudo-selector (because that would require buffering). So the only way I can think of to determine when you’re done processing all the link elements is to match the body element instead. That’s odd, but in practice it shouldn’t matter for this use case.
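A minimal sketch of handlers along these lines, assuming the preload hints are <link rel="preload"> elements carrying href and as attributes. The names makePreloadHandlers and rewritePreloads and the shape of the state object are hypothetical. The handler objects are kept separate from HTMLRewriter so their logic can be exercised outside a Worker; rewritePreloads itself only runs where HTMLRewriter is a global.

```javascript
// Hypothetical shared state: `hints` collects preload links, `headDone`
// flips to true once the opening <body> tag has been parsed.
function makePreloadHandlers(state) {
  return {
    link: {
      element(el) {
        state.hints.push({
          href: el.getAttribute("href"),
          as: el.getAttribute("as"),
        });
        el.remove(); // drop the <link> from the HTML sent to the client
      },
    },
    body: {
      // Without a closing-tag callback, the opening <body> tag is used as
      // the signal that every <link> inside <head> has been seen.
      element() {
        state.headDone = true;
      },
    },
  };
}

// Only callable inside a Worker, where HTMLRewriter is a global.
function rewritePreloads(response, state) {
  const h = makePreloadHandlers(state);
  return new HTMLRewriter()
    .on('link[rel="preload"]', h.link)
    .on("body", h.body)
    .transform(response);
}
```

Reading the body returned by rewritePreloads until state.headDone is true, then emitting the remainder, gives the partial buffering described above. If a page has no <body> tag, the flag never flips and the whole document ends up buffered.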
Here’s an example (look at the Testing tab to see the Link header it generates):
Unfortunately, there’s a major problem with that script: I wanted to use ReadableStream.pipeTo(), but couldn’t. Our implementation doesn’t allow pipeTo() to pump bytes between two TransformStreams, yet HTMLRewriter is implemented in terms of a TransformStream, and the final response body must necessarily be a TransformStream as well. So instead I used a manual read-write loop, which is much more CPU-intensive and will probably exceed the CPU limit for bodies larger than about 5 MB.
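For reference, a manual read-write loop of the kind described here might look like the following. Every chunk passes through JavaScript, which is presumably where the extra CPU time comes from compared with letting pipeTo() move the bytes inside the runtime. The name pump is hypothetical; in a Worker, writable would be the writable side of a TransformStream whose readable side is returned as the response body.

```javascript
// Hypothetical helper: copy every chunk from `readable` into `writable`
// by hand, as a stand-in for readable.pipeTo(writable).
async function pump(readable, writable) {
  const reader = readable.getReader();
  const writer = writable.getWriter();
  try {
    for (;;) {
      const { value, done } = await reader.read();
      if (done) break;
      await writer.write(value); // waits on backpressure from the reader side
    }
    await writer.close();
  } catch (err) {
    // Propagate the failure to both ends so neither side hangs forever.
    await Promise.allSettled([reader.cancel(err), writer.abort(err)]);
    throw err;
  }
}
```

Because a TransformStream applies backpressure, the loop only makes progress while the other side is being read, so it should be started without awaiting it before the Response is returned; awaiting it first would stall.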
We’d like to relax that restriction on pipeTo(), but hadn’t had a compelling reason to until now.