HTMLRewriter: extract and serve a single DOM node

Say I have an HTML document like so:

<!doctype html>
<html>
<head></head>
<body>
  <div class="my-body">
    <div id="my-content"><!-- return this --><p>some children</p></div>
    <div><!-- some other content --></div>
  </div>
</body>
</html>

I’d like to build a Worker that uses HTMLRewriter to extract #my-content and serve only that element to the client, i.e.:

<div id="my-content"><!-- return this --><p>some children</p></div>

I can’t think of a way to do this with HTMLRewriter. It seems element handlers can only inject or remove content, not extract it. I tried a rewriter like this:

const result = new HTMLRewriter().on('*:not(#my-content):not(#my-content *)', {
  element(e) {
    e.remove();
  }
}).transform(response);

But that gives me an “Unexpected token in selector” error. I don’t think I can use a selector with a descendant combinator, such as #my-content *, inside :not(). Regardless, this seems like the wrong approach.

I’m not sure whether what I want to do is possible with the currently available methods. Any ideas?

Isn’t your selector essentially removing the entire DOM (including the desired element)?

The selector I have doesn’t even work. As mentioned, I get an “Unexpected token in selector” error. I don’t think I’m using the :not() CSS selector correctly.

I would probably go the other way round: try to extract div#my-content and return that, instead of trying to remove everything around it.

That’s what I’d like to do, but I don’t see how I can extract an element using HTMLRewriter. There’s no method for this kind of thing on element handlers: https://developers.cloudflare.com/workers/reference/apis/html-rewriter/

I could probably use something like cheerio to extract the HTML, but that’s going to be far less performant than HTMLRewriter, since it needs to parse the whole document. If I understand correctly, HTMLRewriter parses the response as a stream of chunks.

A related thread: Parsing HTML with Cheerio using too much CPU time?

It seems I could load htmlparser2 and extract the element using a TransformStream. That should suffice, though it would be nice if this were possible via HTMLRewriter; working with TransformStream seems far less approachable.
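A rough sketch of what the htmlparser2 route could look like. The helper name createExtractorHandlers and the emit callback are made up for illustration; only the callback names (onopentag, ontext, oncomment, onclosetag) come from htmlparser2's Parser. It assumes the default entity decoding, so text and attribute values are escaped again on output, and it does not special-case script or style content.

```javascript
// Void elements are written without a closing tag.
const VOID_TAGS = new Set(['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr']);

// htmlparser2 decodes entities by default, so escape again on the way out.
const escapeText = (s) =>
  s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
const escapeAttr = (s) => s.replace(/&/g, '&amp;').replace(/"/g, '&quot;');

// Hypothetical helper: returns htmlparser2-style callbacks that pass through
// only the element whose id is `targetId`, calling `emit` with each piece of markup.
function createExtractorHandlers(targetId, emit) {
  let depth = 0; // number of currently open non-void elements inside the target (0 = outside)
  return {
    onopentag(name, attribs) {
      if (depth === 0 && attribs.id !== targetId) return;
      const attrs = Object.entries(attribs)
        .map(([key, value]) => ` ${key}="${escapeAttr(value)}"`)
        .join('');
      emit(`<${name}${attrs}>`);
      if (!VOID_TAGS.has(name)) depth++;
    },
    ontext(text) {
      if (depth > 0) emit(escapeText(text));
    },
    oncomment(data) {
      if (depth > 0) emit(`<!--${data}-->`);
    },
    onclosetag(name) {
      if (depth === 0 || VOID_TAGS.has(name)) return;
      emit(`</${name}>`);
      depth--;
    },
  };
}

// Wiring it into a Worker (assumes htmlparser2 is bundled; not exercised here):
//
//   import { Parser } from 'htmlparser2';
//   const encoder = new TextEncoder();
//   const decoder = new TextDecoder();
//   let parser;
//   const extract = new TransformStream({
//     start(controller) {
//       parser = new Parser(createExtractorHandlers('my-content',
//         (html) => controller.enqueue(encoder.encode(html))));
//     },
//     transform(chunk) { parser.write(decoder.decode(chunk, { stream: true })); },
//     flush() { parser.end(); },
//   });
//   return new Response(response.body.pipeThrough(extract), {
//     headers: { 'content-type': 'text/html; charset=utf-8' },
//   });
```

Keeping the filtering logic separate from the stream plumbing means it can be checked with plain function calls before htmlparser2 or a TransformStream is involved.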

I am not overly familiar with HTMLRewriter; however, based on a few attempts, it seems this particular use case is not possible with HTMLRewriter.

Calling remove() on everything else would not work, because removing a parent element also removes the target element nested inside it. removeAndKeepContent() might be an option, but I am not sure the selector support is powerful enough for that.

The overall issue seems to be that HTMLRewriter offers no way to keep only certain elements; it can only remove certain elements.

Even my idea of using HTMLRewriter only for element selection (not for generating the response) and storing each selected element in an array would not work, because a) the handlers run asynchronously and b) one doesn’t seem to be able to access the HTML of an element.
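For what it is worth, here is an untested sketch of a possible workaround for both points: rebuild the markup from the pieces the handlers do expose (tagName, attributes, text and comment chunks, and onEndTag), and read the collected result only after the transformed body has been fully consumed. The names createCollector and extractFragment are made up. It assumes that text and comment handlers registered for a selector also fire for content nested in descendants, and that attribute values and text arrive as written in the source (entities not decoded); both assumptions would need checking against the HTMLRewriter docs. It also buffers the fragment instead of streaming it.

```javascript
// Void elements have no end tag, so no end-tag handler is registered for them.
const VOID_ELEMENTS = new Set(['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr']);

// Assumption: values arrive as written in the source, so only the quote
// character used for re-serializing needs escaping.
const escapeQuotes = (value) => value.replace(/"/g, '&quot;');

// Hypothetical helper: collects markup pieces in document order.
function createCollector() {
  const parts = [];
  const element = (el) => {
    const attrs = [...el.attributes]
      .map(([name, value]) => ` ${name}="${escapeQuotes(value)}"`)
      .join('');
    parts.push(`<${el.tagName}${attrs}>`);
    if (!VOID_ELEMENTS.has(el.tagName)) {
      el.onEndTag((endTag) => parts.push(`</${endTag.name}>`));
    }
  };
  return {
    // Handlers for the target element itself; text and comments are assumed
    // to cover nested content as well, so they are registered only once.
    root: {
      element,
      comments: (comment) => parts.push(`<!--${comment.text}-->`),
      text: (chunk) => parts.push(chunk.text),
    },
    // Handlers for elements nested inside the target.
    descendants: { element },
    html: () => parts.join(''),
  };
}

// Hypothetical helper: runs the rewriter purely for its side effects.
async function extractFragment(response, selector) {
  const collector = createCollector();
  const rewritten = new HTMLRewriter()
    .on(selector, collector.root)
    .on(`${selector} *`, collector.descendants)
    .transform(response);
  // Handlers only run while the body is read, so drain it before using the result.
  await rewritten.arrayBuffer();
  return new Response(collector.html(), {
    headers: { 'content-type': 'text/html; charset=utf-8' },
  });
}

// Possible use inside a Worker (not exercised here):
//
//   export default {
//     async fetch(request) {
//       return extractFragment(await fetch(request), '#my-content');
//     },
//   };
```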

In short, this task seems to range from tricky to impossible at the moment. I’d probably skip Workers altogether and generate the necessary response on the origin.