HTMLRewriter - element end tag

Is there any reason why HTMLRewriter doesn’t have a callback for an element end tag? I don’t think it’s a limitation of the streaming parser since it’s capable of replacing a tag and the corresponding end tag.

Having an end-tag callback would provide context: specifically, you could know the parent of a given tag. Right now, when the rewriter sees a ‘b’ tag, it can’t differentiate between:


and

It would be extremely useful to be able to transform ‘b’ based on its DOM context.

Because the callback implicitly covers the closing tag as well. The callback does not get called for individual tags but for whole blocks, which are either normal elements (with a closing tag) or void elements (self-closing). Otherwise Element.replace(), for example, wouldn’t work.

You can observe this at https://cloudflareworkers.com/#fe460b286042e78dc619872f00f3705e:https://tutorial.cloudflareworkers.com

</b> won’t fire anything in the first place. With your example you will get one call for the whole <b> block.

This makes sense, but my real question is: if I’m processing a particular tag, say ‘b’, how can I conditionally do something different when I’m inside an ‘a’ tag versus not inside one?

Here’s a contrived example:
I want to replace all ‘b’ tags with ‘c’ if ‘b’ occurs inside of ‘a’; otherwise, replace them with, say, ‘d’.

I can’t reliably do that right now, because I can’t differentiate between:


and

There are (at least) two possible solutions: a callback for the end tag, or being passed the ‘stack’ (i.e. some structure that tells the user where they are in the DOM tree).

HTMLRewriter would not support that at this point. Assuming the callback gets called in a serial fashion the best you could achieve right now is to apply some sort of state machine logic where you set flags depending on your position and which tags you have encountered and then act based on their state whenever you get a call for your desired tag. Though, I have not implemented anything remotely like that yet, so that’s rather hypothetical.

Though, because there is no clear indicator when a tag closes it might be difficult to establish if a subsequent tag is part of it or not. At the moment you most likely won’t be able to use HTMLRewriter for your use-case.
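
To make the flag idea and its limitation concrete, here is a minimal sketch. It assumes handlers fire in document order; the `ContextTracker` class and `rewriteWithFlags` function are made-up names for illustration, not part of HTMLRewriter. Only `.on()`, `.transform()` and the settable `tagName` property are real API.

```javascript
// Hypothetical sketch: remember whether an <a> start tag has been seen.
// Nothing fires when </a> is reached, so the flag is never reset.
class ContextTracker {
  constructor() {
    this.seenA = false;
  }

  // Handler for "a": record that an <a> start tag was encountered.
  anchorHandler() {
    return { element: () => { this.seenA = true; } };
  }

  // Handler for "b": rename based on the flag. The tag names "c" and "d"
  // come from the contrived example in this thread.
  boldHandler() {
    return {
      element: (el) => { el.tagName = this.seenA ? "c" : "d"; },
    };
  }
}

// Wiring. This only runs where the HTMLRewriter global exists (Workers).
function rewriteWithFlags(response) {
  const tracker = new ContextTracker();
  return new HTMLRewriter()
    .on("a", tracker.anchorHandler())
    .on("b", tracker.boldHandler())
    .transform(response);
}
```

With this sketch, a ‘b’ that follows an already closed ‘a’ is still renamed to ‘c’, because the flag stays set once the first ‘a’ has been seen. That is exactly the ambiguity described above.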

This is the conclusion I’d come to as well. Is adding this capability on the product road map?

There’s no public roadmap and there wasn’t exactly much work done on HTMLRewriter recently either, so I wouldn’t expect that feature any time soon. That being said, you can certainly open a thread at #feedback:prodreq and suggest such an enhancement to the library, but I’d still keep lowered expectations :)

I can’t see the example you gave due to formatting issues, but would something like this suffice?

  return new HTMLRewriter()
      .on("a > b", new ReplaceWithC())
      .on(":not(a) > b", new ReplaceWithD())
      .transform(response)

It wouldn’t be able to match root-level b tags, of course.
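
For completeness, here is one way the two handlers might look. The class names come from the snippet above, but their bodies are my assumption (renaming the element via the settable `tagName` property), and `rewriteWithSelectors` is a made-up wrapper name.

```javascript
// Assumed handler body: rename the matched <b> element to <c>.
class ReplaceWithC {
  element(element) {
    element.tagName = "c";
  }
}

// Assumed handler body: rename the matched <b> element to <d>.
class ReplaceWithD {
  element(element) {
    element.tagName = "d";
  }
}

// Wiring. This only runs where the HTMLRewriter global exists (Workers).
function rewriteWithSelectors(response) {
  return new HTMLRewriter()
    .on("a > b", new ReplaceWithC())        // b directly inside a
    .on(":not(a) > b", new ReplaceWithD())  // b directly inside anything else
    .transform(response);
}
```

Note that `>` only matches direct children, so a ‘b’ nested deeper inside an ‘a’ would need a descendant selector such as `a b` instead.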

Good point, with a proper CSS selector that should actually work (forests and trees).

That’s great and definitely helps but it basically shows that my example was over-simplified.

The real use case is more like a templating engine, using HTML tags, where the attributes of ‘a’ affect ‘b’. Here is a better example:

When the parser sees the person tag, that would trigger a lookup (cache, API, etc) by the SSN, then ‘name’ would be replaced with:

Joe

Argh - markdown replaced the HTML

<person ssn="xxx">
<name/>
</person>

Would change ‘name’ to:

<div>Joe</div>

I don’t think that’s possible, because things like person and name aren’t defined HTML tags. Put another way, the HTMLRewriter isn’t intended to parse XML. If you could replace them with HTML tags, e.g. <div class="person" ssn="xxx"> and <div class="name" />, then you could potentially accomplish what you need to do by storing state in the ElementHandler instance.
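
A rough sketch of what storing state in a handler instance could look like, assuming the markup uses `div.person` and `div.name` as suggested. The `PersonTemplate` class, the `directory` Map standing in for the cache/API lookup, and the `escapeHtml` helper are all hypothetical; only `getAttribute()`, `replace()` with `{ html: true }`, `.on()` and `.transform()` are HTMLRewriter API.

```javascript
// Escape text before embedding it in an HTML string.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

class PersonTemplate {
  // `directory` is a hypothetical Map from SSN to name, standing in
  // for whatever lookup (cache, API, etc.) is actually used.
  constructor(directory) {
    this.directory = directory;
    this.currentSsn = null;
  }

  // Handler for the person element: remember its ssn attribute.
  personHandler() {
    return {
      element: (el) => { this.currentSsn = el.getAttribute("ssn"); },
    };
  }

  // Handler for the name element: replace it using the remembered ssn.
  nameHandler() {
    return {
      element: (el) => {
        const name = this.directory.get(this.currentSsn) || "";
        el.replace(`<div>${escapeHtml(name)}</div>`, { html: true });
      },
    };
  }
}

// Wiring. This only runs where the HTMLRewriter global exists (Workers).
function rewritePeople(response, directory) {
  const template = new PersonTemplate(directory);
  return new HTMLRewriter()
    .on("div.person", template.personHandler())
    .on("div.person > div.name", template.nameHandler())
    .transform(response);
}
```

The child selector keeps the name handler from firing outside a person block. The remembered SSN is simply the most recently seen one, so nested person blocks would not be handled correctly by this sketch.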
