HTMLRewriter never decodes HTML entities

An interesting observation. Given this HTML document:

http://urlecho.appspot.com/echo?body=<div%20myatr="%26quot%3B"><%2Fdiv>

Chrome/FF

document.body.firstChild.getAttribute('myatr') returns '"' and .length == 1

Using HTMLRewriter

.on("div", { element: function(element) { var s = element.getAttribute('myatr'); } })

s is "&quot;" and .length == 6

:frowning_face:

So regexps written against what you see in Chrome will never match in a Worker. I have an app that sends JSON in HTML attributes, properly escaped with &quot;. A browser automatically decodes the entities. HTMLRewriter doesn't. I'm not sure what the behavior should be, but CF will probably never change it. It's a big gotcha that the Worker getAttribute doesn't match the W3C getAttribute.
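A possible workaround, as a rough sketch only: decode the entities yourself before parsing the attribute value. The helper name decodeAttributeEntities and the NAMED_ENTITIES table are my own and hypothetical. It handles only the five entities listed plus numeric references, not the full HTML5 named entity list, so anything else is passed through unchanged.

```javascript
// Hypothetical helper, not part of HTMLRewriter.
// Only these named entities are decoded; all others are left as-is.
const NAMED_ENTITIES = { quot: '"', amp: '&', lt: '<', gt: '>', apos: "'" };

function decodeAttributeEntities(raw) {
  // A single pass, so "&amp;quot;" becomes "&quot;" and is not decoded twice.
  return raw.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave references outside the Unicode range untouched.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

// The raw value HTMLRewriter hands back for myatr="&quot;"
const raw = '&quot;';
const decoded = decodeAttributeEntities(raw);
console.log(raw.length, decoded.length);
```

Inside the handler you would wrap the call, e.g. decodeAttributeEntities(element.getAttribute('myatr') || ''), since getAttribute can return null when the attribute is missing, and then run JSON.parse or your regexp on the result.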

Probably because
https://github.com/cloudflare/lol-html/blob/master/src/parser/state_machine/syntax/tag/attributes.rs doesn't know what HTML entities are.