HTMLRewriter never decodes HTML entities

An interesting observation. Given this HTML document:

<div myatr="&quot;"></div>


document.body.firstChild.getAttribute('myatr') returns '"' and .length == 1

Using HTMLRewriter:

.on("div", { element: function(element) { var s = element.getAttribute('myatr'); } })

s is '&quot;' and .length == 6


Your regexps will never match your test case from Chrome. I have an app that sends JSON in HTML attributes, properly escaped with &quot;. A browser automatically decodes the entities; HTMLRewriter doesn't. I'm not sure what the behavior should be, but CF will probably never change it. It's a big gotcha that the Worker getAttribute doesn't match the W3C getAttribute.

Probably because it doesn't know what HTML entities are.
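
If you need the decoded value inside the Worker, one workaround is to decode the string yourself after calling getAttribute. Here is a rough sketch of that. Note that decodeEntities and NAMED_ENTITIES are names I made up for illustration, they are not part of the HTMLRewriter API, and the table only covers a handful of named entities plus numeric references, so it is not a full HTML entity decoder.

```javascript
// Hypothetical helper, not part of HTMLRewriter.
// Covers only a small subset of named entities; extend the table as needed.
const NAMED_ENTITIES = { quot: '"', amp: '&', lt: '<', gt: '>', apos: "'" };

function decodeEntities(value) {
  // A single pass, so "&amp;quot;" becomes "&quot;" and is not decoded twice.
  return value.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === '#') {
      // Numeric reference: "#34" (decimal) or "#x22" (hex).
      const isHex = body[1] === 'x' || body[1] === 'X';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range values untouched instead of throwing.
      return code > 0 && code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Named reference: unknown names are left as they were.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

// Mirrors the case above: the raw attribute is 6 characters, the decoded one is 1.
const raw = '&quot;';
const decoded = decodeEntities(raw);
console.log(raw.length, decoded.length);
```

In the element handler you would then write something like var s = decodeEntities(element.getAttribute('myatr')); and for the JSON-in-attribute case, pass the decoded string to JSON.parse.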