Characters encoded in Windows-1252 (ISO-8859-1) do not display

I verified that when the page is requested normally through Cloudflare, what looks like a UTF-8 byte order mark (or whatever this is: �) is inserted in place of ANSI characters. I have configured the header on the origin server to Content-Type: text/html; charset=Windows-1252 and have tried purging the cache, but that makes no difference. It works just fine when I request the page directly from the origin server (not through Cloudflare), so it must be a Cloudflare issue.

How can I fix this?

Are you sure this issue is related to Cloudflare?

It looks to me more like an issue with the application, the database charset, or the encoding of the website.

Are you using some kind of a page cache?

There is no database. The web page is encoded as an html file with a regular .html extension. When I open it in a text editor it displays normally and when I request the file in my browser directly from the origin server (via the IP address) it displays normally.

The only cache is a Cache Everything page rule. I tried purging the Cloudflare cache but the problem remains.

One of the ANSI characters that’s mysteriously replaced with the BOM is — another is ’

They’re both replaced by this same sequence of characters:

�

Hello,

https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html

It’s not supposed to be converting from Windows-1252 to ISO-8859-1. Why would it?

Has it been edited via cPanel File Manager? Its “right-click → edit” option can sometimes default to an encoding other than UTF-8. Saving the file after editing it that way is usually when the problems start.

Also, is a META “charset” tag with the correct encoding present in the <head>...</head> section?

Can you provide a screenshot or, even better, a direct URL so we can check further?

Scroll to where it says “best browser version of N” to see what I’m talking about. The funny characters are right after that.

One solution I’d be happy to accept is a way to easily convert all *.html files from Windows-1252 to UTF-8. The server is running CentOS 7.5 and I have ssh access to it, so I can run shell and bash scripts.

Yes I see exactly what you are pointing out here.

There are also issues with the HTML code itself.
See here:

If you can edit your HTML files, I suggest switching to HTML5 and charset UTF-8.

Alternatively, you could set it in your web server. Which one do you use?

For example, if you are using Apache, try adding AddDefaultCharset utf-8 to your .htaccess file so that the web server returns the files with the encoding you need.

  • or in your case windows-1252

Thank you. I temporarily disabled the Cloudflare worker I have on that page and it fixed the encoding issue. So the worker must be the culprit here. Is there anything that I can do about it? I’m guessing that JavaScript isn’t observing the content type header and is just trying to convert from UTF-8 into its native charset and that’s the problem here.

I definitely agree that UTF-8 is the right way to go. I just need to find a bash script that can convert multiple .html files.


Hm, you had not mentioned that you were using a Worker. Could it be related to the Worker, since it uses JSON and the default charset for that is UTF-8?

For JavaScript files that use a specific charset, we can declare it by adding an attribute, for example charset="windows-1252", to the <script ...></script> HTML tag that links to the file.

As I am not very familiar with this, let’s be patient and wait for a reply from someone with more knowledge.

Since you are using a Worker, as you mentioned, I suppose UTF-8 is assumed, as described here, if this is related to your issue:

Cloudflare Workers use their own internal encoding of UTF-16, so when you convert the response stream to text from within the JavaScript worker, it has to convert it from whatever its encoding is to UTF-16. I’m guessing it is not observing the content-type header and is trying to convert to UTF-16 from what it incorrectly assumes is UTF-8. Stupid JavaScript.
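
For anyone who wants to see the mechanism, here is a minimal sketch in plain JavaScript (Node.js or a browser console). The two bytes are the Windows-1252 encodings of the long dash and the curly apostrophe, and the variable names are just mine. It illustrates what decoding with the wrong charset does, and is not meant as the Workers runtime's actual code. It also shows that � is U+FFFD, the Unicode replacement character, rather than a byte order mark.

```javascript
// Windows-1252 encodes the long dash as byte 0x97 and the curly apostrophe as 0x92.
const bytes = Uint8Array.of(0x97, 0x92);

// Decoded as UTF-8, both bytes are stray continuation bytes with no lead byte,
// so each one is replaced by U+FFFD, the replacement character.
const asUtf8 = new TextDecoder('utf-8').decode(bytes);

// Decoded with the charset the origin declared, the real characters come back.
const as1252 = new TextDecoder('windows-1252').decode(bytes);

console.log(asUtf8 === '\uFFFD\uFFFD'); // true
console.log(as1252 === '\u2014\u2019'); // true
```

That matches the symptom in the thread: two different characters both turn into the same � once the bytes are read as UTF-8.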

Thanks for your help. If you have a bash script that converts multiple files to UTF-8 that would be the way to go.


I am not sure whether you need to replace individual characters, just the windows-1252 declaration in your .html files, or something else.

Well, just a quick idea: you could use find to list all the .html files in a directory (file can report each one's detected type), then a for loop that runs iconv on each file to convert it from one charset to the other and saves the result to a new output file.
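
On the command line the usual tool for this is iconv (iconv -f WINDOWS-1252 -t UTF-8 in.html > out.html). Since the Worker is JavaScript anyway, here is the same list-convert-save idea as a rough Node.js sketch. It assumes a Node.js build whose TextDecoder knows the windows-1252 label, and that every .html file under the directory really is Windows-1252. The function names and the .utf8.html output suffix are my own choices, not anything standard.

```javascript
const fs = require('fs');
const path = require('path');

// Decode Windows-1252 bytes and re-encode the same text as UTF-8 bytes.
function toUtf8(buffer) {
  const text = new TextDecoder('windows-1252').decode(buffer);
  return Buffer.from(text, 'utf8');
}

// Recursively collect every *.html file under dir.
function findHtml(dir) {
  return fs.readdirSync(dir, { withFileTypes: true }).flatMap((entry) => {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) return findHtml(full);
    return entry.name.endsWith('.html') ? [full] : [];
  });
}

// Write each converted file next to its original as name.utf8.html,
// so nothing is overwritten until the output has been checked.
function convertTree(dir) {
  for (const file of findHtml(dir)) {
    // Skip outputs left over from an earlier run.
    if (file.endsWith('.utf8.html')) continue;
    const out = file.replace(/\.html$/, '.utf8.html');
    fs.writeFileSync(out, toUtf8(fs.readFileSync(file)));
  }
}

// Example call (the path is a placeholder):
// convertTree('/path/to/site');
```

Once the converted files look right, they can replace the originals. The META charset tag in each file and the Content-Type header on the server would need to say UTF-8 as well.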

All my *.html files are saved in Windows-1252 encoding and if it’s easy I’d just convert them all to UTF-8.

You could easily achieve something like this using Windows PowerShell. If you got the content for a file you could pipe this to the Out-File cmdlet specifying UTF8 as the encoding.

Try something like:

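# Convert every .html file under the current directory to UTF-8.
# Assumes Windows PowerShell, where Get-Content without -Encoding reads BOM-less files using the system ANSI code page.
Get-ChildItem *.html -Recurse | ForEach-Object {
    $content = $_ | Get-Content
    Set-Content -Path $_.FullName -Value $content -Encoding UTF8 -Force
}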

Thanks. By the way, I confirmed it. The Cloudflare worker ignores the charset specified in the content-type header and incorrectly assumes the body content is UTF-8, which is why it can’t decode it and ends up mangling it. I confirmed this by converting the file to UTF-8 and running it through the worker with an origin-server header of Content-Type: text/html; charset=Windows-1252. The Cloudflare worker ends up with a response body encoded in UTF-8 even though the content-type header says Windows-1252. If it had decoded it from Windows-1252, the long dash would look like â€” and the apostrophe would look like â€™, but instead they look normal.

So it is indeed a Cloudflare issue.
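
For anyone who lands here with the same problem, one possible workaround inside the worker is to skip response.text(), which always decodes as UTF-8, and decode the raw bytes with the charset from the header instead. This is only a sketch under assumptions: the helper names charsetOf, readText and handleRequest are mine, and I have not verified that the Workers runtime's TextDecoder accepts the windows-1252 label (Node.js and browsers do).

```javascript
// Pull the charset out of a Content-Type header value, falling back to UTF-8.
function charsetOf(contentType) {
  const match = /charset\s*=\s*"?([^";\s]+)/i.exec(contentType || '');
  return match ? match[1].toLowerCase() : 'utf-8';
}

// Decode the body with the declared charset instead of calling response.text().
async function readText(response) {
  const charset = charsetOf(response.headers.get('Content-Type'));
  const bytes = await response.arrayBuffer();
  return new TextDecoder(charset).decode(bytes);
}

// Hypothetical handler: decode, modify the HTML if needed, re-emit as UTF-8.
async function handleRequest(request) {
  const originResponse = await fetch(request);
  const type = originResponse.headers.get('Content-Type') || '';
  // Leave images, scripts and other non-HTML responses untouched.
  if (!type.toLowerCase().startsWith('text/html')) return originResponse;

  const html = await readText(originResponse);
  const headers = new Headers(originResponse.headers);
  // A string body is sent as UTF-8, so the header has to say so.
  headers.set('Content-Type', 'text/html; charset=utf-8');
  // The byte length changes after re-encoding, so drop the old value.
  headers.delete('Content-Length');
  return new Response(html, { status: originResponse.status, headers });
}
```

In a worker this would be wired up with something like addEventListener('fetch', event => event.respondWith(handleRequest(event.request))). Converting the files to UTF-8 at the origin is still the cleaner fix, since the worker then needs no special handling.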
