Gatsby site: Non-latin URLs give 404 error

I have deployed a static site created with GatsbyJS to Workers Sites, following https://developers.cloudflare.com/workers/sites/start-from-existing/

You can see the worker in action here:

If you click on articles in non-latin languages (e.g. Arabic, Russian and Chinese), you get a 404 error. This is despite those articles being available in the KV store. The other articles do work fine.

The same site does work without issue on Netlify, including the non-latin languages.

How do I fix this?

That will be an encoding issue. Either you haven’t stored the keys properly in KV, or you are not properly decoding whatever you get in the request.

Debug your code and make sure the strings match everywhere.

The key corresponds to the link/URL of the pages that 404:

But with debugging on, I get the error: could not find 10-%D0%B6%D0%B8%D0%B7%D0%BD%D0%B5%D0%BD%D0%BD%D1%8B%D1%85-%D1%83%D1%80%D0%BE%D0%BA%D0%BE%D0%B2-%D0%BA%D0%BE%D1%82%D0%BE%D1%80%D1%8B%D0%B5-%D1%8F-%D0%BF%D0%BE%D0%BB%D1%83%D1%87%D0%B8%D0%BB-%D0%B2/index.html in your content namespace

So perhaps the error happens because it looks for the encoded URL above, and I store the decoded URL.

BUT, if I store the keys, like this:

Then I still get the same error, even though in this case, the Key is identical to the one reported in the error message!

How can I change the worker script to show these files?

Precisely, that is the encoding mismatch I mentioned. You’ll need to decode the string before looking it up in KV.

Though the path you saved in KV also looks a bit off as far as I can tell. You basically need to make sure whatever you receive matches the database. Just use good old printf (console.log in “modern” JavaScript terms).

Search for “URL decoding”.
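To illustrate the mismatch, here is a minimal sketch using a shortened version of the path from the error message above. The stored key is an assumption (the readable file name); the point is only that the two strings compare as different until the request path is decoded.

```javascript
// Path as reported in the error message (percent-encoded UTF-8), shortened.
const encodedPath = '10-%D0%B6%D0%B8%D0%B7%D0%BD%D0%B5%D0%BD%D0%BD%D1%8B%D1%85/index.html';

// Assumed form of the stored key: the readable file name.
const storedKey = '10-жизненных/index.html';

// Comparing the raw strings fails, because one side is still encoded.
const rawMatch = encodedPath === storedKey;

// Decoding the request path first puts both sides in the same form.
const decodedPath = decodeURIComponent(encodedPath);
const decodedMatch = decodedPath === storedKey;

console.log({ rawMatch, decodedMatch, decodedPath });
```

Logging both strings side by side like this is the “printf” check: whichever form is stored, the lookup string has to be in that same form.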

I tried that before posting my previous reply, but I’m probably doing it wrong.

I just added this for example:
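options.mapRequestToAsset = req => {
  // Decode the percent-encoded request URL
  let url = decodeURI(req.url)
  // Pass a new Request built from the decoded URL to the default mapper
  return mapRequestToAsset(new Request(url, req))
}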

But this doesn’t change anything. Still the same error on pages with non-latin URLs.

I really appreciate your help so far, but I can’t figure it out (yet).

You also say “Though the path you saved in KV also looks a bit off as far as I can tell.”
What do you mean by that?

Have you tried the printf approach? That should relatively quickly fix it.

I am not quite sure what mapRequestToAsset does, but you seem to decode the string only to pass it to a request object, where it will be re-encoded. Where do you actually access KV? That’s where you need to make sure you have the right string. If that mapRequestToAsset is some internal function it might always use the encoded string, in which case you’d need to save the string in its encoded form in KV too.
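The re-encoding can be observed with the standard URL class alone. This is a sketch, not the Workers Sites code: example.com and the shortened slug are placeholders, and Request is assumed to normalise its URL the same way URL does.

```javascript
// Readable URL, i.e. what decodeURI(req.url) would produce (placeholder host and slug).
const readable = 'https://example.com/10-жизненных/';

// Parsing the string as a URL percent-encodes the non-ASCII path again.
const reparsed = new URL(readable);
const pathname = reparsed.pathname;

// The path only becomes readable again if it is decoded at the point of lookup.
const lookupKey = decodeURIComponent(pathname).replace(/^\/+/, '');

console.log(pathname);  // percent-encoded form
console.log(lookupKey); // readable form, without the leading slash
```

So decoding has to happen after the last URL or Request object is constructed, right where the key is built.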

See here about mapRequestToAsset and how KV is accessed: GitHub - cloudflare/kv-asset-handler: Routes requests to KV assets

I use a method of uploading static sites provided by Cloudflare, so I didn’t actually write the worker myself. I’m just trying to alter it now to fix my issue.

If that mapRequestToAsset is some internal function it might always use the encoded string, in which case you’d need to save the string in its encoded form in KV too.

Saving the strings in their encoded forms in KV was the first thing I tried, which did not work for some reason.

Have you tried the printf approach? That should relatively quickly fix it.

Pardon my ignorance, but what do you mean by the printf approach? printf is not a JavaScript function, so I’m not sure what to do with it.

What I wrote earlier, please re-read my message.

What it comes down to is that you need to debug your code and make sure the strings match. If you do not actively fetch it from the database yourself, you will most likely have to store it in its encoded form. I already addressed the potential issues you might have here in post #4 as well.

Hi! KV can certainly handle encodings; I would encourage you to file a bug on Wrangler about this. Workers Sites is a Wrangler feature, and it seems maybe they’re not handling this properly.

I’d be hesitant to call this a bug of Wrangler or Cloudflare in general, to be honest.

To me this seems like either a broken encoding or, as I mentioned earlier, a simple mismatch between the request path and whatever is stored in the database.

I’ve been doing some “printf’ing”.

Sorry about my slowness here. I did not think of editing the imported files before.

Some things I have found:

  • In the documentation, mapRequestToAsset is recommended to alter the URL, after which you make a new Request with the altered URL. BUT Request() encodes its URL, so decoding in mapRequestToAsset is futile.
  • I then decoded the URL in kv-asset-handler itself. Now, in the previewer and playground, things work perfectly. However, on the live worker the non-latin language pages load their content for a split second, after which they switch to a botched up mixture of the post and a 404 page. (This may be due to caching on my end. I’ll test some more. EDIT: YES, see EDIT2 below.)
  • When I used encoded file names instead of decoded file names, the reason I still got the error was because the script searches for the encoded file names in upper case (all caps), while they are stored in lower case. I do not know why it would search for the upper case path, since the link goes to the lowercase version. Maybe this is even the issue behind the overall problem I’m having. Again, I edited kv-asset-handler to make the pathKey lower case.
    However, this now results in the page loading for a split second, and then going white.

EDIT: The pages changing/disappearing after a split second (when using encoded keys) has something to do with GatsbyJS. I wonder if Gatsby is somehow trying to find the non-existent keys. Loading the pages with JavaScript turned off works perfectly.

EDIT2: The weird results after decoding the URL before were due to browser caching on my end. So things now work by using decoded keys and decoding the pathKey in kv-asset-handler.
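The upper-case versus lower-case finding can be reproduced in isolation. Percent-escapes such as %D0 and %d0 denote the same byte but are different strings, so an exact key lookup fails. A minimal sketch, where canonicalKey is a hypothetical helper (not part of kv-asset-handler) that decodes both sides before comparing:

```javascript
// Hypothetical helper: bring a request path or a stored key into one decoded form.
function canonicalKey(pathOrKey) {
  try {
    return decodeURIComponent(pathOrKey);
  } catch (err) {
    // Malformed escape sequence: fall back to the raw string.
    return pathOrKey;
  }
}

// The same two characters, escaped with upper-case and lower-case hex digits.
const requested = '%D0%B6%D0%B8/index.html';
const stored = '%d0%b6%d0%b8/index.html';

const exactMatch = requested === stored;
const canonicalMatch = canonicalKey(requested) === canonicalKey(stored);

console.log({ exactMatch, canonicalMatch });
```

Decoding sidesteps the case question entirely, which fits the observation that decoded keys plus a decoded pathKey worked. One caveat of this sketch: decodeURIComponent also turns %2F into a slash, so it is only safe when file names never contain encoded slashes.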

@sklabnik, do any of these things sound like bugs? On one hand there is the fact that things only work if I decode the pathKey (if I use decoded filenames/keys), and on the other hand, if I use encoded filenames/keys, kv-asset-handler for some reason looks for the key in all caps, while it is stored in lowercase.

One thing I noticed: the path you posted was 10-жизненных-уроков-которые-я-получил-в/index.html, but the key in your screenshot shows a completely different value.

As I said earlier, if you decode only to have it encoded again, there is little point in decoding it in the first place. I don’t think there is a bug involved anywhere here; it was simply that encoding issue. Both approaches (encoded and decoded) might be possible, but I’d avoid the encoded value wherever possible, use the proper string instead, and make sure all the strings match.

The screenshot shows the same value, but with a hash added: 10-жизненных-уроков-которые-я-получил-в/index.[hash].html. This is default Workers Sites functionality.

The documentation says the following:

Takes any path that ends in / or evaluates to an html file and appends index.html or /index.html for lookup in your Workers KV namespace.

That would essentially mean a request for 10-жизненных-уроков-которые-я-получил-в/ results in a lookup for 10-жизненных-уроков-которые-я-получил-в/index.html, and that wouldn’t match the KV key.

The point is that you were trying to look up 10-жизненных-уроков-которые-я-получил-в/index.html, which didn’t exist because that entry had the hash placed in the middle. I can’t say where that hash came from, but it would explain why that particular lookup failed even when you had the encoding right.

No.

… To fix this, on publish or preview, Wrangler walks the entry-point directory you’ve declared in your wrangler.toml and creates an asset manifest: a map of your filenames to a hash of their content. We use this asset manifest to map requests for a particular filename, say index.html, to the content hash of the most recently uploaded static asset.

You can see here how the filenames are linked to the hashed keys: https://github.com/cloudflare/kv-asset-handler/blob/0617aec9ef513efe84d0820eb5cc7fcf84c5a83a/src/index.ts
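As a rough sketch of that two-step lookup: the manifest contents, the hash value and the resolveKey helper below are all made up for illustration and are not the kv-asset-handler implementation. It assumes the manifest is keyed by readable file names and handles only the "ends in /" case from the documentation.

```javascript
// Hypothetical manifest: readable file name -> KV key containing a content hash.
const assetManifest = {
  '10-жизненных/index.html': '10-жизненных/index.0123abcd.html',
};

// Hypothetical helper: turn a request pathname into a KV key via the manifest.
function resolveKey(pathname, manifest, decode = true) {
  // Drop the leading slash and, optionally, decode percent-escapes.
  let file = (decode ? decodeURIComponent(pathname) : pathname).replace(/^\/+/, '');
  // Paths ending in "/" (or the root) are looked up as index.html.
  if (file === '' || file.endsWith('/')) file += 'index.html';
  return manifest[file]; // undefined when the manifest has no such file name
}

// Pathname as a browser would send it for the readable link.
const requestPath = encodeURI('/10-жизненных/');

const withDecoding = resolveKey(requestPath, assetManifest);
const withoutDecoding = resolveKey(requestPath, assetManifest, false);

console.log({ withDecoding, withoutDecoding });
```

Under these assumptions the hash in the KV key is not the problem: the request never has to contain it, but the file name derived from the request has to match the manifest's form before the hashed key can be found.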

Just noticed that this issue already exists, so perhaps it will be fixed at some point: “URLs with encoded characters not handled properly”, Issue #75 in cloudflare/kv-asset-handler on GitHub.