Worker JS to determine MIME type from contents (first 4 bytes)

All,

I’ve been having some fun serving basic static site files from Workers with KV. :) The one sticking point I’ve run into is setting the correct Content-Type for the object retrieved from the KV store. For example, https://site.example.com/favicon.ico is actually a PNG, so it should be served with Content-Type: image/png. Because the site is small, I currently store the MIME type in a second KV entry. So, for the above GET request:

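    // The object lives under "<host><path>"; its MIME type under a second key
    const key = `${host}${path}`
    const typekey = `content-type/${key}`

    // Read the body as raw bytes; the default "text" type would mangle binary files
    const object = await STATIC.get(key, "arrayBuffer")
    const type = await STATIC.get(typekey)

    /* error handling omitted for clarity */
    return new Response(object, {
        status: 200,
        headers: {
            "Content-Type": type,
        }
    })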

Does anyone know of a lightweight JS way to determine a file’s MIME type, not from the extension, but from its magic bytes (the first four bytes of the file)? This would work exactly like the Unix command file -I favicon.ico (file -i on Linux).
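One lightweight approach is to compare the start of the file against a small table of known signatures. The sketch below assumes the body is already available as bytes (for example from STATIC.get(key, "arrayBuffer")); the table is hypothetical and deliberately short, and each prefix has whatever length its format needs rather than a fixed four bytes.

```typescript
// Hypothetical, deliberately short signature table. Each entry is a byte
// prefix that must match at offset 0; lengths differ per format.
const SIGNATURES: { bytes: number[]; mime: string }[] = [
  { bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a], mime: "image/png" },
  { bytes: [0xff, 0xd8, 0xff], mime: "image/jpeg" },
  { bytes: [0x47, 0x49, 0x46, 0x38], mime: "image/gif" }, // "GIF8"
  { bytes: [0x25, 0x50, 0x44, 0x46, 0x2d], mime: "application/pdf" }, // "%PDF-"
  // ICO's signature is short and generic, so it is checked last.
  { bytes: [0x00, 0x00, 0x01, 0x00], mime: "image/x-icon" },
];

// Return the MIME type of the first signature that matches, or undefined
// if none does (plain text files, for example, have no signature).
function sniffMime(data: ArrayBuffer | Uint8Array): string | undefined {
  const bytes = data instanceof Uint8Array ? data : new Uint8Array(data);
  for (const { bytes: sig, mime } of SIGNATURES) {
    if (bytes.length >= sig.length && sig.every((b, i) => bytes[i] === b)) {
      return mime;
    }
  }
  return undefined;
}
```

In a Worker, the call site would be something like sniffMime(await STATIC.get(key, "arrayBuffer")), with the null case (missing key) handled first. Returning undefined rather than guessing leaves the fallback decision to the caller.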

Thanks!


Looks like it’s possible if you retrieve files from KV with the "arrayBuffer" type, since .slice() is implemented and returns a copy without modifying the underlying ArrayBuffer. I just tried this (TypeScript):

    let arrayBufferValue = await KV_NAMESPACE.get(filePath, "arrayBuffer");
    ...
    // Hex-encode the first 4 bytes; slice() copies, so the original buffer is untouched
    let magicBytes: string[] = [];
    new Uint8Array(arrayBufferValue.slice(0, 4)).forEach(byte => {
        // Pad to two digits so 0x00 becomes "00" rather than "0"
        magicBytes.push(byte.toString(16).padStart(2, "0"))
    })
    customHeaders["content-type"] = this.getMimetype(magicBytes.join('').toUpperCase());
    let resp = new Response(arrayBufferValue, {
        headers: customHeaders
    });
    return resp;

this.getMimetype is the same function from that blog post.
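The blog post’s getMimetype isn’t reproduced here, so the following is a hypothetical stand-in with the same shape (an uppercase hex string in, a MIME type out), plus a hexSignature helper. One detail matters: each byte needs padStart(2, "0"), otherwise a zero byte prints as "0" and ICO’s 00 00 01 00 collapses to "0010". The helper also reads through a Uint8Array view with an offset and length, which avoids even the copy that .slice() makes. The application/octet-stream default is an assumption.

```typescript
// Format the first n bytes as an uppercase hex string, two digits per byte.
function hexSignature(buf: ArrayBuffer, n = 4): string {
  // A view over the existing buffer: no copy, nothing modified.
  const view = new Uint8Array(buf, 0, Math.min(n, buf.byteLength));
  return Array.from(view, (b) => b.toString(16).padStart(2, "0"))
    .join("")
    .toUpperCase();
}

// Hypothetical stand-in for the blog post's getMimetype: a plain switch on
// the 4-byte signature, falling back to a generic binary type.
function getMimetype(signature: string): string {
  switch (signature) {
    case "89504E47": return "image/png";
    case "47494638": return "image/gif";
    case "25504446": return "application/pdf";
    case "00000100": return "image/x-icon";
    // JPEG's 4th byte varies, so several 4-byte keys map to it.
    case "FFD8FFE0":
    case "FFD8FFE1":
    case "FFD8FFDB": return "image/jpeg";
    default: return "application/octet-stream";
  }
}
```

Usage would mirror the snippet above: getMimetype(hexSignature(arrayBufferValue)).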


https://worker.judge.sh/tesla.jpg

Of course, this depends on how you upload the file into KV. My repo does a simple fs.readFile(filePath, {encoding: null}) when it posts files to the API, so the value stored under each key is the raw file, byte for byte.
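To see why the upload method matters, here is an illustration (not the repo’s code) of what happens to a PNG header if it passes through a UTF-8 text decode on its way into KV. The text step is simulated with the standard TextDecoder and TextEncoder; roundTripAsText is a hypothetical name.

```typescript
// Simulate a file being stored as text: decode the bytes as UTF-8, then
// encode the resulting string back into bytes.
function roundTripAsText(bytes: Uint8Array): Uint8Array {
  // fatal: false (the default) replaces invalid sequences with U+FFFD
  const text = new TextDecoder("utf-8").decode(bytes);
  return new TextEncoder().encode(text);
}

const pngHeader = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);
const stored = roundTripAsText(pngHeader);
// 0x89 is not valid UTF-8 on its own, so it becomes U+FFFD (EF BF BD)
// and the stored bytes no longer start with the PNG signature.
console.log(Array.from(stored, (b) => b.toString(16).padStart(2, "0")).join(" "));
// prints: ef bf bd 50 4e 47
```

Plain ASCII text survives the same round trip unchanged, which is why the problem only shows up with binary files.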

And, at least with my method, you still need to fall back to inferring from the file extension for any kind of text file, since a text file’s first bytes are just its content. Here are three different CSS files with a console.log(signature):

[Screenshot: console.log(signature) output for the three CSS files]

Finally, based on https://en.wikipedia.org/wiki/List_of_file_signatures, the first 4 bytes aren’t always conclusive: signature lengths differ from format to format, and some formats collide unless you read past the first 4 bytes. WAV and AVI, for example, both start with "RIFF" (52 49 46 46) and only differ at bytes 8 to 11 ("WAVE" vs. "AVI "):

[Screenshot: the WAV and AVI entries from that Wikipedia list]
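A sketch of reading past the first 4 bytes for RIFF files: bytes 0 to 3 are "RIFF", bytes 4 to 7 hold the chunk size (ignored here), and bytes 8 to 11 name the form type. The MIME strings are common choices rather than the only valid ones, and the function names are hypothetical.

```typescript
// Read 4 bytes at `offset` as an ASCII tag such as "RIFF" or "WAVE".
function fourCC(bytes: Uint8Array, offset: number): string {
  return String.fromCharCode(...Array.from(bytes.subarray(offset, offset + 4)));
}

// Distinguish RIFF containers by their form type at bytes 8-11.
function sniffRiff(bytes: Uint8Array): string | undefined {
  if (bytes.length < 12 || fourCC(bytes, 0) !== "RIFF") return undefined;
  switch (fourCC(bytes, 8)) {
    case "WAVE": return "audio/wav";
    case "AVI ": return "video/x-msvideo"; // note the trailing space
    case "WEBP": return "image/webp";
    default: return undefined; // some other RIFF form
  }
}
```

WebP is included because it uses the same RIFF container, which shows why a 4-byte lookup alone cannot tell these formats apart.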

This can be solved with more thorough code that handles every case, but when you’re dealing with well-defined files that have correct extensions, it’s easier to infer the type from the extension.
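A sketch of combining the two: try a content-based detector first, fall back to the extension for text files, and use a generic binary type as a last resort. The extension table, the UTF-8 charset, and the detectMime name are assumptions for illustration; sniff can be any magic-byte detector.

```typescript
// Hypothetical extension table for formats that carry no magic number.
// The charset assumes the site's text files are UTF-8.
const MIME_BY_EXTENSION = new Map<string, string>([
  ["html", "text/html; charset=utf-8"],
  ["css", "text/css; charset=utf-8"],
  ["js", "text/javascript; charset=utf-8"],
  ["json", "application/json"],
  ["svg", "image/svg+xml"],
  ["txt", "text/plain; charset=utf-8"],
]);

// Try the content first, then the extension, then a generic binary type.
function detectMime(
  path: string,
  bytes: Uint8Array,
  sniff: (b: Uint8Array) => string | undefined,
): string {
  const fromContent = sniff(bytes);
  if (fromContent !== undefined) return fromContent;
  // Look only at the last path segment so "v1.2/README" has no extension.
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  const ext = dot > 0 ? name.slice(dot + 1).toLowerCase() : "";
  return MIME_BY_EXTENSION.get(ext) ?? "application/octet-stream";
}
```

A Map is used instead of a plain object so that a path like "x.constructor" can’t pick up an inherited property. If extensions are trusted, the order can simply be flipped.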

Commit for reference. I don’t think I’ll personally be using this detection method, but it was fun testing.

@Judge: Awesome! That’s a great piece of hacking. Unfortunately, it makes it clear that JS was never intended for bit-banging. :(

For the curious who might have a strong use case for this, Rust has a wrapper crate for libmagic, the library under the hood of the file command:

https://docs.rs/magic/0.12.2/magic/

The source repo for the crate is here:

https://github.com/robo9k/rust-magic

For our purposes, one would need to set the MIME flag on the cookie (libmagic’s handle). If that crate, along with the C libmagic it wraps, could be compiled to a WASM binary, then you’d have a complete solution.

For my purposes, though, the overhead of the second STATIC.get(typekey) is acceptable compared to rolling a whole WASM binary. It’s unfortunate that JS doesn’t have native support for such a core capability.
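If the second KV read stays, one low-effort tweak (a suggestion, not something from this thread) is to issue both reads concurrently with Promise.all, so the type lookup doesn’t wait for the body. It is still two reads; only their latency overlaps. KVLike is a hypothetical minimal interface standing in for the real KV namespace binding, which has more overloads, and the octet-stream fallback is an assumption.

```typescript
// Hypothetical minimal stand-in for the Workers KV binding's get().
interface KVLike {
  get(key: string, type: "arrayBuffer"): Promise<ArrayBuffer | null>;
  get(key: string): Promise<string | null>;
}

// Same key layout as the original snippet, but both reads run at once.
async function serveStatic(store: KVLike, host: string, path: string): Promise<Response> {
  const key = `${host}${path}`;
  const [body, type] = await Promise.all([
    store.get(key, "arrayBuffer"),
    store.get(`content-type/${key}`),
  ]);
  if (body === null) return new Response("Not found", { status: 404 });
  return new Response(body, {
    status: 200,
    // Fall back to a generic binary type if no type was stored.
    headers: { "Content-Type": type ?? "application/octet-stream" },
  });
}
```

In a Worker, the KV binding itself would be passed as store.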
