Uploading files via workers

Hi,
I’m trying to implement uploading of files to Google Cloud Storage via the workers.
I built the whole process but I’m now stuck on one last thing.
Background:
We have a web app that lets users upload any type of file to our servers.
We decided to switch from a VM server to the GCS platform. All requests are routed through Cloudflare, so I created a route for the worker to process them.
I built the whole process for obtaining an access token for GCS, and it uploads and creates the file in the desired bucket, but the file is corrupt.

The upload is originally done with a multipart/form-data body, and I can get the FormData object from the formData() method of the incoming request.
Then I can call get('file') on that FormData object. I believe the problem is that Cloudflare Workers don’t have the File object that formData.get('file') should return; instead it returns the data as a string. I tried passing the returned value directly as the body of the GCS request, but as I said, the resulting file is corrupt.
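
For illustration, here is a small sketch of why a string round trip damages binary data. It assumes the runtime decodes the file part as UTF-8 text, which the thread does not confirm; the byte values and the sameBytes helper are made up for the example.

```javascript
// Arbitrary binary bytes, several of which are not valid UTF-8 on their own.
const original = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0xff, 0xd8, 0x00, 0x80]);

// Roughly what happens when a binary part is exposed as a string
// (assumption: the bytes are decoded as UTF-8 text).
const asText = new TextDecoder().decode(original);

// Using that string as a request body encodes it as UTF-8 again.
const roundTripped = new TextEncoder().encode(asText);

// Hypothetical helper: byte-wise comparison of two Uint8Arrays.
function sameBytes(a, b) {
  return a.length === b.length && a.every((v, i) => v === b[i]);
}

console.log(original.length, roundTripped.length);
console.log(sameBytes(original, roundTripped)); // false
```

Invalid byte sequences are replaced with U+FFFD during decoding, so the original bytes cannot be recovered from the string.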

How can I get around this?

Hi @noam4,

I’m afraid we don’t yet implement the File API, so FormData isn’t conformant in this area. The only workaround I can think of in the meantime is to upload the file in its own POST/PUT request via fetch() in the browser. Would that be feasible in your use case?

Harris

Thanks for your reply but unfortunately I can’t change the way the files are uploaded.
I’m now trying to make a rough implementation of the FormData extraction that will suit our needs.
If you have a better solution please let me know.

Thanks,
Noam

Hi @noam4,

That’s basically what I would do, too. There probably exists a pure-JS multipart/form-data parser that you could bundle into your script (e.g. with Webpack), then parse the raw data returned from await request.arrayBuffer().
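
A minimal sketch of that approach, assuming a well-formed body with CRLF line endings and quoted name/filename parameters. parseMultipart and indexOfBytes are hypothetical helper names, not taken from any library, and a real parser would need to handle more edge cases.

```javascript
// Find the first occurrence of `needle` in `haystack` at or after `from`; -1 if absent.
function indexOfBytes(haystack, needle, from = 0) {
  outer: for (let i = from; i <= haystack.length - needle.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

// Parse a multipart/form-data body (Uint8Array) into [{ name, filename, type, data }].
// `data` is a Uint8Array view of the raw part bytes; it is never decoded as text.
function parseMultipart(body, boundary) {
  const enc = new TextEncoder();
  const dec = new TextDecoder();
  const first = enc.encode('--' + boundary);
  const delimiter = enc.encode('\r\n--' + boundary);
  const headerEnd = enc.encode('\r\n\r\n');
  const parts = [];

  let pos = indexOfBytes(body, first);
  if (pos === -1) return parts;
  pos += first.length;

  // After each boundary: "--" closes the body, otherwise CRLF and a part follow.
  while (!(body[pos] === 0x2d && body[pos + 1] === 0x2d)) {
    pos += 2; // skip the CRLF after the boundary line
    const headersStop = indexOfBytes(body, headerEnd, pos);
    if (headersStop === -1) break;
    const headers = dec.decode(body.subarray(pos, headersStop));
    const dataStart = headersStop + headerEnd.length;
    const dataEnd = indexOfBytes(body, delimiter, dataStart);
    if (dataEnd === -1) break;

    const name = /\bname="([^"]*)"/i.exec(headers);
    const filename = /\bfilename="([^"]*)"/i.exec(headers);
    const type = /^content-type:\s*(.+)$/im.exec(headers);
    parts.push({
      name: name ? name[1] : null,
      filename: filename ? filename[1] : null,
      type: type ? type[1] : null,
      data: body.subarray(dataStart, dataEnd),
    });
    pos = dataEnd + delimiter.length;
  }
  return parts;
}

// In a Worker, the inputs would presumably come from the request:
//   body     = new Uint8Array(await request.arrayBuffer())
//   boundary = the boundary parameter of the Content-Type header
// Here a small body is built by hand so the sketch runs on its own.
const boundary = 'XbOuNdArY';
const payload = new Uint8Array([0x00, 0xff, 0x0d, 0x0a, 0x80]);
const head = new TextEncoder().encode(
  `--${boundary}\r\nContent-Disposition: form-data; name="file"; filename="a.bin"\r\n` +
  'Content-Type: application/octet-stream\r\n\r\n'
);
const tail = new TextEncoder().encode(`\r\n--${boundary}--\r\n`);
const body = new Uint8Array([...head, ...payload, ...tail]);

const [part] = parseMultipart(body, boundary);
console.log(part.name, part.filename, part.type, Array.from(part.data));
```

The part’s data could then be passed as the body of the upload request without ever going through a string.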

I will also look into whether we can add the necessary part of the File API to support FormData correctly. It should be trivial, but if any scripts in the wild rely on the current buggy behavior then we’ll need to figure out a way to migrate them before suddenly changing the API behavior.

Harris

Any update on this?

It would be wonderful if CF could provide a standard API for this inside CF Workers, either in the core runtime or as an optional WebAssembly module built with Rust, e.g. https://github.com/abonander/multipart.

I recently deployed a file uploader (to AWS S3). I pass the data by converting the file to base64 and putting it in the body as a string.

This is the JavaScript client:

$.post(url, JSON.stringify({ "filebase64": "the base 64 data", "filename": "some file name" }), function (r) {})

The worker code:

const body = await request.json();
var buf = Buffer.from(body.filebase64, 'base64'); // byte array

Then you can upload the byte array to S3, Google, Azure, etc.
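
Buffer is a Node.js API, so it may need to be bundled into the worker script. As a sketch of an alternative that relies only on atob and Uint8Array, something like the following should produce the same bytes. base64ToBytes is a hypothetical helper name and the sample string is made up for the example.

```javascript
// Decode a base64 string into raw bytes using only atob and Uint8Array.
function base64ToBytes(b64) {
  const binary = atob(b64); // each character code is one byte (0-255)
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// 'AP+AUE5H' is the base64 encoding of the bytes 00 ff 80 50 4e 47.
const bytes = base64ToBytes('AP+AUE5H');
console.log(Array.from(bytes)); // [ 0, 255, 128, 80, 78, 71 ]
```

Keep in mind that base64 makes the request body roughly a third larger than the original file.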

Is there any update on adding File API support and integrating it into FormData? @harris Thanks!

Hi there, while File API support for FormData is still on our radar, I’m afraid I don’t have anything new to report yet.

Understood. Thanks!

Feel free to use this package that I’ve built to solve this problem: https://github.com/ssttevee/js-cfw-formdata-polyfill

Thank you so much @ssttevee ! I really appreciate it!

@ssttevee It seems that your code has a defect, because I’m getting an incorrect hash after I parse the data. Here is what I do in the worker:

import '@ssttevee/cfw-formdata-polyfill';
import { FileReaderSync } from '@ssttevee/blob-ponyfill';
const crypto = require('crypto');

const fd = await request.formData();
const [[name, blob, filename]] = fd.entries();
const data = new FileReaderSync().readAsBinaryString(blob);
const hash = crypto.createHash('md5').update(data).digest("hex");

If I compare the resulting hash with one computed on my machine, they differ.

Do you have some tests for the blob implementation? How do I verify that the parsing is correct?

Btw, thanks for the libs, I appreciate the effort, but sadly it doesn’t work for me. Maybe I’m doing something wrong?
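
One possible cause worth ruling out, independent of the polyfill: in Node.js, hash.update() encodes a string argument as UTF-8 unless an input encoding is given, so a binary string containing bytes above 0x7f is hashed as different bytes. The sketch below uses made-up sample bytes and assumes the Node.js crypto module; whether this explains the mismatch above is not confirmed.

```javascript
const crypto = require('crypto');

const bytes = Buffer.from([0x00, 0x7f, 0x80, 0xff]);
// A "binary string": one character per byte, like readAsBinaryString returns.
const binaryString = String.fromCharCode(...bytes);

// Hypothetical helper: MD5 hex digest with an optional input encoding for strings.
const md5 = (data, enc) => crypto.createHash('md5').update(data, enc).digest('hex');

const expected = md5(bytes);                  // hash of the raw bytes
const asUtf8 = md5(binaryString);             // string is encoded as UTF-8 first
const asLatin1 = md5(binaryString, 'latin1'); // one byte per character

console.log(expected === asUtf8);   // false
console.log(expected === asLatin1); // true
```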

I wouldn’t doubt that it may not be entirely correct, but it worked for my purposes, so I left it at that.

From your snippet, it looks like you’re using this from Node.js. Can you tell me more about the environment? Are you using @dollarshaveclub/cloudworker, just plain express with node-fetch, or something else?

Also, I think it’d be better to open an issue about it on the GitHub repo instead of going off topic here.