Method to decode gzip or flag to prevent double-compression (Lambda)

I’m using Cloudflare Workers to proxy requests from AWS Lambda. Lambda returns a response with isBase64Encoded: true and a body that is gzip-compressed and then base64-encoded. I’m using atob to decode the base64, but the result is still gzip-compressed.

Here are my questions:

  1. Is it possible to efficiently decode gzip in Cloudflare Workers?
  2. Is it possible to tell Cloudflare to skip encoding the body and just leave Content-Encoding and the body intact? If I return a gzipped response straight from Lambda with Content-Encoding: gzip, then Cloudflare changes this to Content-Encoding: bz and compresses it again instead…

Hi @sheerun,

There’s no efficient API in Workers to explicitly un-gzip data. One could do so by bundling a pure JS implementation of a gzip decompressor, but this would only work for small files before exceeding the CPU limits.

Regarding Content-Encoding – do you mean you’re seeing Content-Encoding: br (not bz)? If so, perhaps you have Brotli encoding turned on for your site. You can find the toggle on the Speed → Optimization tab.

Harris

Re 2: It turned out I can set a Cache-Control: no-transform header to prevent Cloudflare from forcing double encoding. Turning off Brotli doesn’t do anything, as Cloudflare still double-compresses the response in gzip format. There is still an issue with Content-Encoding being removed by Cloudflare when a streaming response is used…

Re 1: Cloudflare advertises in one of its tutorials that it is able to proxy AWS Lambda calls. A clear way to handle base64-encoded responses would be helpful. I’ve spent some time researching this but I cannot find a proper solution.

@harris Do you mean it’s both impossible to decompress gzip and impossible to leave a gzip body untouched by Cloudflare? Forwarding requests to Lambda is a common use case, and so is keeping an already-encoded body intact… Could you fix this?

No, neither is impossible.

If the response body has a Content-Encoding: gzip header, then the Workers runtime will decompress the body for you automatically when the worker reads it. If you also want to leave the response body untouched for the eyeball, then you’d have to clone the response before reading it (e.g., let body = await response.clone().arrayBuffer()), which allows you to return the original response body untouched, as long as you keep the Content-Encoding: gzip header intact.
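
A minimal sketch of that pattern (handleRequest is just an illustrative name):

async function handleRequest(request) {
  let response = await fetch(request)
  // clone() lets the worker read a decompressed copy without
  // disturbing the original body stream.
  let body = await response.clone().arrayBuffer()
  // ... inspect `body` here as needed ...
  // The original response – compressed body and Content-Encoding
  // header included – passes through to the eyeball untouched.
  return response
}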

If the response doesn’t have a Content-Encoding: gzip header, but you have some gzipped data that you need to decompress, then you could try bundling one of the pure JS gzip implementations that exist. I’ve never used one, so I’m not sure which one might be most compatible with Cloudflare Workers, but I’d probably look at pako to start. The limitation here is that the worker will end up using more CPU time than in the previous Content-Encoding: gzip case, so you may find that you can’t handle large files successfully.
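
A minimal (untested) sketch of that approach, assuming pako has been bundled into the worker script:

import pako from 'pako' // assumes the library is bundled, e.g. with webpack

async function handleRequest(request) {
  const response = await fetch(request)
  const compressed = new Uint8Array(await response.arrayBuffer())
  // ungzip runs in pure JS, so it counts against the worker's CPU
  // limit; large bodies may not finish in time.
  const text = pako.ungzip(compressed, { to: 'string' })
  return new Response(text)
}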

Does one of those suggestions cover your use case?

Harris


@harris The standard HTTP-compatible binary response from Lambda is JSON which includes fields like the HTTP status code, a list of headers, the body, etc. For a binary response, the “body” JSON field contains the base64-encoded gzip response (i.e. response → gzip → base64), and an additional isBase64Encoded: true field is present to indicate that the response is binary. Please see for reference: Set up Lambda proxy integrations in API Gateway - Amazon API Gateway
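
For reference, a binary response in that format has roughly this shape (the body value is a placeholder):

{
  "isBase64Encoded": true,
  "statusCode": 200,
  "headers": { "Content-Encoding": "gzip" },
  "body": "<base64 of the gzipped payload>"
}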

If the response body has a Content-Encoding: gzip header, then the Workers runtime will decompress the body for you automatically when the worker reads it.

This won’t work because Lambda returns JSON with a gzipped, base64-encoded “body” field; it doesn’t encode the JSON itself as base64. In particular, there is no Content-Encoding: gzip header on the response. The worker’s job is to extract the “body” field from the JSON, base64-decode it, and return it as the response body (this body may be gzip-compressed, but it could just as well use any binary compression, such as Brotli).
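
To make this concrete, a rough sketch of such a worker (LAMBDA_URL and handleLambda are placeholder names):

async function handleLambda(request) {
  const lambdaResponse = await fetch(LAMBDA_URL)
  const payload = await lambdaResponse.json()
  let body = payload.body
  if (payload.isBase64Encoded) {
    // atob yields a binary string; convert it to raw bytes.
    body = Uint8Array.from(atob(payload.body), c => c.charCodeAt(0))
  }
  // If payload.headers contains Content-Encoding: gzip, this is the
  // point where the runtime re-compresses the already-compressed bytes.
  return new Response(body, {
    status: payload.statusCode,
    headers: payload.headers,
  })
}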

If the response doesn’t have a Content-Encoding: gzip header, but you have some gzipped data that you need to decompress, then you could try bundling one of the pure JS gzip implementations that exist.

As you noticed, it’s not feasible because un-gzipping is a computationally expensive operation when done by a third-party library. While Cloudflare provides the web-standard atob function, which can decode the mentioned base64 field very quickly, it provides no similar method for un-gzipping. Even worse: it doesn’t allow the worker to simply return the base64-decoded gzipped body as the response. Instead Cloudflare double-compresses it. Additionally, the response need not be gzipped; it could, for example, be compressed with Brotli.

Could I kindly ask you to check why Cloudflare doesn’t respect Cache-Control: no-transform, which in theory should prevent transforming the body returned by a worker? Is it by design, or is it a bug? If it’s by design, could you add a flag that disables all body transforms for a binary body returned by a worker?

As mentioned, this isn’t just a Lambda issue, but an issue with returning binary responses from Cloudflare Workers in general (e.g. keeping gzipped content in cache and returning it as-is).

Hi @sheerun, sorry for my slow reply here. I appreciate the detailed response you gave – it helps a lot!

I agree that we should ideally have a compression API that could perform gzip compression/decompression natively. I’ve passed along the feature request, though I can’t guarantee any timeline for if/when it might be implemented.

Regarding the Cache-Control: no-transform issue: does the response returned from the worker have a Content-Encoding: gzip header? If it does, then the Workers runtime will gzip the response body regardless of the presence of any Cache-Control: no-transform header. Is that what’s happening? If not, I think I’ll need a minimal example that reproduces the problem to investigate further – if the response has no Content-Encoding header, then we shouldn’t be automatically adding one, or mutating the body.

Harris

@harris But that’s the exact issue I’m having. If a worker returns gzipped content from cache, or from another source like a Lambda response, that requires a Content-Encoding: gzip header, then Cloudflare double-compresses it. Here’s a worker to reproduce:

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

const hello = new Uint8Array([0x68, 0x65, 0x6c, 0x6c, 0x6f])

const compressedHello = new Uint8Array([
  0x61, 0x0d, 0x0a, 0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x13, 0x0d, 0x0a, 0x66, 0x0d, 0x0a, 0xcb, 0x48, 0xcd, 0xc9,
  0xc9, 0x07, 0x00, 0x86, 0xa6, 0x10, 0x36, 0x05, 0x00, 0x00, 0x00,
  0x0d, 0x0a, 0x30, 0x0d, 0x0a, 0x0d, 0x0a
])

async function handleRequest(request) {
  const pathname = new URL(request.url).pathname
  if (pathname === '/compressed') {
    const response = new Response(compressedHello)
    response.headers.set('Cache-Control', 'no-transform')
    response.headers.set('Transfer-Encoding', 'chunked')
    response.headers.set('Content-Encoding', 'gzip')
    response.headers.set('X-Compressed', 'yes')
    return response
  }

  if (pathname === '/plain') {
    const response = new Response(hello)
    response.headers.set('X-Compressed', 'no')
    response.headers.set('Cache-Control', 'no-transform')
    return response
  }

  return new Response('ok')
}

Both hello and compressedHello could be stored in cache. hello is the plain response and compressedHello is “hello” gzip-compressed and framed as a chunked response. The compressedHello response must have Content-Encoding: gzip for curl to interpret it properly, so I cannot simply omit the header.

As you can see, the binary data of the plain “hello” is returned 1:1:

curl https://test.sheerun.workers.dev/plain --http1.1 --raw --silent | xxd -p -l 50  | fold -w2 | while read b; do echo 0x$b,; done | tr "\n" " "

0x68, 0x65, 0x6c, 0x6c, 0x6f

But the cached compressed response is modified by Cloudflare…

curl https://test.sheerun.workers.dev/compressed --http1.1 --raw --silent | xxd -p -l 50  | fold -w2 | while read b; do echo 0x$b,; done | tr "\n" " "

0x32, 0x38, 0x0d, 0x0a, 0x61, 0x0d, 0x0a, 0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x13, 0x0d, 0x0a, 0x66, 0x0d, 0x0a, 0xcb, 0x48, 0xcd, 0xc9, 0xc9, 0x07, 0x00, 0x86, 0xa6, 0x10, 0x36, 0x05, 0x00, 0x00, 0x00, 0x0d, 0x0a, 0x30, 0x0d, 0x0a, 0x0d, 0x0a, 0x0d, 0x0a, 0x30, 0x0d, 0x0a, 0x0d


Hi @sheerun, I understand now. I can explain why this is happening, and why Cache-Control: no-transform does not change the behavior, but I’m afraid I don’t have any tricks to circumvent it beyond what I’ve already covered (either waiting for a decompression API to be implemented or bundling a pure-JS decompressor).

The issue arises because the Fetch API assumes all body arguments to the Response constructor are plaintext without any content coding applied, and provides no method for specifying otherwise.

Consider the following response body mutation worker – a common pattern. It replaces a string with another string, then reconstructs the response, preserving the original headers.

async function handle(request) {
  let response = await fetch(request)
  if (response.body) {
    let body = await response.text()
    // Pass `response` as the init argument to preserve headers.
    response = new Response(body.replace(/foo/g, "bar"), response)
  }
  return response
}

If the origin server provides a response with Content-Encoding: gzip, then await response.text() decompresses it – per the Fetch spec – and the worker sees plaintext. When the worker reconstructs the response with new Response(...), it provides that mutated plaintext back to the constructor, along with any headers that were on the original response – including any Content-Encoding header. Our options as implementers would then be:

  1. Remove the Content-Encoding header and return a plaintext response verbatim.
  2. Ignore the Content-Encoding header and return a corrupt response.
  3. Honor the Content-Encoding header and compress the body to return a gzipped response.

The Service Worker and Fetch specifications are silent on what to do here – presumably because browser service workers have no need to transmit responses from a service worker over the network. So we have to make a judgment call. Our rationale is that option 1 would be highly unpopular and option 2 obviously violates RFC 7231 (HTTP), so we implemented option 3. This is the source of the double-compression that you are seeing.

Now, say the origin server also provides a Cache-Control: no-transform header, and it gets passed through to the Response constructor. This doesn’t fundamentally change the situation. As far as the Response constructor is aware, the body is still passed in as plaintext, so we still have the same choices to make as before. Option 3 remains the only viable choice. Moreover, if the Response constructor changed its behavior based on that header, we’d only end up breaking scripts such as the example above, requiring script authors to perform more special-casing of responses.

(Note that the script itself would be violating the no-transform directive by mutating the body, but that’s up to the script.)

So, that’s why you’re seeing the behavior you’re seeing. I hope it makes some sense.

Harris

@harris It makes total sense for the fetch.text() use case you’re describing. Also, it seems that the Fetch API disallows accessing the original compressed response.

This unfortunately leaves the two use cases I’ve described unresolved (returning an embedded compressed response from Lambda, and returning a compressed response from cache). I hope you can provide some flag to enable these two use cases.

Basically, Cloudflare somehow needs to know that the returned body has not been previously decompressed and is already encoded according to the Content-Encoding header. I think a good solution would be to allow a compressed: true flag for Response, like so:

  const response = new Response(compressedHello, { compressed: true })
  response.headers.set('Transfer-Encoding', 'chunked')
  response.headers.set('Content-Encoding', 'gzip')
  return response

If this option is present, Cloudflare would always return the body 1:1 without compressing it. I believe introducing a method for decompressing gzip would be inferior to this solution because:

  1. There can be encodings other than gzip, like Brotli, or even custom ones
  2. Decompression in Cloudflare Workers followed by re-compression by Cloudflare is an unnecessary step if the client expects a compressed response

If the origin server provides a response with Content-Encoding: gzip, then await response.text() decompresses it – per the Fetch spec

I’m not witnessing this behaviour at all. Not sure if I’m misunderstanding, but shouldn’t this prevent double encoding?

const body = await response.text(); // response has Content-Encoding: gzip, so text() supposedly decompresses it
return new Response(body, {
  headers: new Headers({
    'content-encoding': 'gzip'
  })
});

The snippet you posted would work like this: the origin server sends a gzipped response, body is the decompressed plaintext, and the eyeball sees a re-gzipped response. Is that not the behavior you’re seeing?

Harris

If I extend the example provided by sheerun:

if (pathname === '/compressed') {
  const response = new Response(compressedHello)
  response.headers.set('Cache-Control', 'no-transform')
  response.headers.set('Transfer-Encoding', 'chunked')
  response.headers.set('Content-Encoding', 'gzip')
  response.headers.set('X-Compressed', 'yes')
  // Clone before reading so the original body can still be returned.
  const text = await response.clone().text();
  console.log(text)
  return response
}

console.log(text) doesn’t print uncompressed text:

�
f
�H���a��6
0

The same behaviour occurs in Chrome Developer Tools as well.

From the perspective of the Response constructor, compressedHello is plaintext. Adding a Content-Encoding: gzip header asks the Response object to compress the “plaintext” (which would, in this case, result in double-compression) when serializing the body for transport over HTTP. However, response.text() and similar functions don’t serialize the body for transport, but rather they read the plaintext value of the body. compressedHello is already plaintext (as far as the Response object understands), so that’s what is returned.

I suppose we can summarize the behavior with a couple rules:

  • When a Response is constructed by the script, body consumption functions (response.text(), response.arrayBuffer(), response.body.getReader().read(), etc.) return the same data as the Response was constructed with.
  • When a Response is constructed by the runtime by deserializing an HTTP response, body consumption functions return the HTTP response body with any Content-Encoding removed.
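
A sketch of the two rules side by side, reusing the compressedHello bytes from the reproduction above (the fetch URL is a placeholder):

async function demonstrateRules() {
  // Rule 1: script-constructed Response. text() returns the bytes the
  // Response was constructed with, Content-Encoding header or not.
  const constructed = new Response(compressedHello, {
    headers: { 'Content-Encoding': 'gzip' },
  })
  console.log(await constructed.text()) // garbled gzip bytes, not "hello"

  // Rule 2: runtime-constructed Response. text() returns the HTTP body
  // with any Content-Encoding already decoded.
  const fetched = await fetch('https://example.com/gzipped')
  console.log(await fetched.text()) // decompressed plaintext
}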

I’m not sure this is the expected behavior.

Adding a Content-Encoding: gzip header is not supposed to trigger a compression routine.

It is just information for the client: “hey, this is compressed content”.

This is why developers are confused.

That said, from what I understand, the fix is:

// Fetch from origin storage, then cache if no error
const fetchFromOrigin = async (event) => {
  const k = 'my-asset.txt';
  const storageResponse = await fetch(`https://my-storage.com/${k}.gz`);
  if (!storageResponse.ok) { return new Response('Not Found', { status: 404 }); }
  const response = await fix(storageResponse);
  // Cache under an absolute URL; the Request constructor rejects relative ones.
  event.waitUntil(caches.default.put(new Request(`https://my-storage.com/${k}`), response.clone()));
  return response;
}

const fix = async (storageResponse) => {
  if (storageResponse.url.match(/\.(gz|br)$/)) {
    // Read the raw bytes and rebuild the Response, passing the original
    // response as init to preserve its status and headers.
    const originalBody = await storageResponse.clone().arrayBuffer();
    return new Response(originalBody, storageResponse);
  }
  return new Response(storageResponse.body, storageResponse);
}

Also, for the record, compression streams are now available in Chrome:

https://chromestatus.com/feature/5855937971617792
https://wicg.github.io/compression/#examples

Compression streams do not seem to be supported in Cloudflare Workers yet. I get

Uncaught (in promise) ReferenceError: DecompressionStream is not defined

in the Cloudflare Worker when I try to create a DecompressionStream (it works fine in Chrome). Would be nice to know if Cloudflare plans to support this.

See the encodeBody option in the Response · Cloudflare Workers docs.
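
That option effectively implements the compressed: true proposal above. A minimal sketch, reusing compressedHello from the earlier reproduction:

const response = new Response(compressedHello, {
  // encodeBody: 'manual' tells the runtime the body already matches
  // the Content-Encoding header, so it is sent through unmodified.
  encodeBody: 'manual',
  headers: { 'Content-Encoding': 'gzip' },
})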


For the sake of others: as of December 2023 I’m using DecompressionStream successfully in a Cloudflare Worker.
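
A minimal sketch of that (the URL is a placeholder, and the object is assumed to be stored as raw gzip bytes without a Content-Encoding header):

addEventListener('fetch', event => {
  event.respondWith(handle(event.request))
})

async function handle(request) {
  const response = await fetch('https://example.com/data.txt.gz')
  // Pipe through the native decompressor; it streams, avoiding the
  // CPU cost of a pure-JS implementation.
  const decompressed = response.body.pipeThrough(new DecompressionStream('gzip'))
  return new Response(decompressed)
}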