Use Cache API to put 260MB file with .db extension

Hi,

I’m trying to use the “put” method on the Cloudflare Cache API from within a Cloudflare Worker script to put a ~260MB SQLite database (with a .db file extension) into the Cloudflare cache, and subsequently use the “match” method to retrieve it. No matter what I try, the “match” call never finds the previously stored file.

If I use the same code to store a .jpg or a .png it works as expected (I get a cache hit using match); however, if I change the file extension of the cache key on the db file, the match still fails :frowning:

Is there a size limit on the Cache API put method? (When I wrapped the put part of the process in a try/catch, no error was thrown.)

I also tried using a cache key that is a request object set up as:

const cacheKey = new Request(cacheUrl.toString(), { cf: {cacheEverything: true}});

This didn’t work either; perhaps there is another magic request/response header I should include when putting the file into the cache?

Any pointers would be much appreciated.

Thanks, Andy

The Cache API limits are the same on the Free and Bundled plans (512 MB).

Reasons .match is not retrieving anything (a quick round-trip test is sketched below):

  1. there is nothing in the cache
  2. the cacheKey is not consistent between put and match
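
A minimal sketch of such a round trip, with a made-up helper name and placeholder key URL; if even this reports a miss, the key is not the problem:

async function cacheRoundTrip(event) {
	const cache = caches.default

	// Hypothetical fixed key, used for both put and match.
	const cacheKey = 'https://example.com/cache-round-trip-test'

	const response = new Response('test payload', {
		headers: { 'Cache-Control': 's-maxage=60' },
	})

	// Await the put so the entry exists before we look for it.
	await cache.put(cacheKey, response)

	const cached = await cache.match(cacheKey)
	return new Response(cached ? 'match found the entry' : 'match came back empty')
}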

Thanks for replying. I’m confident the cache key is consistent (I’m using the same string to push into the cache and retrieve it, and when I use the same code to cache/retrieve a much smaller jpg or png file I get a cache hit).

Is there a way to see if a file is in the cache (under the expected key) other than using match?

Check the worker CPU consumption; you might have run out of resources during the put.

That’s an interesting suggestion, thanks (I hadn’t been thinking along those lines at all).

The dashboard shows 0 “exceeded resources” errors, so I don’t think that’s the explanation.

jpg, png and many other extensions are cached by default, so they will get you a cache hit.

I never found anything in the documentation indicating that responses retrieved using caches.default.match include a cf-cache-status header; I think this is the expected behavior, unless someone from Cloudflare says otherwise.

To test if the cache is working, you’ll need to do something like this:

const cacheTest = async (event) => {
	const {request, request: {url}} = event;
	const cacheUrl = new URL(url)
	const cacheKey = new Request(cacheUrl.toString(), request)
	const cache = caches.default

	let cachedResponse = await cache.match(cacheKey)

	if (!cachedResponse) {
		// Cache miss: build a response, mark it cacheable, and store a
		// clone in the background while returning the original.
		const response = new Response('content from origin')
		response.headers.append('Cache-Control', 's-maxage=10')
		event.waitUntil(cache.put(cacheKey, response.clone()))
		return response
	} else {
		// Cache hit: return a marker body so hits are easy to spot.
		return new Response('content from cache')
	}
}
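
For completeness, a sketch of how the helper above might be wired up, assuming the Service Worker syntax that its use of event.waitUntil implies:

// Route every fetch event through the cacheTest helper defined above.
addEventListener('fetch', (event) => {
	event.respondWith(cacheTest(event))
})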

Apologies for randomly going quiet; a list of higher-priority things cropped up :frowning:

My code is doing pretty much the same as the snippet above, and it works as expected for smaller jpg and pdf files. Interestingly, it also works for a smaller (~60MB) db file, so the lack of caching must be something to do with the file size or the time it’s taking to pull from S3.

This code runs on a “Pro” Cloudflare plan, so I’ll try raising a support ticket.

This might be the same issue as here:

https://community.cloudflare.com/t/authorization-header-causes-cf-cache-status-bypass-regardless-of-cacheeverything/

Loading the data into RAM will exhaust it after 128MB (the RAM allotted to a worker); you’ll have to stream it into the cache.
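
A minimal sketch of what streaming into the cache could look like (streamIntoCache, originUrl, and cacheKey are made-up names; this only caches the body and does not also serve it, serving both at once is what the tee() approach later in the thread handles):

async function streamIntoCache(event, originUrl, cacheKey) {
	// originUrl and cacheKey are placeholders. Hand the upstream body
	// (a ReadableStream) straight to cache.put instead of reading it
	// into a variable first.
	const upstream = await fetch(originUrl)
	event.waitUntil(caches.default.put(cacheKey, new Response(upstream.body, {
		headers: { 'Cache-Control': 's-maxage=3600' },
	})))
	return new Response('caching in background')
}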

I spotted that post re the Auth header too. I don’t think this is the cause of my problem, as the code that pushes the file into the cache doesn’t include an auth header (and the caching works for smaller files).

The actual code is currently:

async function handleDataSetFileRequest(event) {

    const originalRequest = event.request;
    const url = new URL(event.request.url);

    let filePath = getParameterByName(url, API_METHOD_QUERY_PARAM_FILE_PATH);

    console.log("handleDataSetFileRequest - pathname: " + url.pathname);

    // move to using full data file in url path rather than in query string so browser can cache response
    if ((!filePath) && (url.pathname.length > API_METHOD_PATH_FETCH_DATASET_FILE.length)) {
        filePath = url.pathname.substring(API_METHOD_PATH_FETCH_DATASET_FILE.length);
    }

    console.log("handleDataSetFileRequest - request for: " + filePath);

    let modResponse = null;

    if (filePath) {

        let cacheUrl = new URL("https://" + DATASET_S3_HOSTNAME + filePath);

        const cacheKey = cacheUrl.toString();

        // try to find file in cloudflare cdn cache
        const cfCdnCache = caches.default;
        let response = await cfCdnCache.match(cacheKey);

        if (!response) {

            console.log("handleDataSetFileRequest - cf cache miss for: " + cacheKey);

            // didn't find file in cdn cache, attempt to retrieve from dataset S3 bucket
            const aws = new AwsClient({
                accessKeyId: DATASET_S3_AWS_ACCESS_KEY_ID,
                secretAccessKey: DATASET_S3_AWS_SECRET_KEY,
                region: DATASET_S3_REGION,
            });

            response = await aws.fetch(cacheUrl);
            console.log("handleDataSetFileRequest - headers in response to fetch from AWS for: " + cacheUrl.toString()
                + " are: " + buildHeadersLogString(response.headers));

            modResponse = new Response(response.body, {
                headers: {
                    ...corsHeaders,
                    "Cache-Control": "public, max-age=259200"
                }
            });

            console.log("handleDataSetFileRequest - cf cache miss for: " + cacheKey + ", mod response headers: "
                + buildHeadersLogString(modResponse.headers));

            // put file into CDN cache in background (allow client request thread to return, then keep worker running until put completes)
            event.waitUntil(cfCdnCache.put(cacheKey, modResponse.clone()));

        } else {

            modResponse = response;

            console.log("handleDataSetFileRequest - cf cache hit for: " + cacheKey + ", response cache headers: "
                + buildHeadersLogString(modResponse.headers));
        }

        return modResponse;
    }

    return new Response('No file-path query string parameter provided', {
        headers: { 'content-type': 'text/plain' },
        status: 500,
    });
}

Cheers, Andy

You’re loading the whole file into a variable right there; this will consume the RAM.

Aaah, because of the await (or is the whole body of the response being read because I’ve assigned it to a variable)?

I’ll have a look at that in a bit (hopefully Cloudflare support are taking a look at this at the moment, so I won’t change the code right now).

If I’m hitting a RAM limit, would it be reasonable to expect the workers web console to show that?

(I ask because I’ve assumed that if I have a workers resource problem, the “exceeded resources” count in the invocation statuses panel of the workers dashboard would get incremented; at the moment it shows zero.)

Yes, because of the await. The fetch body is by default a stream, but if you’re awaiting it then the request waits until the response is streamed into the variable (not 100% certain of the internals, but that’s how it seems to work).

Since you’re completing the request via waitUntil, the request has finished, but yeah, it should show an exceeded-resources error in the logs.
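
To make the distinction concrete, here’s a sketch of the buffered versus streamed paths (originUrl is a placeholder, and this is my mental model rather than a statement about the internals):

async function bufferedVsStreamed(originUrl) {
	// Buffered: reading the body into memory holds the whole file at once.
	const buffered = await fetch(originUrl)     // originUrl is a placeholder
	const bytes = await buffered.arrayBuffer()  // a ~260MB file now sits in RAM

	// Streamed: pass the body along as a ReadableStream instead; chunks
	// flow through without the whole file ever being held at once.
	const streamed = await fetch(originUrl)
	return new Response(streamed.body, streamed)
}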

Try this code:


async function handleDataSetFileRequest(event) {

    const originalRequest = event.request;
    const url = new URL(event.request.url);

    let filePath = getParameterByName(url, API_METHOD_QUERY_PARAM_FILE_PATH);

    if ((!filePath) && (url.pathname.length > API_METHOD_PATH_FETCH_DATASET_FILE.length)) {
        filePath = url.pathname.substring(API_METHOD_PATH_FETCH_DATASET_FILE.length);
    }

    if (filePath) {

        let cacheUrl = new URL("https://" + DATASET_S3_HOSTNAME + filePath);

        const cacheKey = cacheUrl.toString();

        // try to find file in cloudflare cdn cache
        const cfCdnCache = caches.default;
        let response = await cfCdnCache.match(cacheKey);

        if (!response) {

            const aws = new AwsClient({
                accessKeyId: DATASET_S3_AWS_ACCESS_KEY_ID,
                secretAccessKey: DATASET_S3_AWS_SECRET_KEY,
                region: DATASET_S3_REGION,
            });

            response = await aws.fetch(cacheUrl);

            // Pipe the S3 body through an identity TransformStream; piping
            // proceeds in the background as the readable side is consumed.
            let { readable, writable } = new TransformStream();
            response.body.pipeTo(writable);

            // The output of tee() looks like this: [ReadableStream, ReadableStream].
            // Each of those streams receives the same incoming data.
            const branches = readable.tee();

            /*
            To prevent running out of CPU time and RAM I think it is better not to read the Response object and:
                1. declare the headers manually (probably based on file extension)
                2. force 200 status (cache.put does not accept 206 Partial Content)

            Important: cache.put will store the Response the moment the stream is completed. That's the way tee() works.
            */

            event.waitUntil(cfCdnCache.put(cacheKey, new Response(branches[0], {
                status: 200,
                headers: {
                    ...corsHeaders,
                    'Cache-Control': 's-maxage=259200'
                }
            })));

            return new Response(branches[1]);

        } else {
            // Placeholder body so a hit is easy to spot while testing;
            // return the cached response itself in production.
            return new Response('from cache');
            //return response
        }
    }

    return new Response('No file-path query string parameter provided', {
        headers: { 'content-type': 'text/plain' },
        status: 500,
    });
}

The original code worked for me up to 196.98 MB, but I didn't have any larger example to test with, so please tell me if it works for you.

The code returns the response and puts it in the cache simultaneously, so you need to wait until the file has fully downloaded before it can be served from the cache.

The tee() method forces us to return one of its branches in order to be able to store the other branch in the cache.

Hi escribeme,

Yes, that code works perfectly, even with the large db file (which is now up to ~270MB) :slight_smile:

I can see the header “CF-Cache-Status: HIT” coming back from a couple of Cf colos near me.

(It works even better when it’s modified to return the contents of a cache hit rather than ‘from cache’, in case anyone tries a blind cut-and-paste and is surprised by the result.)

I will go and read up on TransformStream, pipeTo and tee - they look very useful.
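
For anyone else doing the same reading, here’s a toy sketch (not tied to the worker code above) of how the three fit together:

async function streamDemo() {
	// A TransformStream is an identity pipe: whatever is written to
	// `writable` comes out of `readable`.
	const { readable, writable } = new TransformStream()

	// pipeTo() drains a ReadableStream into a WritableStream; here the
	// source is a small Response body.
	new Response('hello streams').body.pipeTo(writable)

	// tee() splits one ReadableStream into two branches that receive the
	// same chunks and can be consumed independently.
	const [branchA, branchB] = readable.tee()
	return [
		await new Response(branchA).text(),  // "hello streams"
		await new Response(branchB).text(),  // "hello streams"
	]
}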

Thank you for your help (also thanks to thomas4).
