Bypass the cache for particular content on a page


#1

Hello,

I have a question, and it would be great if you could answer it.
I need to cache the whole page except for my breadcrumbs. Whenever a user visits the page, it should be served from cache, but the breadcrumbs should be generated dynamically rather than served from cache. Is this possible with Workers?

Thank You


#2

Hi @chandresh.rana,

Yes, you could do that. The trick would be to serve your pages with a placeholder where your breadcrumbs would go, like “$BREADCRUMBS$”. Then, in your Worker, you would make two requests: one for the base page, and one for the breadcrumbs. You’d then perform a search-and-replace on the base page’s content to replace “$BREADCRUMBS$” with the content from the second request.
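The two-request approach described above could be sketched roughly as follows. This is only an illustration under assumptions: the helper name fillPlaceholder is made up for this example, the breadcrumbs URL is the example.com one used elsewhere in this thread, and the base page is assumed to be small enough to buffer in memory.

```javascript
const PLACEHOLDER = "$BREADCRUMBS$"
const BREADCRUMBS_URL = "https://example.com/bread-crumbs"

// Hypothetical helper. split/join treats both the placeholder and the
// content literally, whereas String.prototype.replace would interpret
// sequences such as "$&" inside the replacement content.
function fillPlaceholder(html, placeholder, content) {
  return html.split(placeholder).join(content)
}

async function handle(request) {
  // Start both requests before awaiting either, so they run in parallel.
  const [page, crumbs] = await Promise.all([
    fetch(request),
    fetch(BREADCRUMBS_URL),
  ])

  // Only rewrite text responses; pass everything else through untouched.
  const type = page.headers.get("Content-Type") || ""
  if (!type.startsWith("text/")) return page

  const body = fillPlaceholder(
    await page.text(), PLACEHOLDER, await crumbs.text())

  // The body length changes, so drop any Content-Length from the origin.
  const headers = new Headers(page.headers)
  headers.delete("Content-Length")
  return new Response(body, {
    status: page.status,
    statusText: page.statusText,
    headers,
  })
}

// Register the handler only where a Worker-style event API exists.
if (typeof addEventListener === "function") {
  addEventListener("fetch", event => {
    event.respondWith(handle(event.request))
  })
}
```

The breadcrumbs are fetched even for non-text responses here, which keeps the sketch short; a real script would probably check the URL or type first.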

I believe @harris is working on a recipe showing how to do this in a streaming way; maybe he could add that here.


#3

@chandresh.rana, if the main page is not too large, it would be clearest and potentially most efficient to do this the way that @KentonVarda described: read the entire main page and do a search and replace. I say potentially most efficient because if you have multiple dynamic assets that you want to replace (more than just breadcrumbs), then you’ll need to make multiple extra subrequests – it would be best if those subrequests were made in parallel, which would require some sort of body buffering on the main page. With unbuffered response streaming, you’d end up with serialized calls to fetch(), which would increase the overall latency of the request.
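To illustrate the buffered case with several dynamic assets, here is a rough sketch that fetches all fragments in parallel and then substitutes them into the buffered page. The names fetchFragments and fillAll, the "$CART$" placeholder, and the cart-summary URL are all hypothetical and exist only for this example.

```javascript
// Hypothetical map of placeholders to the URLs that provide their content.
const fragments = {
  "$BREADCRUMBS$": "https://example.com/bread-crumbs",
  "$CART$": "https://example.com/cart-summary", // made-up second asset
}

// Fetch every fragment concurrently. Resolves to an object mapping each
// placeholder to its text. A failed fragment becomes an empty string.
async function fetchFragments(map, fetchImpl) {
  const entries = await Promise.all(
    Object.entries(map).map(async ([placeholder, url]) => {
      const response = await fetchImpl(url)
      return [placeholder, response.ok ? await response.text() : ""]
    })
  )
  return Object.fromEntries(entries)
}

// Substitute each placeholder literally (no regex or "$" patterns).
function fillAll(html, values) {
  let out = html
  for (const [placeholder, content] of Object.entries(values)) {
    out = out.split(placeholder).join(content)
  }
  return out
}

async function handle(request) {
  // The main page and all fragments are requested at the same time,
  // so total latency is roughly that of the slowest single request.
  const [page, values] = await Promise.all([
    fetch(request),
    fetchFragments(fragments, fetch),
  ])
  const headers = new Headers(page.headers)
  headers.delete("Content-Length") // body length changes after substitution
  return new Response(fillAll(await page.text(), values), {
    status: page.status,
    statusText: page.statusText,
    headers,
  })
}
```

Passing the fetch function in as a parameter is just a convenience so that fetchFragments can be exercised with a stub outside of a Worker.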

On the other hand, if you just have a single dynamic asset to fetch (breadcrumbs), and/or the main page is enormous, then response streaming might be the more efficient way to go. I’m including an example below. Note that instead of matching templates such as $BREADCRUMBS$, this matches templates such as {BREADCRUMBS}, as that’s just how I originally wrote this script. It should be possible to hack it to accept different formats, though.

addEventListener("fetch", event => {
  event.respondWith(handle(event.request))
})

async function handle(request) {
  // Fetch from origin server.
  let response = await fetch(request)

  // Make sure we only modify text, not images.
  let type = response.headers.get("Content-Type") || ""
  if (!type.startsWith("text/")) {
    // Not text. Don't modify.
    return response
  }

  // Create a pipe. The readable side will become our
  // new response body.
  let { readable, writable } = new TransformStream()

  // Start processing the body. NOTE: No await!
  streamTransformBody(response.body, writable)

  // ... and create our Response while that's running.
  return new Response(readable, response)
}

// A map of template keys to URLs.
const templateMap = {
  "BREADCRUMBS": "https://example.com/bread-crumbs"
}

async function translate(chunks) {
  const decoder = new TextDecoder()
  const encoder = new TextEncoder()

  // Our chunks are in UTF-8, so we need to decode them before
  // looking them up in our template map. TextDecoder's streaming
  // API makes this easy to perform in a reduction.
  let templateKey = chunks.reduce(
      (accumulator, chunk) =>
          accumulator + decoder.decode(chunk, { stream: true }),
      "")

  // We need one last call to decoder.decode() to flush
  // decoder's buffer. If there's anything left in there, it'll
  // come out as Unicode replacement characters.
  templateKey += decoder.decode()

  if (!templateMap.hasOwnProperty(templateKey)) {
    // We encountered a template key we weren't expecting.
    // Just leave its place in the document blank.
    return new Uint8Array(0)
  }

  // We're expecting this template key and know where to find
  // its resource.
  let response = await fetch(templateMap[templateKey])
  return response.arrayBuffer()
}

async function streamTransformBody(readable, writable) {
  const leftBrace = '{'.charCodeAt(0)
  const rightBrace = '}'.charCodeAt(0)

  let reader = readable.getReader()
  let writer = writable.getWriter()

  // We need to track our state outside the loop in case we
  // encounter a template that crosses a chunk boundary.
  // Instead of tracking a separate inTemplate boolean, we can
  // use the nullity of templateChunks to signal whether we're
  // currently in a template.
  let templateChunks = null

  while (true) {
    let { done, value } = await reader.read()
    if (done) break

    // Each chunk may have zero or more templates, so we'll
    // need to loop until we're done processing this chunk.
    while (value.byteLength > 0) {
      if (templateChunks) {
        // We're in the middle of a template. Search for the
        // terminal brace.
        let end = value.indexOf(rightBrace)
        if (end === -1) {
          // This entire chunk is part of a template. No further
          // processing of this chunk is necessary.
          templateChunks.push(value)
          break
        } else {
          // We found the termination of a template.
          templateChunks.push(value.subarray(0, end))

          // Now that we have one complete template, translate it.
          await writer.write(await translate(templateChunks))
          templateChunks = null

          value = value.subarray(end + 1)
        }
      }

      // We're not currently in a template. Search for the
      // initial brace.
      let start = value.indexOf(leftBrace)
      if (start === -1) {
        // This entire chunk is template-free. We can write
        // it and go straight to reading the next one.
        await writer.write(value)
        break
      } else {
        // We found the start of a template -- write the
        // chunk up to that point, then continue processing
        // the rest of the chunk.
        await writer.write(value.subarray(0, start))
        value = value.subarray(start + 1)
        templateChunks = []
      }
    }
  }

  // NOTE: If templateChunks is non-null at this point, we
  //   encountered an unterminated template. This may or may
  //   not be a problem, depending on your use case.

  await writer.close()
}

As you can see, it’s rather involved since we have to account for cases where a template is split on a chunk boundary. If you need help modifying it, or have more questions, let us know.
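If you want to experiment with the chunk-boundary handling or with a different delimiter format outside of a Worker, a small offline helper along these lines might be useful. It is a simplified sketch of the same state machine, not part of the script above: findTemplateKeys is a made-up name, and it only collects the keys rather than replacing them.

```javascript
// Scan an array of Uint8Array chunks and return the template keys found
// between the `open` and `close` delimiter characters, even when a key
// is split across two or more chunks. Hypothetical helper for testing.
function findTemplateKeys(chunks, open = "{", close = "}") {
  const openByte = open.charCodeAt(0)
  const closeByte = close.charCodeAt(0)
  const decoder = new TextDecoder()
  const keys = []
  let pending = null // non-null while we are inside a template

  for (let value of chunks) {
    while (value.byteLength > 0) {
      if (pending !== null) {
        // Inside a template: look for the closing delimiter.
        const end = value.indexOf(closeByte)
        if (end === -1) {
          // The rest of this chunk belongs to the key.
          pending += decoder.decode(value, { stream: true })
          break
        }
        // Decoding without { stream: true } also flushes the decoder.
        pending += decoder.decode(value.subarray(0, end))
        keys.push(pending)
        pending = null
        value = value.subarray(end + 1)
        continue
      }
      // Outside a template: look for the opening delimiter.
      const start = value.indexOf(openByte)
      if (start === -1) break
      value = value.subarray(start + 1)
      pending = ""
    }
  }
  return keys
}

// Example: a key split across two chunks, using "$" for both delimiters.
const encoder = new TextEncoder()
const sample = [encoder.encode("a $BREAD"), encoder.encode("CRUMBS$ b")]
console.log(findTemplateKeys(sample, "$", "$"))
```

Note that when the opening and closing delimiters are the same character, any stray occurrence in the page (a price such as "$5", say) would be treated as the start of a template, so distinct delimiters are likely the safer choice.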

Harris


#4

Hello @KentonVarda, @harris

Thank you so much for your quick response. @harris, I will go through it and let you know if I have any questions.

Thanks again.


#5

Hello @KentonVarda, @harris

I used the code and instructions provided and found that we need to exclude the second request (breadcrumbs) from Cloudflare’s caching using Page Rules. Correct me if I am wrong.

Thanks.


#6

Hi @chandresh.rana, you are correct, you’ll need to exclude the breadcrumbs asset from being cached by either setting a Page Rule or by setting the Cache-Control header sent by the origin to “no-cache”. This page describes both options:

Harris


#7

Hello @harris,

Thank you for your reply. Here I have tried to bypass the cache for the breadcrumbs request using Workers.

addEventListener('fetch', event => {
  let request = event.request
  if (request.url == 'https://example.com/bread-crumbs') {
    let newHeaders = new Headers(request.headers)
    newHeaders.set('Cache-Control', 'no-cache')
    event.respondWith(fetch(request, { headers: newHeaders }))
  }
})

Please let me know whether the above code is right; if not, please show me the right way. I need to bypass the cache for the breadcrumbs request through Workers, and I also need to bypass the cache for other requests using Workers, so it would be great if you could suggest the best approach.

Thank you.


#8

Hi @chandresh.rana,

Actually, it’s ineffective to set the Cache-Control header on a request in a worker, because Cloudflare only honors Cache-Control headers on responses, not requests. (If we supported them on requests, then malicious clients could bypass our cache, which would be a security hole.) Conversely, a worker cannot effectively set them on responses, because the worker sits between the client and the cache.

Instead, you’ll need to either set the Cache-Control header in responses at your origin server, or use a Page Rule. If you don’t have such fine-grained control over your origin server, then setting a Page Rule for that particular URL would be simplest.

Harris


#9

Hello @harris,

Did you mean that I need to use the tag below to bypass the Cloudflare cache for a particular page?

<meta http-equiv="Cache-control" content="No-Cache" />


#10

Hi @chandresh.rana,

No, you’ll need to set that header in the HTTP response itself. The meta tag version will only affect the browser which receives the response – Cloudflare’s caching proxy does not itself parse HTML, so it can’t see the header this way.

To set the header correctly, you’ll need to consult your HTTP server’s manual and/or research how others have done it. For instance, if you use Apache, you may be able to adapt this example to your use case: https://serverfault.com/questions/157589/apache-no-cache-on-specific-files
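As a rough illustration of setting the header in the HTTP response itself, here is a minimal sketch assuming the origin happens to be a Node.js server. The helper name cacheControlFor, the one-hour max-age for other pages, and the port are assumptions made for this example; your own server will have its own way of doing this.

```javascript
const http = require("http")

// Paths that must never be served from cache (hypothetical list).
const UNCACHED_PATHS = new Set(["/bread-crumbs"])

// Choose the Cache-Control value for a given request path. The one-hour
// max-age for everything else is an arbitrary example value.
function cacheControlFor(path) {
  return UNCACHED_PATHS.has(path) ? "no-cache" : "public, max-age=3600"
}

function createServer() {
  return http.createServer((req, res) => {
    // req.url holds only the path and query, so supply a dummy base.
    const path = new URL(req.url, "http://localhost").pathname

    // This is a real HTTP response header, which a caching proxy can
    // see, unlike a <meta http-equiv> tag inside the HTML body.
    res.setHeader("Cache-Control", cacheControlFor(path))
    res.setHeader("Content-Type", "text/html; charset=utf-8")
    res.end("<p>placeholder body</p>")
  })
}

// To try it locally (assumed port):
// createServer().listen(8080)
```

You could then confirm the header with something like `curl -I http://localhost:8080/bread-crumbs`.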

Harris


#11

Hello @harris just wanted to say thanks for the example you posted here; it was very useful and quite timely!


#12

My pleasure, @gork!