How to prevent ETag DDoS/abuse?

Hi!

I am learning about caches and ETags and, to understand how to do things properly, I wanted to ask how to prevent abuse/DDoS with ETags.

For my own understanding, I would like to walk through a very simple example and ask how to prevent abuse. Bear with me for a moment :smiley:

Example:

1.- Imagine a client that makes a GET request to the endpoint /my/data/. It is the client's first request, so it does not send any ETag-related headers (If-None-Match, etc.).

2.- The server receives the request to /my/data/, sees that there is no ETag-related header, and returns this data as JSON: {"data": 1} with the header ETag: "1".

3.- The client receives the JSON data {"data": 1} and the ETag: "1".

4.- The client wants to request the same resource again, so this time it makes a GET to /my/data/ with the header If-None-Match: "1", because it has already requested the resource in the past.

5.- The server sees the header If-None-Match: "1" and compares it with the current version of the data (which we will assume is still "1"). The server returns a 304 (Not Modified) response to tell the client that it already has the correct representation associated with "1". Otherwise, if the version of the data were "2", the server would have sent back, for example, {"data": 2} with ETag: "2".
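To make the five steps concrete, here is a minimal sketch of the server side in Python. It assumes the client sends its stored tag in the If-None-Match request header, which is the header that carries an ETag back to the server. The `RESOURCE` dictionary and the `handle_get` function are hypothetical names for illustration, and the sketch only handles a single exact tag (no `*`, no tag lists, no weak tags).

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory resource: the current body and its version tag.
RESOURCE = {"body": '{"data": 1}', "etag": '"1"'}


def handle_get(headers: Dict[str, str]) -> Tuple[int, Dict[str, str], str]:
    """Return (status, response_headers, body) for GET /my/data/."""
    current_etag = RESOURCE["etag"]
    client_etag: Optional[str] = headers.get("If-None-Match")

    # Step 5: the client's copy is still current, so no body is sent.
    if client_etag is not None and client_etag == current_etag:
        return 304, {"ETag": current_etag}, ""

    # Step 2, or the "otherwise" branch of step 5:
    # first request, or a stale/unknown tag, so send the full body.
    response_headers = {"ETag": current_etag, "Content-Type": "application/json"}
    return 200, response_headers, RESOURCE["body"]
```

Note that a request with a wrong or outdated tag takes exactly the same path as a request with no tag at all.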

OK, this is a very simple example, but here are some questions on how to prevent abuse:

  • If the server needs to check the ETag (maybe in the database) on every GET request to /my/data/, what is the point of the cache/ETag system if every request still ends up hitting the database?

  • Should we assume that the ETag must be available very quickly (maybe stored in RAM rather than in the database) and must be cheap in CPU to compute?

  • If the client deliberately sends an outdated ETag value to force the server to always send back the data, is the only solution to encrypt the ETag to “try” to offer a bit more security (although having a copy of an old ETag is enough to bypass the encryption)?

  • Is the ETag mechanism only safe if both server and client act in good faith?
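As a rough illustration of the second question, one option is to compute the ETag once when the data is written and keep it in memory next to the body, so that validating a request is a dictionary lookup and a string comparison. This is only a sketch under that assumption: `EtagStore`, `put`, `validate`, and `hash_count` are made-up names, and a real service would also need invalidation and sharing across processes. It is also worth noting that a client sending a deliberately wrong tag only gets the same 200 response it would get by sending no tag at all.

```python
import hashlib
from typing import Dict, Optional, Tuple


class EtagStore:
    """Hypothetical in-memory store that keeps each body with its ETag."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, bytes]] = {}
        self.hash_count = 0  # how many times a hash was computed

    def put(self, path: str, body: bytes) -> str:
        # Hash once per write, not once per request.
        self.hash_count += 1
        etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
        self._entries[path] = (etag, body)
        return etag

    def validate(self, path: str, if_none_match: Optional[str]) -> bool:
        """Return True if the client's tag matches, i.e. a 304 can be sent."""
        entry = self._entries.get(path)
        return entry is not None and if_none_match == entry[0]
```

With this layout, any number of calls to `validate` leaves `hash_count` unchanged; only writes pay the hashing cost.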

Thanks for your time and help :smiley:

Not an expert, but I don’t think this matters for ETags when it comes to proxy-cached resources on Cloudflare. I tested with curl, sending a false ETag: as long as the resource/asset is in Cloudflare’s cache, the body returned for a mismatched ETag is sent to the visitor/client from the Cloudflare edge cache (cf-cache-status: HIT), not from the origin. I inspected my origin logs and there were no requests.

Valid ETag sent: 304 response, cache HIT, size = 0

curl -skD - -H 'If-None-Match: "5f98b1b9-10da"' https://blog.domain.com/wp-content/uploads/2020/10/image.webp -w "Size: %{size_download}\n" -o /dev/null 
HTTP/2 304 
date: Thu, 29 Oct 2020 02:53:39 GMT
set-cookie: __cfduid=da4fd146466a607b58d59e264103b43691603940019; expires=Sat, 28-Nov-20 02:53:39 GMT; path=/; domain=.domain.com; HttpOnly; SameSite=Lax
cf-ray: 5e999b7fced0ece2-YUL
age: 1923
cache-control: public, max-age=2592000
etag: "5f98b1b9-10da"
expires: Sat, 28 Nov 2020 02:53:39 GMT
last-modified: Tue, 27 Oct 2020 23:48:09 GMT
strict-transport-security: max-age=31536000; includeSubdomains;
vary: Accept-Encoding
cf-cache-status: HIT
cf-cache-wp: 1
cf-cachetime: 2592000
cf-request-id: 0613df83db0000ece2523d5000000001
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
server: cloudflare

Size: 0

Invalid ETag sent: 200 response, still served from the Cloudflare cache (HIT), size = 4314 bytes

curl -skD - -H 'If-None-Match: "5f98b1b9-10db"' https://blog.domain.com/wp-content/uploads/2020/10/image.webp -w "Size: %{size_download}\n" -o /dev/null
HTTP/2 200 
date: Thu, 29 Oct 2020 02:54:29 GMT
content-type: image/webp
content-length: 4314
set-cookie: __cfduid=df5375ee9b4e34fabc71d5ca7b33eb77e1603940069; expires=Sat, 28-Nov-20 02:54:29 GMT; path=/; domain=.domain.com; HttpOnly; SameSite=Lax
cf-ray: 5e999cbc4a81ca63-YUL
accept-ranges: bytes
age: 1973
cache-control: public, max-age=2592000
etag: "5f98b1b9-10da"
expires: Sat, 28 Nov 2020 02:54:29 GMT
last-modified: Tue, 27 Oct 2020 23:48:09 GMT
strict-transport-security: max-age=31536000; includeSubdomains;
vary: Accept-Encoding
cf-cache-status: HIT
cf-cache-wp: 1
cf-cachetime: 2592000
cf-request-id: 0613e049ad0000ca6379a39000000001
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
server: cloudflare

Size: 4314

Both responses were served by the Cloudflare edge server, not the origin.

Maybe Cloudflare doesn’t use only the ETag to decide whether the asset in its own cache is the same? I ran similar tests against Google CDN cache and Amazon CloudFront CDN cache and saw the same thing when the asset is served from cache.
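One way to picture the behavior observed above is a toy model of an edge cache that stores the ETag together with the cached body and compares the incoming If-None-Match value against its own stored copy. This is only an assumption about the general mechanism, not Cloudflare's actual implementation; `EdgeCache`, `fetch_origin`, and the HIT/MISS labels are illustrative names, and the sketch ignores freshness (max-age) entirely.

```python
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, str, bytes]  # (status, etag, body)


class EdgeCache:
    """Toy edge cache: answers conditional requests from its stored copy."""

    def __init__(self, fetch_origin: Callable[[str], Tuple[str, bytes]]) -> None:
        self._fetch_origin = fetch_origin  # returns (etag, body) for a URL
        self._cache: Dict[str, Tuple[str, bytes]] = {}

    def get(self, url: str, if_none_match: Optional[str] = None) -> Tuple[Response, str]:
        label = "HIT"
        if url not in self._cache:
            # Only a cache miss reaches the origin.
            self._cache[url] = self._fetch_origin(url)
            label = "MISS"
        etag, body = self._cache[url]
        if if_none_match == etag:
            return (304, etag, b""), label  # matching tag: no body
        return (200, etag, body), label     # wrong or missing tag: cached body
```

In this model a wrong tag costs the edge a full body transfer, but the origin is contacted only once per cached URL, which matches the empty origin logs in the test above.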


Hi!

Thanks for your response; the examples are much appreciated.
As you mentioned and showed, it seems that the system does the right thing. However, I would have expected that, if the ETag was not the correct one, the system would have asked your origin for the correct response. Otherwise, how does the system know whether the ETag is “incorrect”, “badly formatted”, “new”, or anything else?

Anyways, thanks for the help :smiley:

Yeah, I suspect CF has other ways to check whether a cached resource has changed, beyond relying on the ETag alone.
