Inexplicable caching of Content-Type


#1

We have an odd issue, cloudflare appears to be caching the content-type of a specific file.

Run this:
curl -i https://tag.benchplatform.com/asdasdasd2/get?63e1b19414787e886912d79adc7e777789e01936c4ce63a32715894db10f0e5c

Note the Content-Type: application/octet-stream.

Now hit the underlying URL:
curl -i https://d1ia4ef2ykos03.cloudfront.net/asdasdasd2/get?63e1b19414787e886912d79adc7e777789e01936c4ce63a32715894db10f0e5c

Note that here the Content-Type is application/javascript.

We’ve purged the full cache and the specific URLs, without any change.
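For reference, a purge of specific URLs can also be scripted against Cloudflare's v4 API. The sketch below only builds the request with the Python standard library; the zone ID and API token are placeholders you would supply. It assumes the default cache key, which includes the query string, so each cached URL (every path variant, with its full query string) has to be listed.

```python
import json
import urllib.request
from typing import List

# Cloudflare v4 purge endpoint; zone_id is a placeholder for your zone's ID.
PURGE_URL = "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache"


def build_purge_request(zone_id: str, api_token: str, urls: List[str]) -> urllib.request.Request:
    """Build (but do not send) a purge-by-URL request for the given cached URLs."""
    body = json.dumps({"files": urls}).encode("utf-8")
    return urllib.request.Request(
        PURGE_URL.format(zone_id=zone_id),
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",  # an API token with cache-purge permission
            "Content-Type": "application/json",
        },
    )


# To send it: urllib.request.urlopen(build_purge_request(ZONE_ID, TOKEN, urls))
```

To purge everything instead, the same endpoint accepts a body of {"purge_everything": true}.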


#2

This is the weirdest thing.

When I request it from my browser, there’s no Content-Type header,

but there is one when I request it via curl:

$ curl -i https://tag.benchplatform.com/asdasdasd2/get?63e1b19414787e886912d79adc7e777789e01936c4ce63a32715894db10f0e5c
HTTP/1.1 200 OK
Date: Tue, 12 Feb 2019 03:11:38 GMT
Content-Type: application/javascript
CF-Cache-Status: HIT
X-Cache: Miss from cloudfront

The CloudFront (cloudfront.net) hostname does show the correct type, though.


#3

I’m getting a Content-Type in Chrome Incognito and in Firefox (also in private mode), as well as with curl. But I see that your X-Cache values were misses from CloudFront, whereas mine were hits.


#4

I started making the requests with curl to rule out the browser caching something or sending additional headers.

The other thing to be aware of is that you can retrieve the same file via different URLs: the underlying source ignores the part of the path between the domain and /get.

So these two URLs are essentially the same:

  1. curl -i https://d1ia4ef2ykos03.cloudfront.net/foo/get?63e1b19414787e886912d79adc7e777789e01936c4ce63a32715894db10f0e5c
  2. curl -i https://d1ia4ef2ykos03.cloudfront.net/bar/get?63e1b19414787e886912d79adc7e777789e01936c4ce63a32715894db10f0e5c
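The post doesn't show how the path prefix gets ignored. If the rewriting happens at the CloudFront edge, for instance in a Lambda@Edge request handler (an assumption here), it could look roughly like this hypothetical Python sketch. The rule of dropping the first path segment before /get is invented for illustration.

```python
import re

# Hypothetical rewrite rule: drop the first path segment in front of /get,
# so /foo/get and /bar/get both reach the origin as /get.
_PREFIXED_GET = re.compile(r"^/[^/]+(/get)$")


def lambda_handler(event, context):
    """Sketch of a Lambda@Edge request handler (Python runtime)."""
    request = event["Records"][0]["cf"]["request"]
    # The query string lives in request["querystring"] and is left untouched.
    request["uri"] = _PREFIXED_GET.sub(r"\1", request["uri"])
    return request  # returning the request lets CloudFront continue processing it
```

Where such a function runs matters for caching: a viewer-request function changes the URI before CloudFront's cache lookup, whereas an origin-request function runs only on a cache miss, so CloudFront would still cache /foo/get and /bar/get as separate entries under their original URIs.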

If I hit a “new” URL (that is, one that is new to Cloudflare), I first get a cf-cache-status: MISS header, then cf-cache-status: HIT on the second call. In both cases the Content-Type is application/octet-stream.

However, hitting the underlying URL always returns Content-Type: application/javascript.
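Comparing these headers across repeated runs is easier to do mechanically. A minimal sketch using only the Python standard library that pulls Content-Type, CF-Cache-Status and X-Cache out of curl -i output; it assumes a single response per run (no followed redirects) and that curl is on the PATH for fetch_summary.

```python
import subprocess
from typing import Dict

WATCHED = ("content-type", "cf-cache-status", "x-cache")


def parse_curl_headers(output: str) -> Dict[str, str]:
    """Parse the header block printed by curl -i into {lowercase name: value}.

    Parsing starts at the status line and stops at the blank line before the body.
    """
    headers: Dict[str, str] = {}
    in_headers = False
    for line in output.splitlines():
        if line.startswith("HTTP/"):
            in_headers = True  # status line, e.g. "HTTP/1.1 200 OK"
            continue
        if not in_headers:
            continue  # anything before the status line, e.g. a pasted "$ curl" prompt
        if not line.strip():
            break  # end of headers; the body follows
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def cache_summary(output: str) -> Dict[str, str]:
    """Pick out the headers relevant to this thread; '-' marks a missing header."""
    headers = parse_curl_headers(output)
    return {name: headers.get(name, "-") for name in WATCHED}


def fetch_summary(url: str) -> Dict[str, str]:
    """Request url with curl (-s silent, -i include headers) and summarise it."""
    result = subprocess.run(["curl", "-si", url], capture_output=True, text=True, check=True)
    return cache_summary(result.stdout)
```

Calling fetch_summary a few times on the tag.benchplatform.com URL and on the cloudfront.net URL shows at a glance which layer returns which Content-Type and whether each layer hit or missed.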


#5

$ curl -i https://tag.benchplatform.com/qwerty1/get?63e1b19414787e886912d79adc7e777789e01936c4ce63a32715894db10f0e5c
HTTP/1.1 200 OK
Date: Tue, 12 Feb 2019 03:50:17 GMT
Content-Type: application/javascript
CF-RAY: 4a7c1715ccc83840-ATL
CF-Cache-Status: MISS

This might be something with your local Cloudflare datacenter or, although less likely, the local CloudFront edge. What is the end of your CF-Ray ID? That’s the datacenter you’re hitting (ATL is Atlanta for me).
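As in the output above (4a7c1715ccc83840-ATL), the CF-Ray value ends with a hyphen and the datacenter code, which is generally an IATA airport code. A tiny helper to extract it, assuming that format:

```python
def cf_ray_colo(cf_ray: str) -> str:
    """Return the datacenter code at the end of a CF-Ray value, e.g. 'ATL'."""
    ray_id, sep, colo = cf_ray.strip().rpartition("-")
    if not sep or not ray_id or not colo:
        raise ValueError(f"unexpected CF-Ray format: {cf_ray!r}")
    return colo.upper()
```

For example, cf_ray_colo("4a7c1715ccc83840-ATL") returns "ATL".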


#6

Where did you purge, in Cloudflare or in CloudFront? You need to purge in both. If CloudFront has the object cached, then when Cloudflare pulls from origin it will probably do so from that same CloudFront edge cache, which may still hold the Content-Type value that (I’m assuming) you at first forgot to set on the object and added later. So the URL should also be invalidated in the CloudFront distribution’s cache.

https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html
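An invalidation can also be created from a script. A minimal sketch using boto3's CloudFront client and its create_invalidation call; the distribution ID is a placeholder, and AWS credentials are assumed to be configured in the environment. Invalidating "/*" removes every object from the distribution's caches, i.e. a full clear.

```python
import time
from typing import Dict, Iterable, Optional


def build_invalidation_batch(paths: Iterable[str], caller_reference: Optional[str] = None) -> Dict:
    """Build the InvalidationBatch argument for CloudFront's CreateInvalidation API."""
    items = list(paths)
    return {
        "Paths": {"Quantity": len(items), "Items": items},
        # Use a new CallerReference per request; reusing one for an identical
        # batch does not create a new invalidation.
        "CallerReference": caller_reference or str(time.time()),
    }


def invalidate(distribution_id: str, paths: Iterable[str]) -> Dict:
    """Create an invalidation with boto3 (requires AWS credentials in the environment)."""
    import boto3  # imported here so build_invalidation_batch works without boto3 installed

    client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )


# Example (placeholder distribution ID): invalidate every cached object.
# invalidate("EXAMPLE_DISTRIBUTION_ID", ["/*"])
```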


#7

Thanks for the response; I’ve resolved the issue.

The short version: purging the whole cache in CloudFront fixed it.

The longer version:
Yes, the Content-Type had been set to application/octet-stream and was updated later, but the change wasn’t coming through.

The problem was that CloudFront was caching it as well. It’s a bit messy: CloudFront wasn’t really being used as a CDN, but more as a tool to rewrite URLs, using Lambda@Edge.

I believed CloudFront was configured not to cache anything, but it looks like it was caching the responses. What was confusing is that it appeared to return a cached response only some of the time: when I made a direct request to CloudFront, it bypassed the cache, but when I called it through Cloudflare, it returned the cached version.

I thought I’d purged the cache in CloudFront, but it looks like my invalidation hadn’t caught the item I wanted. When I cleared the full cache, it started working.


#8

It sounds like you’re assuming that your direct requests and Cloudflare’s requests all went to the same CloudFront edge (and thus the same cache). That’s possible, but it’s just as possible that they didn’t. It’s not a “cache bypass”; it’s simply not the same cache 🙂


#9

True. It’s been mildly annoying.