Can't get cloudflare to cache RSS

I’m admin’ing a regional weather forecasting site running WordPress. Most users like to access the site via an app, and that app mercilessly thrashes mysite.com/feed/ to keep the app’s content current with the site. (There are lots of reasons for the app developer having done it this way and I can’t change this.)

In order to mitigate the load on my origin server from ~50-100k uniques slamming into /feed/ every day as updates are posted, I’m attempting to configure Cloudflare via a page rule to hang onto /feed/ with a somewhat short expiry, so that the screaming horde of people hungry for their daily forecast can be served from CF’s edge cache.

This is what I thought would work:

…but it’s not working. Not even a little bit. Every single request for /feed/ from every single client gets returned with cf-cache-status: MISS in the response headers and gets thrown back to the origin server.

What the ■■■■ am I missing? This feels like it should be super straightforward and it’s making me feel incredibly stupid.

Since you use WordPress with APO (and I am not an expert on APO), could you try this?

PageRule:
“https://spacecityweather.com/*/feed/” instead of “spacecityweather.com/feed*”
or even “https://spacecityweather.com/*feed/”

Then clear ALL caches, especially the Cloudflare cache, and try again.

I am using APO, yep.

No joy: trying /*/feed/ and /*feed/ resulted in no change to the problem behavior.

Can you post your current page rules, please?

Sure, I only have a few, since the site is bog-standard WordPress:

Ok wait … just saw the “resnok” header:

That indicates that your server was responding with an HTTP status other than 200 (success).
source: LINK

Which I think is not great, as your server responds with a “304”, which means the content did not change. Of course the 304 itself should not be cached; the first 200 should be, or Cloudflare should make a fresh request and cache that. As it is, everyone just bypasses Cloudflare’s cache. I also wonder why the first request (which can’t be a 304) was not cached.
In the case of a 304 with a cache MISS, Cloudflare should request a fresh copy no matter what and cache it.
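To make the 304 point concrete, here is a minimal sketch of how an origin typically answers a conditional GET using ETag/If-None-Match. It is generic HTTP semantics, simplified (origins may also use Last-Modified/If-Modified-Since), and not the actual WordPress, nginx, or Varnish code; `respond` and `_opaque` are hypothetical names. The 304 only tells a client that already holds the current version to keep using it, and it carries no body, so there is nothing in it a cache could store.

```python
from typing import Optional, Tuple


def _opaque(tag: str) -> str:
    # If-None-Match uses weak comparison, so W/"abc" and "abc" count as the same version.
    return tag[2:] if tag.startswith("W/") else tag


def respond(if_none_match: Optional[str], current_etag: str, body: bytes) -> Tuple[int, bytes]:
    """Simplified conditional GET: 304 with an empty body if the client's copy is current, else 200."""
    if if_none_match is not None:
        client_tags = [t.strip() for t in if_none_match.split(",")]
        if "*" in client_tags or _opaque(current_etag) in {_opaque(t) for t in client_tags}:
            # The client already has this version; the response has no body a cache could keep.
            return 304, b""
    # No validator, or a stale one: send the full feed.
    return 200, body
```

A first-time visitor (no If-None-Match) always gets the 200 with the full feed, which is the response an edge cache would need to store.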

@yevgen, can you take a look at this, please?

I think this should be escalated internally. Please wait for Yevgen’s response.

Yeah, when I peep at the nginx & varnish logs on the server itself, I do indeed see some 304s in the mix:

2601:2c6:c000:1cd0:5c95:73fb:9b9f:7de3 - - [12/Jul/2021:10:48:21 -0400] "GET http://spacecityweather.com/wp-content/plugins/jetpack/vendor/automattic/jetpack-lazy-images/src/js/intersectionobserver-polyfill.min.js HTTP/1.1" 200 2890 "https://spacecityweather.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Mobile/15E148 Safari/604.1"
2600:1702:af0:1a10:84ce:8647:fb6e:4316 - - [12/Jul/2021:10:48:21 -0400] "GET http://spacecityweather.com/feed HTTP/1.1" 301 0 "-" "Space City Weather app ([email protected])"
2600:1702:af0:1a10:84ce:8647:fb6e:4316 - - [12/Jul/2021:10:48:21 -0400] "GET http://spacecityweather.com/feed/ HTTP/1.1" 200 10160 "-" "Space City Weather app ([email protected])"
2600:100d:b043:afe1:e99e:e27f:8797:612f - - [12/Jul/2021:10:48:22 -0400] "GET http://spacecityweather.com/feed/ HTTP/1.1" 304 0 "-" "NetNewsWire (RSS Reader; https://netnewswire.com/)"
2601:2c7:4280:100:b4d3:f55a:aa8c:3c8e - - [12/Jul/2021:10:48:25 -0400] "GET http://spacecityweather.com/feed HTTP/1.1" 301 0 "-" "Space City Weather app ([email protected])"
2601:2c7:4280:100:b4d3:f55a:aa8c:3c8e - - [12/Jul/2021:10:48:25 -0400] "GET http://spacecityweather.com/feed/ HTTP/1.1" 200 10160 "-" "Space City Weather app ([email protected])"
2601:2c3:867f:f0c0:75cd:95db:5c65:c643 - - [12/Jul/2021:10:48:29 -0400] "GET http://spacecityweather.com/feed HTTP/1.1" 301 0 "-" "Space City Weather app ([email protected])"
2601:2c3:867f:f0c0:75cd:95db:5c65:c643 - - [12/Jul/2021:10:48:30 -0400] "GET http://spacecityweather.com/feed/ HTTP/1.1" 200 10160 "-" "Space City Weather app ([email protected])"
76.247.4.78 - - [12/Jul/2021:10:48:32 -0400] "GET http://spacecityweather.com/feed/ HTTP/1.1" 304 0 "-" "Mozilla/5.0 (compatible; Miniflux/2.0.31; +https://miniflux.app)"
157.55.39.76 - - [12/Jul/2021:10:48:33 -0400] "GET http://spacecityweather.com/storms-possible-as-spring-holds-on-a-little-bit-longer-for-houston/ HTTP/1.1" 200 13495 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
76.30.72.23 - - [12/Jul/2021:10:48:33 -0400] "GET http://spacecityweather.com/feed/ HTTP/1.1" 304 0 "-" "Space City Weather app ([email protected])"
76.30.72.23 - - [12/Jul/2021:10:48:33 -0400] "GET http://spacecityweather.com/feed/ HTTP/1.1" 304 0 "-" "Space City Weather app ([email protected])"
76.30.72.23 - - [12/Jul/2021:10:48:33 -0400] "GET http://spacecityweather.com/feed/ HTTP/1.1" 304 0 "-" "Space City Weather app ([email protected])"
73.32.238.93 - - [12/Jul/2021:10:48:33 -0400] "GET http://spacecityweather.com/feed/ HTTP/1.1" 304 0 "-" "Space City Weather app ([email protected])"
73.32.238.93 - - [12/Jul/2021:10:48:34 -0400] "GET http://spacecityweather.com/feed/ HTTP/1.1" 304 0 "-" "Space City Weather app ([email protected])"
73.32.238.93 - - [12/Jul/2021:10:48:34 -0400] "GET http://spacecityweather.com/feed/ HTTP/1.1" 304 0 "-" "Space City Weather app ([email protected])"
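To see how much of the /feed/ traffic reaches the origin, and with which status, a short Python sketch can tally status codes for an exact URL path from lines like the ones above. It assumes the log format shown in this excerpt; `feed_status_counts` and its regex are illustrative helpers, not part of nginx or Varnish.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the request and status portion of lines like the excerpt above:
# "GET http://host/path HTTP/1.1" 304 ...
LOG_LINE = re.compile(r'"[A-Z]+ (?P<url>\S+) [^"]*" (?P<status>\d{3}) ')


def feed_status_counts(lines, path="/feed/"):
    """Count HTTP status codes for requests whose URL path is exactly `path`."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m is None:
            continue  # skip lines that do not look like access-log entries
        # Compare the path only, so /feed/ and /some-post/feed/ are kept apart.
        if urlsplit(m.group("url")).path == path:
            counts[m.group("status")] += 1
    return counts
```

For example, `with open("access.log") as f: print(feed_status_counts(f))`, where access.log is a placeholder for the real log path. Passing `path="/feed"` counts the 301 redirects instead.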

BTW, the first page rule will not cover any feed other than the one on your front page.
For example:
https://spacecityweather.com/pretty-nice-weather-for-july-comes-to-houston-this-week/feed/ is not matched by the rule.
Please change the rule to:

to match all feeds.
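A rough way to check which URLs a wildcard pattern covers: the Python sketch below treats each `*` as “any run of characters, including none” and requires the whole host-plus-path to match. This is an approximation for illustration, not Cloudflare’s actual matcher; it ignores the scheme, and `spacecityweather.com/*feed*` below is a hypothetical pattern, not necessarily the rule suggested above.

```python
import re


def page_rule_regex(pattern: str) -> "re.Pattern[str]":
    """Approximate a page-rule URL pattern: each '*' matches any run of characters."""
    # Escape the literal pieces, then join them with a wildcard group where each '*' was.
    return re.compile("(.*)".join(re.escape(part) for part in pattern.split("*")))


def matches(pattern: str, host_and_path: str) -> bool:
    # fullmatch: the whole host+path must fit the pattern, not just a prefix of it.
    return page_rule_regex(pattern).fullmatch(host_and_path) is not None
```

Under this model, `spacecityweather.com/feed*` matches `spacecityweather.com/feed` and `spacecityweather.com/feed/` but not a post’s feed such as `spacecityweather.com/pretty-nice-weather-for-july-comes-to-houston-this-week/feed/`, while `spacecityweather.com/*feed*` matches all three.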

Yep, this is desired behavior. The app just scrapes the main feed and nothing else, so spacecityweather.com/feed/ is the single high origin traffic URL that I care about here.

Ah, understood. Then, by the way, you don’t need an asterisk.
You can simply use the full URL, like:
https://spacecityweather.com/feed/ (with https and all)

Thanks—let me try that real quick. /feed (no trailing slash) gets redirected to /feed/ (with trailing slash), so I’d originally wanted to use /feed* so that CF would also capture and cache the 301 redirect and save me 200ms or so of origin server time on doing the redirects.

Please read this:

So no benefit here unless you can redirect with a PageRule, sorry.

Ah, well, lesson learned for me! Thank you :slight_smile:

Edit -

Negative result on using https://spacecityweather.com/feed/ (with scheme) in the rule. I still get cf-cache-status: MISS, along with cf-apo-via: origin,nohtml now.

Edit^2 - I will add a pagerule to do the redirect to the URI with the trailing slash. That’s definitely better than making the origin server do it at this traffic volume. Thank you for the tip!!

I know, I wouldn’t expect it to work yet; like I said, Yevgen needs to look into that.
I’m just trying to fix the other small things I can :slight_smile:

To redirect from feed to feed/, you can set up this page rule; then the redirect is done at the edge and not on your server:

spacecityweather.com/*feed =(301)=> spacecityweather.com/$1feed/

Use “Forward URL” for this.
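How the `*` and `$1` in that rule fit together can be sketched in Python: each `*` captures whatever it matched, and `$1` in the destination is replaced with the first capture. This is a simplified model for illustration (host plus path only, no scheme or query string), not Cloudflare’s implementation; `forward` is a hypothetical name.

```python
import re
from typing import Optional


def forward(pattern: str, target: str, host_and_path: str) -> Optional[str]:
    """Rough model of a Forwarding URL rule: '*' captures text, $1, $2, ... reuse it."""
    # Each '*' becomes a capturing group; everything else must match literally.
    regex = "(.*)".join(re.escape(part) for part in pattern.split("*"))
    m = re.fullmatch(regex, host_and_path)
    if m is None:
        return None  # rule does not apply; the request goes on unchanged

    def substitute(ref: "re.Match[str]") -> str:
        n = int(ref.group(1))
        # Leave references with no matching capture (e.g. $9) untouched instead of failing.
        return m.group(n) if 1 <= n <= len(m.groups()) else ref.group(0)

    return re.sub(r"\$(\d+)", substitute, target)
```

Under this model `spacecityweather.com/feed` forwards to `spacecityweather.com/feed/`, and because the pattern does not end in `*`, `spacecityweather.com/feed/` itself does not match, so the rule should not redirect in a loop.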

While I have your attention, may I ask one other dumb question?

Should pagerules contain the scheme at the beginning? I don’t have a scheme in any of my existing ones—I just start with the hostname. I can’t seem to find a consistent answer to this. Some of the CF page rule examples omit the scheme, and some have it. A solid yes or no would be really helpful.

There are no dumb questions :slight_smile:

On a proper setup I would recommend it. But to pass the ball back to you, I will just explain what it does and how it behaves without a scheme, and you can decide which you want to go for.

The rule spacecityweather.com/ matches:
http://spacecityweather.com and https://spacecityweather.com
so it will also cache HTTP requests if your origin server answers them with a status code of “200”.

The rule https://spacecityweather.com/ matches:
https://spacecityweather.com, so just the HTTPS version. This is my preferred way of writing a page rule if you want to cache static content, you have enabled HSTS, and you serve everything over HTTPS (encrypted) anyway.
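The scheme behavior described above can be modeled in a few lines of Python: a rule without a scheme accepts both http and https, while a rule with a scheme accepts only that one. This sketch deliberately ignores wildcards and compares host and path exactly (treating an empty path as “/”); it illustrates the explanation above and is not Cloudflare’s matching code, and `rule_matches` is a hypothetical name.

```python
from urllib.parse import urlsplit


def rule_matches(rule: str, url: str) -> bool:
    """Scheme-only model: no scheme in the rule means 'http or https'."""
    # Prefix '//' so urlsplit reads a scheme-less rule's host as the netloc.
    rule_parts = urlsplit(rule if "://" in rule else "//" + rule)
    url_parts = urlsplit(url)
    # A scheme written in the rule must match exactly.
    if rule_parts.scheme and rule_parts.scheme != url_parts.scheme:
        return False
    same_host = rule_parts.hostname == url_parts.hostname
    same_path = (rule_parts.path or "/") == (url_parts.path or "/")
    return same_host and same_path
```

So `rule_matches("spacecityweather.com/", "http://spacecityweather.com")` is true, while the same URL checked against `https://spacecityweather.com/` is false.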

Hope this helps.

This is exactly the explanation I needed. Thank you very much :slight_smile:

I will rollout a fix for this later today.

Thank you, @yevgen. What should I see on my end? Do I need to adjust my page rules, or will /feed/ simply begin showing up with cf-cache-status: HIT? Just want to make sure I’m doing everything that I’m supposed to be doing to make things work right.

correct
