APO response contains invalid style tag and error message from Google, breaking entire site layout

Today, after setting up the Cloudflare plugin for cache invalidation, we enabled APO again on our production sites. It had been enabled before for long periods (we are not sure exactly when, but no more than 3 months ago) without any page experiencing issues.

However, after turning it on, all pages contained a 400 error message from Google at the start of the body, and an additional style tag in the head containing the styles of that error page. The styles set a max width on the body, so it really messed up the layout too.

We quickly disabled APO and did a full CF purge, which resolved the issue.

The site is https://www.greenpeace.org/international/, but you currently can’t see the issue there. I’ll post a link here once we have reproduced it on another environment.

Now we wanted to set up APO for our test subdomain www-dev. We didn’t do this before enabling it on production, because neither the plugin nor the Cloudflare dashboard allows administering the APO subdomains without enabling the setting first. Since APO had worked for us in the past we weren’t expecting any issues, or at least not an issue that would break our site in this way.

We checked the documentation we could find about subdomains, which mentions:

When activating APO on a subdomain as a part of the migration, APO will be disabled on a root domain automatically. If you are still interested in running APO against the root, please upgrade WordPress plugin version 3.8.6 or later on the root domain and re-enable APO.

That led us to assume that running the plugin from our www-dev subdomain site would allow us to turn on APO for that subdomain only. That turned out not to be the case: turning it on from there enabled APO for the root domain and the www subdomain, again breaking our site and requiring a full purge to resolve.

Why doesn’t either the plugin or the Cloudflare dashboard simply allow administering the list of subdomains, treating www no differently from other subdomains? In the plugin code you can see that it sends a PATCH request to the zone settings (https://api.cloudflare.com/client/v4/zones/{{zoneId}}/settings/automatic_platform_optimization), which includes the hostnames.

Using the same API I was able to change these so they only have our develop and staging domains.
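For anyone who wants to do the same, here is a minimal sketch of how such a request could be built. Only the endpoint URL comes from the plugin code mentioned above. The payload field names (`value`, `enabled`, `hostnames`), the placeholder zone id and token, and the example hostnames are assumptions, and the real setting may require additional fields, so check the payload against the current API reference before sending anything.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_apo_hostnames_request(zone_id, api_token, hostnames, enabled=False):
    """Build (but do not send) a PATCH request for the APO zone setting."""
    # Assumed payload shape: verify the field names before using this.
    payload = {"value": {"enabled": enabled, "hostnames": list(hostnames)}}
    return urllib.request.Request(
        url=f"{API_BASE}/zones/{zone_id}/settings/automatic_platform_optimization",
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values, for illustration only.
request = build_apo_hostnames_request(
    zone_id="ZONE_ID_PLACEHOLDER",
    api_token="API_TOKEN_PLACEHOLDER",
    hostnames=["www-dev.example.org", "www-stage.example.org"],
)
# Send with urllib.request.urlopen(request) once the payload has been verified.
```

The request is only constructed here, not sent, so the body can be inspected first.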

We haven’t tried turning APO on for that develop domain yet. We’ll wait for a lower-traffic moment, as we’re not entirely sure that turning on the option wouldn’t somehow add the root domain and www to the hostnames again.

We probably found the cause. The 400 document is where the Google Fonts stylesheet link tag was, which Cloudflare inlines as an optimization. Our link tag has a line break in the href attribute, which I expect is not playing well with the optimization code. So APO makes an invalid request to Google Fonts, gets a 400 response code and content type text/html instead of the expected text/css, and still proceeds to inline that entire error page, which has its own style tag, inside our html head.

You get the same error page if you request a non-existent subset: https://fonts.googleapis.com/css?family=Roboto:300,400,500,700,900|Lora:400,400i,700&display=swap&subset=IDONTEXIST

So it’s easily fixed for us now by removing the line break. But I think this should be fixed in APO: browsers ignore a line break at the end of a link’s href, so anyone else with the same markup wouldn’t notice anything until they turn on APO.
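To illustrate what browsers do with such an href, here is a small sketch of the cleanup step as I understand it from the URL parsing rules: leading and trailing control characters and spaces are stripped, and tabs and newlines are removed everywhere. The function name `clean_href` is made up for this example, and this is not a claim about how APO processes the attribute.

```python
# Characters U+0000 to U+0020: the C0 controls plus the space character.
C0_CONTROL_OR_SPACE = "".join(chr(code) for code in range(0x21))


def clean_href(raw):
    """Approximate the whitespace cleanup a browser applies before parsing a URL."""
    stripped = raw.strip(C0_CONTROL_OR_SPACE)
    # Tabs and newlines are dropped even when they occur inside the URL.
    return stripped.replace("\t", "").replace("\n", "").replace("\r", "")


href_with_break = "https://fonts.googleapis.com/css?family=Roboto\n"
print(clean_href(href_with_break))
```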

Additionally there could be other reasons for such a request to fail, so I think it would be prudent to at least check the response code before substituting. If a 400 response gets substituted, that means there’s probably no check on the response code, so 500 responses would be inlined as well.
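The kind of guard I mean could look roughly like the sketch below. It is not APO's code, which I can't see. The function name `inline_or_keep` and its parameters are hypothetical, and a real implementation would take the status and content type from the actual upstream response.

```python
def inline_or_keep(link_tag, status, content_type, body):
    """Return a <style> tag only for a successful CSS response, else the original tag."""
    # Ignore parameters such as "; charset=utf-8" when comparing the media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if status == 200 and media_type == "text/css":
        return f"<style>{body}</style>"
    # Any failure or unexpected content type: leave the markup untouched.
    return link_tag


link = '<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto">'
ok = inline_or_keep(link, 200, "text/css; charset=utf-8", "body{margin:0}")
bad = inline_or_keep(link, 400, "text/html; charset=utf-8", "<html>Error 400</html>")
```

Falling back to the original link tag means the worst case is a missed optimization rather than a broken page.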

Good to know. @yevgen usually checks on issues like this.

I will look into this, thanks for the report.

We will release plugin changes this week that should prevent such things from happening.

Thanks for the quick response @yevgen! I assume the changes you mention relate to the handling of the href attribute? Can you also shed some light on the subdomains UI limitations? The API endpoint the plugin uses to set which hostnames should use APO allows excluding the apex domain and www subdomain before turning on APO, which is extremely useful for anyone who wants to test APO on a non-production domain first. Is there a reason why neither the plugin nor the Cloudflare dashboard facilitates this?

I’m actually referring to the APO fix for hostnames. I’ve added a fix for the case you had.

Can you provide the HTML markup that caused the broken page on your site? I believe it has been fixed on your side, so it is no longer available for me to test.

Ok, I see the issue with fonts optimization, we will fix it.

That’s great, thanks!

It actually turned out that my previous assumption, that the line break at the end caused this issue, was wrong. We set up a test instance where this was cleaned up and the issue persisted.

See https://www-dev.greenpeace.org/test-saturn/. We’ll try to keep the broken version on that instance for a while in case it could be helpful.

The issue is probably caused by APO not handling combined font requests correctly, at least in certain cases. On another test instance we split the request into 2 links, each with a single font and otherwise the same query parameters, and that instance doesn’t have the issue: https://www-dev.greenpeace.org/test-pandora/
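For reference, the split we did by hand can be sketched like this, assuming the families are separated by `|` in a single `family` parameter as in our URL. The function name `split_combined_fonts_url` is made up for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def split_combined_fonts_url(url):
    """Turn one combined fonts URL into one URL per font family."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    combined = [value for key, value in params if key == "family"]
    others = [(key, value) for key, value in params if key != "family"]
    urls = []
    for families in combined:
        for family in families.split("|"):
            # Keep ":" and "," readable instead of percent-encoding them.
            query = urlencode([("family", family)] + others, safe=":,")
            urls.append(urlunsplit(parts._replace(query=query)))
    return urls


combined_url = (
    "https://fonts.googleapis.com/css"
    "?family=Roboto:300,400,500,700,900|Lora:400,400i,700"
    "&display=swap&subset=latin-ext"
)
for single in split_combined_fonts_url(combined_url):
    print(single)
```

Each resulting URL goes into its own link tag, which is what the test-pandora instance serves.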

This seems a more serious issue, as combining fonts in one request is a generally recommended optimization when multiple fonts are used, so it seems quite likely to affect more sites.

It’s definitely the href with line breaks that causes the issue. Try purging the cache for https://www-dev.greenpeace.org/test-saturn/, it should work fine.

I’m not ruling out that a line break in the href also causes the issue, but I’m sure there’s also a problem with the combined request. I have purged the cache now just to be sure, and the issue persists. I’m sure the earlier failures were not caused by CF having cached the page either: even when logged into WP, which causes APO to bypass the cache (I checked the cf-cache-status header, it was DYNAMIC), the fonts optimization still seems to be applied on each response.

We rolled out the fix for this issue, please purge cache and try again.

It doesn’t resolve the issue for https://www-dev.greenpeace.org/test-saturn/, which already had no line break in it before your fix was rolled out.

I’m quite certain that there is a problem with requesting multiple fonts in a single link element. So I rolled out another test instance to verify this, which you can check at https://www-dev.greenpeace.org/test-pluto/. It has these changes. What you see in the diff is the exact markup our origin is serving. I included some empty script tags with an id as markers around it, so you just have to view the page source and see what is between before-combined-sheet and after-combined-sheet.

You can clearly see that this link tag is still being replaced with a 400 error message, while the url is in fact a valid one: https://fonts.googleapis.com/css?family=Roboto:300,400,500,700,900|Lora:400,400i,700&display=swap&subset=latin-ext

Purging the cache should fix it. We no longer embed error pages into the markup.

In addition to CF cache you need to purge your server cache, as it cached broken pages somehow:
x-cache: HIT.

I purged the cache and it’s still embedding the error page, which you can see if you check the age header https://www-dev.greenpeace.org/test-pluto/?qs. The query string forces a bypass of our cache (any query string will), so you can exclude that as a cause.

That error page cannot possibly get into our cache, which is an nginx in front of our WP server. We just serve the page with a link tag. It’s only after Cloudflare has requested our html that the link tag is substituted with a style tag containing the response. We have nothing in place that would perform that optimization, nor do we somehow update that cache with the result of the APO optimization.

I set up a simpler page which has the following html, to make it extra easy to verify that APO is not handling requests for multiple fonts well. https://www-dev.greenpeace.org/test-pluto/test-apo.html

<!DOCTYPE html>
<html>
  <head>
    <script id="before-combined-sheet"></script>
    <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto:300,400,500,700,900|Lora:400,400i,700&display=swap&subset=latin-ext"/>
    <script id="after-combined-sheet"></script>
  </head>
<body><p>body</p></body>
</html>

Can you verify that APO is still embedding the error page in the style tag there?
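If it helps, the check can also be scripted. The sketch below classifies whatever sits between the two marker script tags in a page source. It assumes the markers arrive unmodified in the response, and the doctype/title test is only a heuristic for spotting an HTML document inside a style tag. The function name and the return labels are made up for this example.

```python
import re

# Markers taken from the test page; they are assumed to be left untouched.
MARKED_REGION = re.compile(
    r'<script id="before-combined-sheet"></script>(.*?)'
    r'<script id="after-combined-sheet"></script>',
    re.DOTALL,
)


def classify_marked_region(html):
    """Describe what sits between the two marker script tags."""
    match = MARKED_REGION.search(html)
    if match is None:
        return "markers-missing"
    region = match.group(1).lower()
    if "<link" in region:
        return "link-preserved"
    if "<style" in region:
        # Heuristic: real CSS should not contain a doctype or a title element.
        if "<!doctype" in region or "<title" in region:
            return "inlined-html-document"
        return "inlined-css"
    return "unknown"


link_tag = '<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto"/>'
origin = (
    '<head><script id="before-combined-sheet"></script>'
    + link_tag
    + '<script id="after-combined-sheet"></script></head>'
)
# Simulated broken response, for illustration only.
broken = origin.replace(link_tag, "<style><!DOCTYPE html><title>Error</title></style>")
print(classify_marked_region(origin))
print(classify_marked_region(broken))
```

Feeding it the source of the test page as fetched through Cloudflare should make it clear whether the link tag survived or was replaced.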

Thanks, will have a look.