Avalanche of soft 404s from Google: is Cloudflare (or my CF settings) the culprit?

#1

We have been running Cloudflare (free plan) for many months on two websites that are both located on the same server, and all seemed to be working fine. Over the last six months or so, however, search results for one of the websites have gotten steadily worse, while the other has remained without problems. We have suspected and investigated many different things, such as possible hacks, not being mobile-friendly, etc. Please also note that I am a layman in this area, and I apologize ahead of time for any stupidities here.

Google Search Console is reporting thousands of soft 404s, pages with errors, and even pages blocked by robots.txt, although we have triple-checked and nothing is blocked.
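
A quick way to double-check the robots.txt part outside of Search Console is to run the file through a parser. Below is a minimal sketch using Python's standard `urllib.robotparser`. The rules in `SAMPLE_ROBOTS_TXT` and the example.com URLs are made-up placeholders, not the real file from this site, and Python's parser may not match rules exactly the way Googlebot does (wildcards, for example), so treat it as a rough sanity check only.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""

print(is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot", "https://www.example.com/"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot", "https://www.example.com/wp-admin/"))
```

To test a real site, paste the text of its live robots.txt into `SAMPLE_ROBOTS_TXT` and check the URLs that Search Console lists as blocked.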

I am aware that there can be many different reasons for soft 404s. Over the last few days, we’ve been looking into whether we may have misconfigured Cloudflare. One item in particular that, in my layman’s eyes, could be the culprit: when I use Google’s Fetch and Render function to see what Google sees, the fetched response for the problem site does not show the whole webpage, only the Cloudflare part, although the rendered image looks just fine.

Here is a Fetch from both sites. The result is identical or close to identical (I haven’t checked carefully) for every page we have run Fetch and Render on. If that’s what Google sees, it would be a good explanation of why Google doesn’t accept our pages and classifies them as soft 404s.

Working site at top:

So, a couple of questions:

  • Has anyone seen this behavior, where Google Fetch displays only part of the page? (It appears fine in the render, and the website is fully visible and browsable, so it loads fine; the problem is only with Google.)

  • Is there a setting in Cloudflare that we have missed or set wrong that keeps the entire code of the end page from passing through to Google?

  • In a side-by-side comparison, I do see a difference in the fetched HTTP response:
    the problematic website has “Cache-Control” and “Expires” headers, which do not appear on the working one. Is that a setting we have missed or set incorrectly?
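
The side-by-side header comparison in the last question can also be done mechanically, which avoids missing a difference by eye. This is a minimal sketch, assuming the two sets of response headers have been copied into dictionaries by hand. The header values in `WORKING` and `PROBLEM` are invented placeholders; only the presence of Cache-Control and Expires on the problem site comes from the comparison described above.

```python
def diff_headers(working: dict, problem: dict) -> dict:
    """Compare two header sets, treating header names as case-insensitive."""
    a = {name.lower(): value.strip() for name, value in working.items()}
    b = {name.lower(): value.strip() for name, value in problem.items()}
    return {
        "only_in_working": sorted(a.keys() - b.keys()),
        "only_in_problem": sorted(b.keys() - a.keys()),
        "different_values": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


# Placeholder values for illustration; replace with the real fetched headers.
WORKING = {
    "Content-Type": "text/html; charset=UTF-8",
    "Server": "cloudflare",
}
PROBLEM = {
    "Content-Type": "text/html; charset=UTF-8",
    "Server": "cloudflare",
    "Cache-Control": "max-age=3600",
    "Expires": "Thu, 01 Jan 2026 00:00:00 GMT",
}

result = diff_headers(WORKING, PROBLEM)
print(result)
```

Headers that show up only on one side are the ones worth tracing back to the origin server or plugin configuration, since both sites reportedly use the same Cloudflare settings.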

I did post a ticket with Cloudflare and have not yet (after 24 hrs) received a response, but I did get an automated email stating that it could be a DNS issue and that we must point the domain to the authoritative nameservers. We have pointed it the same way we do for every website, so I am not sure whether that counts as authoritative, or whether the way our nameservers are set up could cause this.

Also, when comparing the settings between the two websites in the Cloudflare admin area, I cannot see any differences, but I may have missed something somewhere.

Any insights or tips would be greatly appreciated.

Thanks!
PM

#2

UPDATE: I have run a comparison between the two websites and all settings are identical, so the problem doesn’t seem to be in a setting as far as I can see.

If anyone has a clue about this I would really appreciate any input or ideas.

Thank you!
PM

#3

Hi,

What exactly do you mean by “the Cloudflare part”? Do you mean the HTTP headers as shown in the second screenshot? When you say the rendered image looks just fine, do you mean it looks fine in the Fetch as Google tool, or in a direct visit to your website?

You could try fetching the page in question using an online tool such as aw-snap.info/file-viewer/ with Googlebot’s user agent, and see whether the page differs from what you get when the user agent is not Googlebot’s. (If it does, you are probably in malware territory, as some malware now disguises itself by serving different pages to Googlebot.)
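
If you would rather run that comparison yourself than rely on an online tool, here is a rough sketch of the same idea using only Python's standard library. The function names are my own, `BROWSER_UA` is a shortened illustrative string, and the URL in the usage note is a placeholder; the Googlebot string is the user agent Google publishes for its desktop crawler.

```python
import hashlib
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"  # shortened, illustrative only


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Prepare a GET request that announces the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_body(url: str, user_agent: str, timeout: float = 15) -> bytes:
    """Download the page body as seen by the given user agent."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.read()


def fingerprint(body: bytes) -> str:
    """Hash the body so two downloads can be compared cheaply."""
    return hashlib.sha256(body).hexdigest()


def served_differently(url: str) -> bool:
    """True if the Googlebot and browser downloads are not byte-identical."""
    as_bot = fingerprint(fetch_body(url, GOOGLEBOT_UA))
    as_browser = fingerprint(fetch_body(url, BROWSER_UA))
    return as_bot != as_browser
```

Calling `served_differently("https://www.example.com/")` performs two live requests. A mismatch is only a hint, since dynamic pages can differ between any two downloads (timestamps, tokens), and a match does not rule out cloaking that keys on Google's IP addresses rather than the user agent.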

#4

Hi floripare, I really appreciate your response.

Apologies for not being clear. I mean that in Google Fetch and Render, it only displays the code coming from Cloudflare, not the rest of the code.

When I say the rendered image looks just fine, I mean that when Google shows what is rendered to a visitor, it is showing just fine, but there is no image of what Google sees.

I followed your suggestion, rendering it with the aw-snap tool, and get the same code.

We are also redirecting http to https; I am not sure whether the way we have done that could cause this issue.

The aw-snap results start with:
URLs_crawled

URL HTTP Status Size

1: http://www.example.com 301 No content
2: https://www.example.com 200 13522
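
For anyone wanting to reproduce that redirect listing without the online tool, this is a minimal sketch that follows the chain one hop at a time and records each status code. The helper names `http_head` and `trace_redirects` are hypothetical, and it assumes the server answers HEAD requests the same way it answers GET.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def http_head(url: str):
    """Send one HEAD request without following redirects.

    Returns (status, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    try:
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url: str, head=http_head, max_hops: int = 5):
    """Follow redirects hop by hop and return a list of (url, status) pairs."""
    chain = []
    for _ in range(max_hops):
        status, location = head(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return chain
```

A single 301 from http to https followed by a 200, as in the listing above, is the expected shape for a plain http-to-https redirect. A longer chain or a loop would be worth investigating.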

I have also done extensive testing by uploading regular HTML pages and checking how they load through Cloudflare, and Google does fetch and render them correctly. So I don’t think this problem has anything to do with Cloudflare; instead, it appears Google simply does not want to fetch and render our pages, period. I am not sure whether that is because of page speed, an old WordPress template, or the site not being mobile-optimized…

But if you or anybody has seen this before I am really interested in hearing ideas!

#5

I noticed there’s one difference in the HTTP header Vary that may result in many Soft 404s. While the first screenshot has:

Vary: Accept-Encoding, User-Agent

the second:

Vary: Accept-Encoding

The Vary: User-Agent header is important for Google to recognize when a page dynamically serves different content for, say, mobile and desktop, even if the difference is only in presentation (CSS styles, for instance). Please see: https://developers.google.com/search/mobile-sites/mobile-seo/dynamic-serving

Do you have such a setup? Is your second website generating different HTML or CSS for different users? If so, this may explain the soft 404 issue, as Googlebot is not being told (by the Vary HTTP header) that the content may change depending on the User-Agent. If you are not using the Enterprise Plan, which allows for caching by device type, I don’t think adding or not adding the Vary header has anything to do with Cloudflare (except perhaps if the AMP and/or mobile redirection settings create a Vary header; I’m not sure).
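
To make the check concrete, here is a small sketch that parses a Vary header value and reports whether User-Agent is listed. The two sample values are the ones from the screenshots quoted above; the function names are just illustrative.

```python
def vary_fields(vary_value: str) -> set:
    """Split a Vary header value into a set of lowercased field names."""
    return {field.strip().lower() for field in vary_value.split(",") if field.strip()}


def varies_on_user_agent(vary_value: str) -> bool:
    """True if the response declares that it may differ by User-Agent."""
    return "user-agent" in vary_fields(vary_value)


print(varies_on_user_agent("Accept-Encoding, User-Agent"))  # first screenshot -> True
print(varies_on_user_agent("Accept-Encoding"))              # second screenshot -> False
```

A False result is only a problem if the site really does serve different HTML or CSS depending on the user agent.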

#6

Hi Floripare, I really appreciate you digging into this.

Yes, I noted this difference as well. We are not serving different content on the second site, so I guess the code should be that way, although not serving mobile content may be the reason Google is stubbornly refusing.

I will see if I can get a response from Google Webmaster Central.
