Unexpected random 404 errors

What is the name of the domain?

thatch.ai

What is the error number?

404

What is the error message?

404 page not found

What is the issue you’re encountering?

When Cloudflare proxying is enabled, some requests will occasionally fail with a 404 error.

What steps have you taken to resolve the issue?

Since last week (Dec. 10th), we have been observing some requests randomly failing with a 404 error.

These requests are legitimate and should not have failed. Additionally:

  • no trace of these requests can be found in our application logs
  • the response payload (a simple “404 page not found” plain text string) does not match our app’s 404 handler
  • we also observed the same issue on a different service, a 3rd-party app that we’re running as is (Metabase)

We’ve reached out to our hosting provider, Render.com. Render thinks the issue is not on their end, because the response headers for the 404 responses do not include the x-render-routing header that they inject. However, they’ve let us know that they are also using Cloudflare and suggested this could be an O2O issue.

They suggested that we disable Cloudflare proxying, and as far as we can tell that does seem to solve the issue for us. However, this also removes our ability to monitor the issue (since the only place where we can see the 404 errors is in Cloudflare logs) and is not a viable long-term solution for us.

I opened a ticket with Cloudflare last Friday (request # 01312293) but have not heard anything back. I am also unable to check the ticket’s status, possibly due to this incident: Cloudflare Status - Support Ticket Migration - Case Access Issues. I have tried using chat support but it didn’t work. I’ve also reached out on Cloudflare’s Discord server, where I was advised to post here.

Here are the full response headers for a request that returned an unexpected 404:

    'cf-cache-status': 'DYNAMIC',
    'cf-ray': '8f39f65b1e8fba36-SEA',
    connection: 'keep-alive',
    'content-length': '19',
    'content-type': 'text/plain; charset=utf-8',
    date: 'Tue, 17 Dec 2024 21:21:21 GMT',
    nel: '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}',
    'report-to': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report\\/v4?s=MaDL%2F9iuIyj0INKF3liSVTfy1y0ArgOTi%2FfvcMOPzG%2BP3E1sROy2sqQDlzM2SgaKmDDBW3PIi19Ar8TbhHagZYREMKcOLXi8DYhIFdFTu2FSE%2Fe0Y2eDHJfH%2BzuFmqhp7AX0OOImq5Ga%2Fw%3D%3D"}],"group":"cf-nel","max_age":604800}',
    server: 'cloudflare',
    'server-timing': 'cfL4;desc="?proto=TCP&rtt=7553&min_rtt=7550&rtt_var=2837&sent=5&recv=5&lost=0&retrans=0&sent_bytes=2849&recv_bytes=798&delivery_rate=557697&cwnd=128&unsent_bytes=0&cid=02423b005368cedf&ts=85&x=0"',
    vary: 'Accept-Encoding',
    'x-content-type-options': 'nosniff'
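
For anyone comparing their own responses against the one above, here is a minimal sketch of the checks described in this post: a 404 status, no x-render-routing header, and the plain-text “404 page not found” body. The function name is hypothetical, and the criteria are only the symptoms reported in this thread, not a confirmed signature of the underlying problem.

```python
def looks_like_stray_404(status, headers, body):
    """Return True when a response matches the symptoms reported in this thread."""
    # HTTP header names are case-insensitive, so normalise them before comparing.
    names = {name.lower() for name in headers}
    return (
        status == 404
        # Render says it injects this header, so its absence is the key signal.
        and "x-render-routing" not in names
        # The body differs from the app's own 404 page; strip the trailing newline.
        and body.strip() == "404 page not found"
    )
```

A 404 that does carry x-render-routing, or that has a different body, would be classified as an ordinary 404 by this check.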

I’ve managed to reproduce the issue using Cloudflare’s tracing tool, and it looks like nothing in our Cloudflare setup is causing the issue.

I can also provide Cloudflare Ray IDs, or any other information.

What feature, service or problem is this related to?

DNS records

What are the steps to reproduce the issue?

The issue happens randomly when Cloudflare proxying is enabled. As far as we can tell, it can happen on any type of request (normal GET from a browser, AJAX requests, API requests sent from a remote server, etc.).

partners.thatchcloud.com is such a domain. I’ve reproduced the issue with simple GET / requests to that domain.
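
A rough sketch of that kind of reproduction loop, using only the Python standard library, is below. The function names, the User-Agent string, and the one-second delay are assumptions made for illustration; this is not the exact script used. It repeats a GET request and collects the cf-ray value of every 404 so the Ray IDs can be handed to support.

```python
import time
import urllib.error
import urllib.request


def fetch(url):
    """Send one GET and return (status, headers, body) without raising on 4xx/5xx."""
    request = urllib.request.Request(url, headers={"User-Agent": "stray-404-probe"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            body = response.read().decode("utf-8", "replace")
            return response.status, dict(response.headers), body
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx, but the error object still carries the response.
        body = error.read().decode("utf-8", "replace")
        return error.code, dict(error.headers), body


def probe(url, attempts, delay=1.0, fetch=fetch):
    """Repeat GET requests and return the cf-ray value of every 404 response."""
    ray_ids = []
    for _ in range(attempts):
        status, headers, _body = fetch(url)
        if status == 404:
            lowered = {name.lower(): value for name, value in headers.items()}
            ray_ids.append(lowered.get("cf-ray", "unknown"))
        # Pause between requests to stay under any rate limiter on the path.
        time.sleep(delay)
    return ray_ids
```

For example, `probe("https://partners.thatchcloud.com/", attempts=100)` would send 100 requests and return the Ray IDs of any that came back as 404. The `fetch` parameter exists only so the loop can be exercised without network access.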

Is the domain on a free or paid plan?

The issue happens on 2 different domains, both on paid plans (one Business, one Pro).

A Cloudflare technical support engineer just replied to my ticket, so hopefully we’ll be able to figure out what’s happening together. I’ll post an update here when we do.


Olivier1, have they found any fix for this issue?
I’m experiencing the same issue with two websites also hosted on Render and proxied with Cloudflare. The 404s are random and seem to mostly affect our clients on the west coast.
We’ve also submitted a ticket with no response yet.

Please keep me posted if you successfully fix the issue!
Thanks,

Hi Jacob,

Unfortunately no, I do not have an update. I received exactly one reply to my ticket, 5 days after opening it. It has now been 10 days since opening the ticket without any other communication from Cloudflare. The issue is still happening and Cloudflare seems completely uninterested in investigating it, much less fixing it. The only “fix” is to disable proxying for the affected hosts (i.e. not use Cloudflare).

I’ll be sure to post an update if we ever get to the bottom of this, but I wouldn’t hold my breath.

That’s unfortunate, but is reflective of our experience with Cloudflare support so far.
I’ll let you know on my end if we ever make any progress.
I doubt it’s a coincidence we both have the same issues and both use Cloudflare proxying with Render.

Can you check the instant logs on the domain with the business plan and produce the error live?

Probably not a coincidence. But the 404 looks like it is coming from Render, so they would need to get involved with Cloudflare support to fix this. An unbranded 404 is very likely not coming from Cloudflare.

It might be that Render is serving the wrong response because of some O2O issues, but in that case, they would need to work with Cloudflare to find the cause.

How often does it happen? Is it ok if I try to reproduce with a few hundred requests?

Can you check the instant logs on the domain with the business plan and produce the error live?

We did manage to reproduce the issue and observe it in instant logs a couple of times last week. I just tried again but haven’t been able to repro today.

How often does it happen?

It’s very inconsistent. Sometimes we can reproduce the issue in minutes, sometimes it doesn’t seem to happen at all. It might also be dependent on the location. Last week I (in Seattle) was having trouble reproducing the issue but one of my colleagues (in LA) would easily be able to.

Coincidentally, I am in LA right now but can’t reproduce the issue… As I said, it’s very inconsistent, which makes this issue all the more annoying.
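
One way to test the location hunch is to group failing requests by Cloudflare data center. The cf-ray value ends with a data-center code after the last hyphen (for example, the `-SEA` suffix in the headers posted earlier), so a small tally like the sketch below can show whether the 404s cluster in one location. The function names are hypothetical, and the input is assumed to be a list of cf-ray values collected from failed requests.

```python
from collections import Counter


def colo_of(ray_id):
    """Return the data-center code that follows the last '-' in a cf-ray value."""
    _, separator, colo = ray_id.rpartition("-")
    # Without a hyphen there is no code to extract.
    return colo if separator else "unknown"


def tally_by_colo(ray_ids):
    """Count how many cf-ray values were served by each data center."""
    return Counter(colo_of(ray_id) for ray_id in ray_ids)
```

A tally dominated by a single code would support the idea that the problem is tied to specific Cloudflare locations, while an even spread would point elsewhere.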

Is it ok if I try to reproduce with a few hundred requests?

Sure. Render’s Cloudflare might rate limit you, and the application server also has its own rate limiter, but if you run into rate limits you should see 429s rather than 404s.

Hey olivier1, I work on the same team as Jacob above, and have been trying to sort this out for a few weeks now. No matter what information we present, Cloudflare is consistent in their (limited) responses, which can basically be summarized as “we don’t generate 404s, the issue must be with the host.” Even when we shared this thread with them in the support ticket, they didn’t seem interested in the specifics or in any of the patterns (locations, infrequency, etc.) that suggest a CDN-based issue. Given their consistency on the matter, I am re-engaging Render support and suggesting that some exchange of information between Cloudflare and Render is likely necessary to make any progress, even if we are in the middle relaying it.

Mainly posting this to keep this thread alive as it will auto-close in another week without a reply. Will try to reply here if we make any meaningful progress and hope you do the same. Thanks.

Hi all,

The issue is still happening, but Cloudflare finally replied to us earlier this week and started investigating.

My current theory is that some requests are randomly hitting the wrong origin which would explain the 404 errors if that origin doesn’t host the content being requested.

Since Cloudflare doesn’t throw 404 errors ourselves, I think the issue is likely something to do with the routing of the request, rather than the request itself.

and

Our team has enabled tracing on the zone to see if we can capture more details about the worker these requests are running through. This should help us determine what might be happening from our perspective.

I am cautiously optimistic that this issue will eventually get resolved 🤞
