Monitors are triggered by a slow request at least once every day

Hey everyone!

We have a site that’s hosted on Vercel. We also have a DataDog monitor that checks the site every five minutes and alerts if the response time is greater than 1000ms. At least once a day, at around the same time, we get a notification that the response time exceeded that limit, and shortly afterwards it resolves.

While it’s not a big issue for the monitoring itself, it does make us wonder whether users are experiencing this delay and, if so, why.

When the monitor is triggered, DataDog reports a TTFB upwards of 3000ms.

We initially went to Vercel, but have since figured out that this is happening on the Cloudflare end. To check this, we monitored both our URL and the Vercel-provided URL (e.g. xyz.vercel.app), which doesn’t go through our Cloudflare account. The .vercel.app domain doesn’t trigger any alerts, but ours does.
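For anyone who wants to reproduce this comparison outside DataDog, here is a minimal Python sketch. It assumes that timing a plain GET until the response headers arrive is a reasonable stand-in for what the monitor reports as TTFB, which may not match DataDog's own measurement exactly. The function names are made up for illustration, and xyz.vercel.app is only the placeholder from above.

```python
import http.client
import time
from urllib.parse import urlsplit

THRESHOLD_MS = 1000  # same limit as the monitor described above


def measure_ttfb_ms(url, timeout=10.0):
    """Time one GET until the status line and headers arrive.

    The timing includes DNS, TCP and TLS setup for a fresh connection.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        start = time.perf_counter()
        conn.request("GET", path, headers={"User-Agent": "ttfb-check"})
        response = conn.getresponse()  # returns once the headers are parsed
        elapsed_ms = (time.perf_counter() - start) * 1000
        response.read()  # drain the body before closing
        return elapsed_ms
    finally:
        conn.close()


def over_threshold(samples_ms, threshold_ms=THRESHOLD_MS):
    """Return only the samples that would have triggered the alert."""
    return [ms for ms in samples_ms if ms > threshold_ms]
```

Calling measure_ttfb_ms once for the Cloudflare-proxied URL and once for the .vercel.app URL at the same moment, then passing each list of samples through over_threshold, would show whether the slow responses really appear on only one of the two paths.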

The site in question is https://www.thedoe.com

Here’s a screenshot of the report from DataDog

Any ideas what would be causing these intermittently slow responses?

Loading your site in an incognito tab, I saw this:

So the dynamic part of your website (which is NOT cached on Cloudflare) is slow. All other assets are very fast (95ms with REVALIDATED), so Cloudflare itself is not the problem.
The header of your initial request shows cf-cache-status: DYNAMIC, which means Cloudflare does not consider the response cacheable and fetches it from the origin server behind it on every request.
To me it does not seem to be related to Cloudflare; otherwise ALL the other requests served by Cloudflare would be slow too.
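As a rough illustration of how to read that header, here is a small Python sketch. The grouping of status values follows my understanding of Cloudflare's documented cache statuses, and the helper names are hypothetical, so treat it as a reading aid rather than an official mapping.

```python
# Statuses where Cloudflare answers from its own cache (assumed from the docs).
# DYNAMIC, MISS, EXPIRED and BYPASS all mean the origin was contacted.
SERVED_FROM_CACHE = {"HIT", "STALE", "REVALIDATED", "UPDATING"}


def cache_status(headers):
    """Look up cf-cache-status case-insensitively; None if the header is absent."""
    for name, value in headers.items():
        if name.lower() == "cf-cache-status":
            return value.strip().upper()
    return None


def served_from_cloudflare_cache(headers):
    """True only when the status says the body came from Cloudflare's cache."""
    return cache_status(headers) in SERVED_FROM_CACHE
```

With the headers from this thread, the homepage (DYNAMIC) comes out as not served from cache, while the static assets (REVALIDATED) do.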

It would also be nice to know exactly which URL you are monitoring, not just the domain.

What could cause the problem:
Caching on your origin server with a TTL of about 24 hours, so that every day at this time the cache is purged and the next request to the site is slow.
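One way to test this hypothesis is to check whether consecutive alerts are spaced roughly 24 hours apart. A minimal Python sketch, assuming the alert times can be exported from the monitor as datetimes; the function name and the 30-minute tolerance are arbitrary choices for illustration:

```python
from datetime import timedelta


def gaps_near_period(alerts, period=timedelta(hours=24),
                     tolerance=timedelta(minutes=30)):
    """For each pair of consecutive alerts, report whether the gap is
    within `tolerance` of `period`."""
    ordered = sorted(alerts)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return [abs(gap - period) <= tolerance for gap in gaps]
```

If nearly every gap comes out as True, a daily expiry somewhere in the chain is a plausible cause. If the gaps are scattered, the 24h TTL idea becomes less likely.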

Hmm interesting, I’m not seeing that same response time, but I guess there could be multiple reasons for that. We’re just monitoring the homepage, so https://www.thedoe.com. Interestingly though, monitoring the same site on Vercel’s domain doesn’t trigger the error.

But I guess what you’ve said here could make sense. If it’s due to Vercel’s cache, then maybe the cache expires every day, and if the monitor for thedoe.com runs before the .vercel.app one, then the .vercel.app one would always be cached, therefore never triggering the monitor.

I’ll go back to Vercel and see if I can find any info about the cache being reset there.

Thanks!

That’s just my assumption, because after some reloads the TTFB goes down to:

The Cloudflare header stays the same throughout: cf-cache-status: DYNAMIC. That means the response was always just proxied and never served from Cloudflare’s cache.

Interesting. I’ll go back to Vercel. The only thing that is still confusing is that when the monitor fails, the headers include x-vercel-cache: HIT which would imply that it’s not due to the cache being invalidated or anything like that.

Even if the request is not coming from the Cloudflare cache, would there be any other way that the request could get held up at the Cloudflare end before it even hits Vercel? Not to point fingers, just want to understand that process in a bit more detail so I can better trace the issue.

For debugging, please post the cf-ray header so people here can look into it. You may have to open a ticket for this, but either way you can post the cf-ray and wait for a response from a Cloudflare employee.

If you contact the people at Vercel, you should definitely also include the x-vercel-id, as this is their equivalent of Cloudflare’s cf-ray.

They will then be able to give you more info about why that particular request was served slowly. A HAR file would be awesome as well.

Thanks for your help! The cf-ray is 6a8062925a770612-IAD. I did send the x-vercel-id along to Vercel, but their initial support didn’t clear anything up. That’s actually what led me here. I don’t think I can download a HAR from the requests DataDog makes, but if I can get my hands on one I’ll post it.
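For collecting these identifiers from a slow response in one go, something like the following Python sketch could help. The helper names are hypothetical, and it assumes the usual cf-ray shape of a ray ID followed by a hyphen and a short code for the Cloudflare data center that handled the request.

```python
def parse_cf_ray(cf_ray):
    """Split a cf-ray value into (ray_id, data_center_code).

    The code is an empty string if the value has no hyphen.
    """
    ray_id, _, colo = cf_ray.partition("-")
    return ray_id, colo


def ticket_details(headers):
    """Pick out the headers both support teams asked for, if present."""
    lowered = {name.lower(): value for name, value in headers.items()}
    wanted = ("cf-ray", "cf-cache-status", "x-vercel-id", "x-vercel-cache")
    return {name: lowered[name] for name in wanted if name in lowered}
```

For the value above, parse_cf_ray("6a8062925a770612-IAD") gives the ray ID 6a8062925a770612 and the code IAD, which tells you which Cloudflare location served the slow request.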

I had a quick talk with some employees. The best way is to open a ticket and post all the necessary information there. You can also link this thread so they can see the conversation :)

Thanks! I’ll open a ticket now.
