15 second delay from one data center

Hi, I’m looking for guidance on how to troubleshoot poor performance from one data center.

I’ve been using Cloudflare for a few months, but about six days ago I noticed the site was sometimes hanging for 15 seconds before delivering content. In investigating this, it only happens when the cf-ray indicates the SJC data center is being used.

There were two specific files that would occasionally show up with cf-cache-status: ‘STALE’ in these cases of delays. This happened very intermittently, but after a few clicks on my site I could trigger it.

After much research, I couldn’t figure out how to solve this, but I ended up increasing the max-age for these files to something very large. That greatly reduced the number of STALE instances. However, the site still occasionally hangs for 15 seconds when being accessed from SJC. Through chrome developer tools I can see that it now happens to be the main php script that is shown as hanging for 15 seconds.

After researching similar issues, I understand the issue is probably with my server, but the IP’s that appear with the delayed requests do not appear in either my access logs nor my firewall logs. In addition, I have whitelisted the offending IP’s in my firewall with no improvement in performance.

Does anyone have any suggestions on how to further troubleshoot this? To be clear, the delays do not occur every time the routing is through SJC, maybe 5-10% of the time. But it occurs 0% of the time any other data center is used. I have been pleased with Cloudflare and am kind of bummed I might have to drop it due to this one issue.

What levels of traffic are you seeing(how many requests in a time period do you see), and what is the Cache-Control header (and other Cloudflare related cache settings for this URL?)

If you run the following command, can you share the output (redact anything sensitive, and replace the variables as necessary for your site)

time curl --silent -o /dev/null --dump-headers - https://www.example.com/index.php --connect-to ::YOUR-ORIGIN-IP

It might be that your Origin is very slow to respond. If that is combined with certain cache control settings, in a popular location the content is revalidated in the background so the users do not see the slow Origin performance. But in an unpopular location the cache is too stale to be used, and the user has to wait for the cache validation to complete.

The command I gave will show the Origin performance without Cloudflare in the way.

1 Like

I see a few requests per second during peak times, but traffic hasn’t really changed in the last few months. In addition, when I disable Cloudflare, the issue goes away. So I don’t think it’s an issue with the origin being overloaded or anything like that.

I have provided the output below. But again, the unusual thing to me is that…

  • The delay only occurs when the SJC data center is involved. All other locations serve pages normally all the time.
  • The delay is intermittent (maybe every 20-30 requests)
  • When it occurs the page is served in between 15 and 16 seconds (never more or less).
  • When the delay doesn’t occur, the page is served is less than a second.
  • And when Cloudflare is paused, the delay never occurs.
HTTP/2 200 
expires: Thu, 19 Nov 1981 08:52:00 GMT
cache-control: no-store, no-cache, must-revalidate
pragma: no-cache
set-cookie: PHPSESSID=xxx; path=/
vary: Accept-Encoding,User-Agent
content-type: text/html; charset=UTF-8
date: Tue, 22 Feb 2022 00:59:54 GMT
server: Apache

real	0m0.437s
user	0m0.021s
sys	0m0.012s

This topic was automatically closed 15 days after the last reply. New replies are no longer allowed.