The implementation of URL
in workers is inconsistent with browser implementations.
Top is CF Workers, bottom is Chrome 75.
Notice that the query string is decoded but not encoded.
The implementation of URL
in workers is inconsistent with browser implementations.
Top is CF Workers, bottom is Chrome 75.
Notice that the query string is decoded but not encoded.
Additionally, URL
is missing the forEach
method, and fails to pick up query parameters
> new URL('https://google.com/?term=something%20something%20something').searchParams.forEach(console.log)
> VM170:1 Uncaught TypeError: (intermediate value).searchParams.forEach is not a function
at <anonymous>:1:84
> Object.keys(new URL('https://google.com/?term=something%20something%20something').searchParams)
> []
> Object.keys(new URL('https://google.com/?term=something+something+something').searchParams)
> []
Right - my bad about the Object.keys
. You can achieve what I’m looking for via [ ...searchParams].forEach((key, value) => ...)
The concern is that it’s inconsistent with everything else.
This particular issue comes up when using WP sites which will 301 you from URLs with +
s in the query to the encoded URL.
https://wordpress.org/?something=search+term
will auto redirect to https://wordpress.org/?something=search%20term
Consider this flow:
Hi @tyler.sebastian, you’re absolutely correct: our URL class implementation is buggy, and this is a longstanding problem. It’s unfortunately difficult to correct, because fixing it risks breaking working scripts that rely on the buggy behaviors. We need to first come up with a migration path.
In the interim, you could polyfill a conformant pure JS URL class implementation, though I realize this is not an ideal solution.
Harris
Thanks for the response, Harris. We’ve polyfilled with whatwg-url
.
The buggy behavior was breaking us when we deployed workers last year, and whatwg-url
has worked in its place (thanks @tyler.sebastian). However, I’ve noticed that our bundle is quite large due to this and I have a hypothesis that large bundle size (partially due to whatwg-url
's large size along with its tr46
dependency and necessary Buffer polyfill) is responsible for really bad cold start times we’re seeing that seem to be over 100 ms longer than a hello world worker.
@harris do you have any plans to address this yet? Even if there was something like URLFixed
in scope, that’d be helpful from a bundle optimization perspective.
I wanted to check before I spent some time experimenting with lighter alternatives to whatwg-url
. whatwg-url-without-unicode
is already a good start, but for my use case (basic URL manipulation), I think I can do a lot better.
Hi @jon22, I’m sorry to hear that. The bundle size does sound like the likely culprit for the cold start times.
As for when we’ll be able to fix the situation, I’m afraid I don’t have any update. We have a plan, but no firm timeline for it.
Harris
@harris I appreciate the info. I’ve started trying to replace whatwg-url
with a regex, and results so far are promising. I’ve cut 50% or more off cold start times.
Are there any general guidelines on what leads to longer cold starts and would these differ at all for Cloudflare vs best practices for a Node.js Functions-as-a-Service platform? Is the size of non-gzipped JS a good rough guideline? If we are bundling large strings (e.g. a few static pages), should we expect that to contribute similarly too, or is more the CPU from executing more code that slows things down?
Also curious if there are any better ways to track performance other than trial and error. I’ve just been removing code, deploying, and trying requests once every few seconds hoping to get a cold start, then measuring TTFB after connection setup. I’ve also tried adding a counter in my code as a global to indicate if the context was reused. Since Date.now()
is a no go, trial and error seems to be my best bet so far.
Hi @jon22,
I would expect the dominant factors to be complexity of the script (i.e., how long it takes to compile) and complexity of the script’s top-level execution. For example:
Technically the absolute size of the script impacts the time to fetch and decompress the script, but I wouldn’t expect this to be significant compared to the other factors.
I’d expect the above to be generally applicable to JavaScript that targets Node.js-based platforms, with the caveat that I don’t have much experience with Node.js FaaS services. Note that one could potentially use Node.js locally to guide optimization, but it’s a little tricky. Getting web platform-based JavaScript code (e.g., a Cloudflare Worker script) to run in Node.js usually involves shims/polyfills, which will impact compilation time in Node.js. Conversely, JavaScript code written for Node.js typically requires shims/polyfills to run on the web platform, which will impact compilation time in Cloudflare Workers.
Not at present, though I think there’s a clear need. If I were to measure cold starts externally right now, I’d probably do it in a manual, but scriptable way: deploy a new unique worker, wait 30 seconds, send one request to it and measure the latency (perhaps verifying via your counter idea that it was a cold start), deploy a new unique worker, wait 30 seconds, and so forth until I had a decent sample size. Hopefully I’ll have a better answer in the future.
Harris
Thanks for the pointers. It’s good to know that big strings shouldn’t matter too much as we conditionally embed some stuff into pages at the edge. I can’t wait to see what the future holds! Cloudflare Workers has already been such an amazing improvement. I never thought my first major TypeScript project would be for our CDN .
Any chance the fixed behavior could be gated behind the new “module-style” workers?
Hi @loren.brichter, that’s a good idea, and we thought about doing so, but our module syntax work had to take priority over fixing the URL class, so module syntax workers still see the current URL class implementation. I expect we’ll need to introduce a feature flag system in order to make such breaking changes.
Can you provide a working example on how to shim the URL class? I followed the steps outlined in this documentation:
All I got was “TypeError: u is not a constructor”.
I am using a TypeScript based worker.
Install the package:
npm i whatwg-url --save
Put this into the worker script at the top of the file:
import whatwgURL from "whatwg-url";
globalThis.URL = whatwgURL.URL;
Thanks I can confirm that worked like a charm.
there is one more example in the official docs, I didn’t test it though.
Why hasn’t the buggy URL been fixed by now? Or at the very least display a warning on the console if the behavior does not match correct implementations? It really doesn’t give me confidence in cloudflare if something so basic is so buggy.
+đź’Ż Should be able to opt in at least