Worker fails if incorrectly encoded % is in the URL

Hi,

We have some google search console errors turning up for URLs that have an incorrectly encoded % in the URL. As a worker handles each request for our site, we tried to handle these errors but it is not possible if the URL contains an incorrectly encoded % in the url.

The error can be reproduced by creating the following worker:

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  return new Response('Website is working');	
}

And trying the following URLs in the preview:
https://www.cloudflare.com/test - works
https://www.cloudflare.com/test%20 - works
https://www.cloudflare.com/test%20test - works
https://www.cloudflare.com/test% - Throws Error 1101
https://www.cloudflare.com/test%D - Throws Error 1101

I also tried catching the error and logging it, but the editor fails to log the error so I can’t debug it more:

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  try {
    return new Response('Website is working (try)');	
  } catch (e) {
    console.log(e);
    return new Response('Whoops! the website is down. :( 500 Internal Server Error. '+e.name, { status: 500, statusText: 'Internal Server Error' });
  }
}

The above code shows the same Error1101

Is someone able to let me know if there is a work around for this, or what we should do next?

2 Likes

Hi @michael42,

This turned out to be a bug in the preview itself, not in Workers – if you were to deploy the script, it would have worked fine in production. I have fixed the bug in the preview, so now it should work. Let me know if you still have trouble.

4 Likes

Thanks @KentonVarda, preview works now :slight_smile:

1 Like

Hi @KentonVarda,
I’m having a similar problem to what @michael42 reported. It seems that the Worker preview throws a 1101 error if the URL has multiple percent signs in the search portion of the URL. This is using an unmodified version of default project (e.g., $ wrangler generate my-worker).

Examples of URLs that work vs. ones that throw 1101 error:

  • https://www.cloudflare.com/test?iframe=true&width=90%&height=90% - Throws error
  • https://www.cloudflare.com/test?iframe=true&width=90%&height=90 - Works
  • https://www.cloudflare.com/test?%% - Throws error
  • https://www.cloudflare.com/test?% - Works

Some web crawlers are reporting 500-level errors because Cloudflare seems to throw 1101 errors when they crawl one of our URLs and include multiple percent signs in the query params. I suppose these URLs may be mal-formed. However, should Cloudflare be throwing a 1101 error? I wonder if a 400 response would be more appropriate.

Thanks much for your help!
-Chris

2 Likes

Hi @chill,

(EDIT: This is wrong, I misunderstood chill’s report; see next comment.)

What you’re seeing is slightly different from what @michael42 was seeing.

This problem you are seeing does not happen to all workers; it depends on the code inside the worker. Specifically, it happens when a worker tries to parse a URL using new URL(request.url). It turns out our URL parser doesn’t implement the standard correctly: the parser is not supposed to try to decode %-escapes, but ours does, and when they are invalid, it throws an exception. If the worker doesn’t catch that exception, it turns into a 1101 error.

We intend to fix this, but it’s complicated to fix because workers in the wild today might be depending on the existing behavior, and we absolutely cannot push a change that breaks people’s already-deployed workers. So we need to introduce some notion of versioning in our API, which is a big project.

In the meantime, if you want to work around this problem in your worker, you can wrap your use of new URL in a try/catch, like:

let url;
try {
  url = new URL(request.url);
} catch (err) {
  return new Response(err.message,
      {status: 400, statusText: "Bad Request"});
}
3 Likes

Thanks, @KentonVarda. However, it seems the 1101 error is being thrown before my Worker’s code is executed. For example, here is my worker’s code:

addEventListener('fetch', event => {
  console.log(event);
  event.respondWith(handleRequest(event.request))
})
/**
 * Respond with hello worker text
 * @param {Request} request
 */
async function handleRequest(request) {
  return new Response('Hello worker, Chris', {
    headers: { 'content-type': 'text/plain' },
  })
}

If you run this via $ wrangler preview and use URL https://www.cloudflare.com/test?%%, this code never gets hit, it seems. I assume this because the console.log() statement never logs to the brower’s console. So it seems that Workers is choking even before my code is run.

Are you able to reproduce this issue on your end?

1 Like

Ugh, I’m sorry, you’re right. It does indeed appear that the preview service itself is rejecting the URL before it even gets to your worker. I was sure I had fixed this, but it turns out I made a classic mistake of expecting url.replace("%", "%25") to replace all instances of %, when it actually only replaces the first instance… I needed url.replace(/%/g, "%25"). :facepalm:

I’ve now deployed a further fix… it should work now.

Sorry for misreading your comment!

5 Likes

Thanks, @KentonVarda! That fixed it in my preview (I’ll need to add try/catch to a few URL constructors in my Workers code to test it on a live domain).

Thanks also for the quick turnaround!

1 Like

@KentonVarda I am not sure if I am beating a dead horse, but we have a case when third-party service is dispatching malformed (domain/something/%ORLY%) and we are trying to use workers to intercept + fix those, but it never gets to our worker code ( 400 Bad Request by Cloudflare is returned ) or Origin.

Is there a way to either get it to the workers, OR at least to the origin servers?

1 Like

That’s interesting – it seems this is happening on all Cloudflare sites whether or not they use Workers. Is this a new bug? I will poke around…

4 Likes

I’ve filed an internal issue with the appropriate team. This seems to be a bug in the core Cloudflare stack, independent of Workers.

5 Likes

Thank you very much!

1 Like

Just to confirm @KentonVarda, you will let us know here once it’s resolved/how to treat these requests, correct?

If there’s a resolution on the internal ticket I will update here.

That said, it may be worth filing a support request as well. The support team has ways of applying pressure on people to fix things that I do not.

6 Likes

So, the specific condition that seems to produce the 400 is: % in the path is followed by a non-hex-digit character in one of the next two positions. Truncated sequences (e.g. a trailing % or % followed by one digit before end-of-string) are accepted just fine. Invalid sequences in the query (rather than the path) are also accepted just fine.

It appears that this is stock nginx behavior. The exact name behavior is present on an nginx server I installed from Debian on a server that doesn’t use Cloudflare.

As far as we can tell, this has always been the case. It’s hardcoded in nginx, not something that is intended to be customized. While in theory we could go and edit the nginx parsing code, it’s not something we’re excited about doing.

Does anyone have any evidence that this specific behavior is new? If not, we may just be stuck with it.

1 Like

Yes, we are quite aware that this URL encoding is incorrect, I am just wondering what we can do to mitigate this, as neither page rules or workers are able to get the request.

Would it be possible to automatically apply the encoding fix (encode the %) like it seems https://cloudflare.com/%something% is being “corrected” (encoded to https://www.cloudflare.com/%25something%25/)

The only way to fix this is to change the nginx parser source code.

FWIW, nginx is the most popular web server on the internet, with something like 1/3 market share. So, at least a third of all the internet refuses requests like these, with no ability to customize the logic at all.

A lot of Cloudflare is built on nginx, hence why the problem affects us. Technically we could modify the parser code, but it’s a pretty risky thing to do. There might be a good reason why it’s written that way.

The “correction” you’re seeing here looks like it’s being applied by Discourse, the forum software. If you manually type the URL https://cloudflare.com/%something% into your browser, it will get a 400.

I beg to differ :smile:

To be fair if you combine “nginx” and “cloudflare server” in that report, the total is over 50%. :slight_smile:

But yes there’s a lot of ways to measure this. I was looking at: https://news.netcraft.com/archives/category/web-server-survey/

Fair point, even if I didn’t really count the proxies as unique instances, but that could be a subject of discussion :slight_smile:

Fair point too. Though, bad report, down! Bad report :smile: