Issues with Worker-to-Worker communication

Hey everyone,

We seem to be experiencing issues with worker-to-worker communication (sometimes with service bindings, other times without).

To summarise our current setup:

  • We have a sizeable number of workers (20-30)
  • Each worker has service bindings
  • Some workers (~3) are deployed manually to multiple environments, which are defined in the wrangler.toml file in each of their repositories. All other workers have just one environment.
  • All workers have a .workers.dev domain
  • Some workers (around 50%) also have a separate domain (ending in .cc)

An outline of the functionality:

  • We have a main worker (Worker 0), which interfaces with a number of other workers to perform operations on our clients' data across various systems.
  • When a worker makes a request to another, it sends along its auth in a header. The recipient uses this auth header to decide whether the request is valid, and then to associate the sending worker with a particular set of data.
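
To make the auth flow above concrete, here is a rough, illustrative sketch of a recipient-side check. It is not our actual code: the `TOKEN_<SENDER>` variable naming and the sender names are placeholders, and the plain string comparison is only there for brevity.

```typescript
// Hypothetical shape of the recipient's environment: one token variable per
// known sender, e.g. TOKEN_WORKER_0. These names are assumptions for illustration.
type Env = Record<string, string | undefined>;

type AuthResult =
  | { ok: true; sender: string }
  | { ok: false; status: 401; reason: string };

// Match the presented token against the tokens defined in env and
// return the sender it belongs to, or a 401 result if nothing matches.
function identifySender(token: string | null, env: Env, senders: string[]): AuthResult {
  if (!token) {
    return { ok: false, status: 401, reason: "missing auth header" };
  }
  for (const sender of senders) {
    const expected = env[`TOKEN_${sender}`];
    if (expected === undefined) continue; // variable not defined in this environment
    if (expected === token) return { ok: true, sender };
  }
  // A correct token still ends up here if its variable is absent from env.
  return { ok: false, status: 401, reason: "no matching token found in env" };
}
```

The point of the sketch is the last branch: a request carrying the right header is still rejected with a 401 if the variable it should be compared against is not defined where the recipient is running.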

The issue arises when multiple workers communicate with each other (not all at the same time). I will provide a short example below:

We have 3 workers:
Worker 0:

  • .workers.dev and .cc domain
  • Service binding set for Worker 1 and 2

Worker 1:

  • .workers.dev and .cc domain
  • Service binding set for Worker 2

Worker 2:

  • .workers.dev and .cc domain

They can communicate with each other in the following way:
Worker 0 → Worker 1
Worker 0 → Worker 2
Worker 1 → Worker 2

With each of these worker-to-worker requests, there are some discrepancies.

Worker 0 → 2 uses the service binding and the .cc domain
Worker 1 → 2 has issues using the service binding, and needs to use the .workers.dev domain instead
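
For reference, the two call paths being compared can be sketched roughly as below. This is an illustration under assumptions, not our real code: the binding name `WORKER_2`, the variables `WORKER_2_URL` and `AUTH_TOKEN`, the `x-worker-auth` header, and the placeholder hostname are all made up. As we understand it, a service binding exposes a `fetch()` method and routes to the bound worker regardless of the hostname in the URL.

```typescript
// Minimal shape of a service binding for this sketch: an object with fetch().
interface ServiceBinding {
  fetch(url: string, init?: { headers?: Record<string, string> }): Promise<unknown>;
}

// Hypothetical caller environment; every name here is an assumption.
interface CallerEnv {
  WORKER_2?: ServiceBinding; // service binding to Worker 2
  WORKER_2_URL?: string;     // public URL of Worker 2 (.workers.dev or .cc)
  AUTH_TOKEN?: string;       // token this worker sends in its auth header
}

interface CallPlan {
  via: "binding" | "domain";
  url: string;
  headers: Record<string, string>;
}

// Decide how to reach Worker 2, failing loudly when something is undefined
// instead of sending a request that will come back as a 401.
function planCall(env: CallerEnv, path: string): CallPlan {
  if (!env.AUTH_TOKEN) {
    throw new Error("AUTH_TOKEN is not defined in this environment");
  }
  const headers = { "x-worker-auth": env.AUTH_TOKEN };
  if (env.WORKER_2) {
    // Placeholder hostname; the binding decides where the request goes.
    return { via: "binding", url: `https://worker-2.internal${path}`, headers };
  }
  if (env.WORKER_2_URL) {
    return { via: "domain", url: env.WORKER_2_URL.replace(/\/$/, "") + path, headers };
  }
  throw new Error("neither the WORKER_2 binding nor WORKER_2_URL is defined");
}

// Perform the call and log which path was taken, so the logs show it per request.
async function callWorker2(env: CallerEnv, path: string): Promise<unknown> {
  const plan = planCall(env, path);
  console.log(`Worker 2 call via ${plan.via}: ${plan.url}`);
  const binding = env.WORKER_2;
  if (plan.via === "binding" && binding) {
    return binding.fetch(plan.url, { headers: plan.headers });
  }
  return fetch(plan.url, { headers: plan.headers });
}
```

Logging the chosen path per request would at least show whether a given 401 came through the binding or through a public domain.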

There does not appear to be any pattern to these discrepancies, and when compounded by the number of workers and environments we are working with, they are very difficult to debug.

When testing the workers individually and providing the expected auth headers, we get the correct responses. When testing workers in tandem, we can log the auth headers and verify that the correct ones are being sent. Even so, we eventually reach a point where we receive a 401 response because certain environment variables are not found.

Is it possible that the environment variables set in the wrangler.toml file are not applied to certain environments, or on certain domains or service bindings?
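
One way we could narrow this down is a small diagnostic that reports which expected variables and bindings are actually defined for the host a request arrived on. A minimal sketch, assuming made-up variable names and reporting names only, never values:

```typescript
// Hypothetical diagnostic report: which expected keys are defined for this host.
interface EnvReport {
  host: string;
  missing: string[];
  present: string[];
}

// Compare the keys we expect against what env actually contains.
// Undefined and empty-string values are both treated as missing.
function reportEnv(
  env: Record<string, unknown>,
  expected: string[],
  requestUrl: string
): EnvReport {
  const host = new URL(requestUrl).hostname;
  const missing = expected.filter((key) => env[key] === undefined || env[key] === "");
  const present = expected.filter((key) => !missing.includes(key));
  return { host, missing, present };
}

// Example list of expected keys; these names are assumptions for illustration.
const EXPECTED_KEYS = ["AUTH_TOKEN", "TOKEN_WORKER_0", "WORKER_2"];

const exampleReport = reportEnv(
  { AUTH_TOKEN: "t" },
  EXPECTED_KEYS,
  "https://worker-1.example.workers.dev/debug"
);
console.log(JSON.stringify(exampleReport));
```

Running the same check in each environment, and once per entry point (.workers.dev, .cc, and service binding), should show whether the missing variables follow the environment or the route.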

Thank you for any guidance or support that you can offer. We appreciate it immensely.

Josh.

PS: If you need any additional information, please don’t hesitate to let me know.