Discussion basis for an affordable and scalable DDOS defense for Cloudflare Workers

Goal:

  • We would like to limit the request rate to our worker (to protect us from unexpected costs)
  • We do not want to use the “Rate Limiting” feature, since it increases the cost of “good” requests from $0.50 to $5.00 per million requests (a tenfold increase).
  • The solution should work with the “Pro” ($20) plan.

Basic idea:

  • The worker keeps 3 request counters per source IP, one per time period (the most recent 10-second, 1-minute, and 15-minute periods).
  • Every time period has its own threshold value. If that threshold is exceeded, the source IP is blocked by writing it into an “IP Access Rule” (via the API).
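
To make the counter idea concrete, here is a minimal sketch using fixed windows. The window lengths are the ones from the list above; the threshold numbers, the `IpRateTracker` class and all other names are made up for illustration and are not part of any Cloudflare API.

```typescript
// Window lengths are from the post; the thresholds are placeholder values.
interface WindowSpec { seconds: number; threshold: number; }

const DEFAULT_WINDOWS: WindowSpec[] = [
  { seconds: 10, threshold: 50 },
  { seconds: 60, threshold: 200 },
  { seconds: 900, threshold: 1500 },
];

interface Counter { spec: WindowSpec; windowStart: number; count: number; }

// Hypothetical helper: keeps one fixed-window counter per period and source IP.
class IpRateTracker {
  private counters = new Map<string, Counter[]>();

  constructor(private windows: WindowSpec[] = DEFAULT_WINDOWS) {}

  // Records one request; returns true if any period's threshold is now exceeded.
  record(ip: string, nowMs: number): boolean {
    const existing = this.counters.get(ip);
    const perIp = existing ?? this.windows.map(
      (spec) => ({ spec, windowStart: 0, count: 0 }),
    );
    if (!existing) this.counters.set(ip, perIp);

    let exceeded = false;
    for (const c of perIp) {
      const lengthMs = c.spec.seconds * 1000;
      const start = Math.floor(nowMs / lengthMs) * lengthMs;
      if (c.windowStart !== start) {
        // A new period has begun, so the old count is discarded.
        c.windowStart = start;
        c.count = 0;
      }
      c.count += 1;
      if (c.count > c.spec.threshold) exceeded = true;
    }
    return exceeded;
  }
}
```

When `record()` returns true, the caller would go on to block the address. Fixed windows are the simplest choice here; a sliding window would be more accurate at period boundaries but needs more state per IP.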

Where to store the counters?

  • The cheapest option seems to be a Durable Object, but without using its Storage API: the counters are simply kept in the Durable Object itself as a kind of in-memory data structure.
  • This increases the cost per request only insignificantly, from $0.50 to approx. $0.70 per million requests ($0.15 per million for the Durable Object requests, plus some amount for compute).

There are 2 problems:

  • The durable object and thereby its counters can potentially be evicted from memory after 30 seconds at the earliest. This means that counters for time periods longer than 30 seconds cannot be kept reliably.
  • Since “IP Access Rules” have a limit of 50,000 rules per account, an attack from more than 50,000 different IP addresses cannot be handled.

Solution:

  • Use a single Durable Object for a whole /16 CIDR block of IP addresses (potentially storing up to 65,536 × 3 counters there).
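
A rough sketch of how a source address could be mapped to its /16 block (and to the /24 inside it). The function names `parseIpv4` and `cidrKeys` are hypothetical, and IPv6 is simply rejected because the post only talks about IPv4 blocks.

```typescript
// Parses a dotted-quad IPv4 address into its four octets, or returns null.
function parseIpv4(ip: string): number[] | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  // NaN fails both comparisons, so malformed parts are rejected here too.
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// Returns the enclosing /16 and /24 blocks in CIDR notation.
function cidrKeys(ip: string): { slash16: string; slash24: string } | null {
  const o = parseIpv4(ip);
  if (!o) return null;
  return {
    slash16: `${o[0]}.${o[1]}.0.0/16`,
    slash24: `${o[0]}.${o[1]}.${o[2]}.0/24`,
  };
}
```

In a Worker, the address would typically come from the `CF-Connecting-IP` request header, and the `slash16` string could be passed to `idFromName()` on the Durable Object namespace binding, so that every address in the block lands on the same object.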

Advantages of this solution:

  • Since the rate of access to a single object is increased, the object will remain in memory longer, and therefore counters for longer periods can also be kept.
  • Since an object has visibility of an entire /16 network, and thus also of all the smaller CIDR blocks it contains, additional statistics can be kept for these blocks. For example, 3 counters for each of the 256 contained /24 networks.
  • This makes it possible to block a whole /24 network with a single IP Access Rule, so the limit of 50,000 entries becomes less of a problem.
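
A sketch of how the escalation from single addresses to a /24 could look, plus the request that would create the rule. The endpoint path and body fields (`mode`, `configuration.target`, `configuration.value`, `notes`) are written from memory of the IP Access Rules API and should be checked against the current API reference before use. `chooseTarget`, `buildBlockRequest` and the `escalateAt` parameter are made-up names.

```typescript
type BlockTarget =
  | { kind: "ip"; value: string }
  | { kind: "ip_range"; value: string };

// Blocks the single IP until enough offenders were seen in its /24,
// then switches to one rule for the whole /24.
// Assumes `ip` is already a valid dotted-quad IPv4 address.
function chooseTarget(ip: string, offendersIn24: number, escalateAt: number): BlockTarget {
  if (offendersIn24 >= escalateAt) {
    const prefix = ip.split(".").slice(0, 3).join(".");
    return { kind: "ip_range", value: `${prefix}.0/24` };
  }
  return { kind: "ip", value: ip };
}

interface AccessRuleRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) the request that would create the rule.
// ASSUMPTION: endpoint and field names as remembered, not verified here.
function buildBlockRequest(
  zoneId: string,
  apiToken: string,
  target: BlockTarget,
  note: string,
): AccessRuleRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/zones/${zoneId}/firewall/access_rules/rules`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      mode: "block",
      configuration: { target: target.kind, value: target.value },
      notes: note,
    }),
  };
}
```

The result can be handed to `fetch(req.url, req)` from the worker. Keeping the decision and the request building separate from the actual network call makes both easy to test.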

Possible disadvantages (and a possible solution):

  • All requests to the worker from up to 65k (/16 CIDR) different source IP addresses are processed only after they have been checked by a single instance of that Durable Object, which can become a bottleneck.
  • This could be avoided if the worker serves the request first and does all the DDOS work afterwards. By using the waitUntil() method, the lifetime of the worker can be extended until those checks (and possibly blocking the IP for future requests) are done.
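
The serve-first approach could be sketched like this. Only `waitUntil()` is taken from the Workers runtime; the `Ctx` interface is a cut-down stand-in for the real execution context, and `serveThenCheck`, `serve` and `check` are hypothetical names.

```typescript
// Minimal shape of the execution context; only waitUntil() is needed here.
interface Ctx {
  waitUntil(promise: Promise<unknown>): void;
}

// Produces the response first, then hands the rate check to waitUntil()
// so it can finish after the response has been returned.
async function serveThenCheck<Res>(
  ctx: Ctx,
  serve: () => Promise<Res>,
  check: () => Promise<void>,
): Promise<Res> {
  const response = await serve();
  ctx.waitUntil(
    // A failing check must never affect the response that was already built.
    check().catch((err) => console.error("rate check failed", err)),
  );
  return response;
}
```

Inside a real `fetch(request, env, ctx)` handler, `serve` would produce the normal response and `check` would call the Durable Object and, if needed, the blocking API. The trade-off is that the request that crosses a threshold is still served; only later requests are blocked.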

The per-POP Workers Cache API is “free”. Just create a cache URL for each IP address. 1 abusive client/user agent CAN NOT connect to more than 1 POP per min/hour/ever.

What is your actual threat? Why do you think you will be targeted? What is your business model? Tried paying a “stressor” company on your not-public but orange clouded dev URL yet? Considered WAF rule blocking all countries/all ASNs that you DO NOT ACCEPT CREDIT CARDS FROM/SHIP TO?

CF customer service supposedly offers refunds for attack traffic if your dashboard shows a hockey stick of requests that made it through CF’s WAF.