What is a rate limiter?
A rate limiter is used to control the rate of traffic sent by a client or a service.
What are the benefits of an API rate limiter?
- Prevent resource starvation caused by Denial of Service (DoS) attacks.
- Reduce cost.
- Prevent servers from being overloaded.
Design questions
- Client-side or server-side rate limiter?
- Implement as middleware or in the server application code? An API gateway is a fully managed service that typically supports rate limiting, SSL termination, authentication, IP whitelisting, serving static content, etc.
- Throttle API requests based on IP, user ID, or something else?
- What is the scale of requests?
- Will the limiter work in a distributed environment?
- Should it be implemented as a separate service or in the application code?
- Do we need to inform users who are throttled?
Algorithms for rate limiting
Option 1: Token bucket algorithm (used by Amazon and Stripe)
- A token bucket is a container with a pre-defined capacity. Tokens are refilled at a preset rate, each incoming request consumes one token, and if there are not enough tokens the request is dropped (see the sketch after this list).
- Pros: Easy to implement, memory efficient, and allows a burst of traffic for a short period.
- Cons: The two parameters (bucket size and refill rate) might be hard to tune.
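A minimal in-memory sketch of a token bucket in Python. The class name and parameters (capacity, refill_rate) are illustrative assumptions, not from the source:

```python
import time
import threading

class TokenBucket:
    """Illustrative token bucket: capacity and refill_rate are the two tunable parameters."""
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity           # max tokens the bucket can hold
        self.refill_rate = refill_rate     # tokens added per second
        self.tokens = float(capacity)      # start with a full bucket
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def allow_request(self) -> bool:
        with self.lock:
            now = time.monotonic()
            # Refill tokens in proportion to elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last_refill) * self.refill_rate)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1           # consume one token for this request
                return True
            return False                   # not enough tokens: drop the request

# Usage: a 10-token bucket refilled at 5 tokens per second.
limiter = TokenBucket(capacity=10, refill_rate=5)
print(limiter.allow_request())  # True while tokens remain
```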
Option 2: Leaking bucket algorithm
- The leaking bucket is usually implemented as a FIFO queue with a predefined capacity. Requests are processed at a fixed rate, and if the queue is full the request is dropped (see the sketch after this list).
- Pros: Memory efficient. The server can process requests at a stable rate.
- Cons: A burst of traffic fills the queue with old requests and throttles new ones. The two parameters (bucket size and process rate) might be hard to tune.
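A rough sketch of a leaking bucket as a bounded FIFO queue drained at a fixed rate; the names (LeakingBucket, leak_rate, handler) are assumptions for illustration:

```python
import time
import threading
from collections import deque

class LeakingBucket:
    """Illustrative leaking bucket: a bounded FIFO queue drained at a fixed rate."""
    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity                  # max queued requests
        self.leak_interval = 1.0 / leak_rate      # seconds between processed requests
        self.queue = deque()
        self.lock = threading.Lock()

    def allow_request(self, request) -> bool:
        with self.lock:
            if len(self.queue) < self.capacity:
                self.queue.append(request)        # enqueue; the worker drains it later
                return True
            return False                          # queue full: drop the request

    def worker(self, handler) -> None:
        # Drain the queue at the fixed leak rate, one request per interval.
        while True:
            with self.lock:
                request = self.queue.popleft() if self.queue else None
            if request is not None:
                handler(request)
            time.sleep(self.leak_interval)
```

A background thread would run `worker`, while the request path only calls `allow_request`.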
Option 3: Fixed window counter algorithm
- The algorithm divides the timeline into fixed-size time windows and assigns a counter to each window (see the sketch after this list).
- Pros: Memory efficient. Easy to understand. Fits use cases where the quota resets at the end of each time unit.
- Cons: A spike in traffic at the window edges could let through more requests than the allowed quota.
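A simple fixed-window counter keyed by client ID; names and limits are assumptions, and a real implementation would also evict counters for expired windows:

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    """Illustrative fixed window counter keyed by (client_id, window index)."""
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.counters = defaultdict(int)   # (client_id, window) -> request count

    def allow_request(self, client_id: str) -> bool:
        window = int(time.time()) // self.window_seconds   # index of the current window
        key = (client_id, window)
        if self.counters[key] < self.max_requests:
            self.counters[key] += 1
            return True
        return False   # quota for this window is exhausted
```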
Option 4: Sliding window log algorithm
- Record the timestamp of each request, remove timestamps that fall outside the sliding window, and throttle based on the number of requests currently in the window (see the sketch after this list).
- Pros: The rate limit is enforced more accurately.
- Cons: Needs to store the timestamp of every request, which uses more memory.
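A minimal sliding window log sketch; it keeps one timestamp per accepted request (a per-client map would be needed in practice, which is assumed away here):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Illustrative sliding window log: one timestamp per accepted request."""
    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.log = deque()   # timestamps of recent requests, oldest first

    def allow_request(self) -> bool:
        now = time.monotonic()
        # Evict timestamps that have fallen outside the sliding window.
        while self.log and now - self.log[0] > self.window_seconds:
            self.log.popleft()
        if len(self.log) < self.max_requests:
            self.log.append(now)
            return True
        return False
```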
Option 5: Sliding window counter algorithm
- Builds on the fixed window counter: estimate the request rate in the rolling window by combining the current window's counter with a weighted share of the previous window's counter (see the sketch after this list).
- Pros: It smooths out spikes in traffic. Memory efficient.
- Cons: The rate limit is not strictly accurate, since it assumes requests in the previous window were evenly distributed.
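A sketch of the weighted-estimate version of the sliding window counter; the class and field names are assumptions:

```python
import time

class SlidingWindowCounter:
    """Illustrative sliding window counter: weights the previous window's count."""
    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.current_window = 0
        self.current_count = 0
        self.previous_count = 0

    def allow_request(self) -> bool:
        now = time.monotonic()
        window = int(now // self.window_seconds)
        if window != self.current_window:
            # Roll over: keep the last full window's count, or 0 if windows were skipped.
            self.previous_count = (self.current_count
                                   if window == self.current_window + 1 else 0)
            self.current_count = 0
            self.current_window = window
        # Fraction of the previous window that still overlaps the rolling window.
        elapsed = (now % self.window_seconds) / self.window_seconds
        estimated = self.previous_count * (1 - elapsed) + self.current_count
        if estimated < self.max_requests:
            self.current_count += 1
            return True
        return False
```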
How to set rate limit rules?
Rate limit rules can be written in YAML files. Administrators set the rules and push them to each rate limiter (a hypothetical rule file is shown below).
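A hypothetical example of what such a YAML rule file might look like; the field names (domain, key, unit, requests_per_unit) are illustrative, not a specific product's schema:

```yaml
# Each rule limits one type of action per time unit.
rules:
  - domain: auth
    descriptors:
      - key: login
        rate_limit:
          unit: minute
          requests_per_unit: 5
  - domain: messaging
    descriptors:
      - key: message_send
        rate_limit:
          unit: day
          requests_per_unit: 500
```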
Rate limiter in a distributed environment
Option 1: Use locks to avoid race conditions
- Cons: Locks are too costly and slow the system down.
Option 2: Use sticky sessions to route a client's traffic to the same rate limiter
- Cons: Not scalable or flexible enough.
Option 3: Centralize the counters of the different rate limiters in a shared data store such as Redis (see the sketch below).
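A minimal sketch of a fixed-window counter backed by a shared Redis instance, so multiple rate limiter nodes see the same counts. It assumes the redis-py client; host, key names, and limits are illustrative:

```python
import time
import redis  # assumes the redis-py client is installed

r = redis.Redis(host="localhost", port=6379)

def allow_request(client_id: str, max_requests: int = 100, window_seconds: int = 60) -> bool:
    window = int(time.time()) // window_seconds
    key = f"ratelimit:{client_id}:{window}"
    # INCR is atomic per command; wrapping INCR + EXPIRE in a Lua script or
    # MULTI/EXEC would make the pair atomic as a whole.
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_seconds)   # let the counter expire after the window ends
    return count <= max_requests
```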