Rate Limiter Design
A rate limiter caps requests per time window (or concurrency) to protect a system from overload. Common algorithms include fixed window, sliding window, token bucket, and leaky bucket. This article explains each, with implementation notes and a comparison table.
Overview
- Fixed window: Count requests in fixed intervals (e.g. 1 second). Simple, but the window boundary can allow 2× traffic (100 requests at the end of one window, 100 more at the start of the next).
- Sliding window: Count requests in a window that slides with the current time. Smoother, but slightly more complex (store multiple time slices or a sliding structure).
- Token bucket: A bucket holds tokens and is refilled at a fixed rate; each request consumes a token; reject when the bucket is empty. Allows bursts up to the bucket capacity.
- Leaky bucket: Requests enter a bucket and drain at a fixed rate; if input exceeds the drain rate, queue or reject the excess. Smooth output; no bursts.
- Choice: Simple counting → fixed/sliding window. Need bursts → token bucket. Need strictly smooth output → leaky bucket.
Example
Example 1: Algorithm comparison
| Algorithm | Pros | Cons | Best for |
|---|---|---|---|
| Fixed window | Simple | 2× at boundary | Loose limits |
| Sliding window | Smooth | More complex | Precise limiting |
| Token bucket | Allows burst | Token bookkeeping | API, QPS limit |
| Leaky bucket | Smooth output | No burst | Message queue, flow control |
Example 2: Redis implementation (token bucket)
- Key: rate:user:123. Value: remaining tokens and last update time. Lua script: atomically refill, deduct a token if enough remain, and return allow/deny.
Example 3: Layered limiting
- Global (total QPS) → per user → per API. Multiple layers protect against overload.
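The layering above can be sketched as a chain of independent limiters that must all admit a request (the stub limiter and key names are illustrative):

```python
class CountLimiter:
    """Toy limiter: allow the first `limit` calls (stand-in for any real algorithm)."""

    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0

    def allow(self) -> bool:
        if self.count < self.limit:
            self.count += 1
            return True
        return False

def allow_request(user_id: str, api: str, limiters: dict) -> bool:
    """Check global -> per-user -> per-API; deny at the first exhausted layer."""
    keys = ["global", f"user:{user_id}", f"api:{api}"]
    # all() short-circuits, so layers after a denial are not consumed.
    return all(limiters[k].allow() for k in keys if k in limiters)
```

One design subtlety: when a later layer rejects, earlier layers in this sketch have already consumed capacity; a real implementation must decide whether to refund those tokens.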
Example 4: Response on limit
HTTP/1.1 429 Too Many Requests
Retry-After: 60
- Return 429; Retry-After tells client when to retry.
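On the client side, honoring Retry-After might look like this sketch (it assumes the header carries seconds, as in the example above; the header can also be an HTTP date, which is not handled here):

```python
def wait_if_limited(status: int, headers: dict) -> float:
    """Return how long to sleep before retrying; 0.0 if the request was not limited."""
    if status != 429:
        return 0.0
    # Retry-After may be absent; fall back to a modest default backoff.
    return float(headers.get("Retry-After", 1))
```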
Core Mechanism / Behavior
- Fixed window: Reset the counter each window; simple, but prone to boundary spikes.
- Sliding window: Moving average or timestamp list; more accurate.
- Token bucket: Refill rate and capacity; consume on request; reject when empty.
- Leaky bucket: Fixed drain rate; queue or drop excess.
Key Rules
- Atomicity: Use Lua or Redis transactions for distributed rate limiting to avoid races.
- Response: Return 429 or queue; include Retry-After when appropriate.
- Monitor: Track limit hit counts and which users/APIs are being limited; use this for tuning and debugging.
What's Next
See Redis Rate Limiting and the High Concurrency Toolkit. See API Gateway for centralized rate limiting.