Rate Limiter Design
A rate limiter caps requests per time window (or concurrency) to protect a system from overload. Common algorithms include fixed window, sliding window, token bucket, and leaky bucket. This article explains each, with implementation notes and a comparison table.
Overview
- Fixed window: Count requests in fixed intervals (e.g. 1 second). Simple, but the window boundary can allow 2× traffic (100 requests at the end of one window, 100 more at the start of the next).
- Sliding window: Count requests in a window that slides with the current time. Smoother, but slightly more complex (store multiple time slices or a sliding structure).
- Token bucket: A bucket holds tokens and is refilled at a fixed rate; each request consumes a token; reject when the bucket is empty. Allows bursts up to the bucket capacity.
- Leaky bucket: Requests enter a bucket and drain at a fixed rate; if input exceeds the drain rate, queue or reject the excess. Smooth output; no bursts.
- Choice: Simple counting → fixed/sliding window. Need bursts → token bucket. Need strictly smooth output → leaky bucket.
Example
Example 1: Algorithm comparison
| Algorithm | Pros | Cons | Best for |
|---|---|---|---|
| Fixed window | Simple | 2× at boundary | Loose limits |
| Sliding window | Smooth | More complex | Precise limiting |
| Token bucket | Allows burst | Token bookkeeping | API, QPS limit |
| Leaky bucket | Smooth output | No burst | Message queue, flow control |
Example 2: Redis implementation (token bucket)
- Key: rate:user:123. Value: remaining tokens and last update time. Lua script: atomically refill, deduct a token if enough remain, and return allow/deny.
Example 3: Layered limiting
- Global (total QPS) → per user → per API. Multiple layers protect against overload.
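The layering above can be sketched as a chain of independent limiters that must all admit a request (the stub limiter and key names are illustrative):

```python
class CountLimiter:
    """Toy limiter: allow the first `limit` calls (stand-in for any real algorithm)."""

    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0

    def allow(self) -> bool:
        if self.count < self.limit:
            self.count += 1
            return True
        return False

def allow_request(user_id: str, api: str, limiters: dict) -> bool:
    """Check global -> per-user -> per-API; deny at the first exhausted layer."""
    keys = ["global", f"user:{user_id}", f"api:{api}"]
    # all() short-circuits, so layers after a denial are not consumed.
    return all(limiters[k].allow() for k in keys if k in limiters)
```

One design subtlety: when a later layer rejects, earlier layers in this sketch have already consumed capacity; a real implementation must decide whether to refund those tokens.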
Example 4: Response on limit
HTTP/1.1 429 Too Many Requests
Retry-After: 60
- Return 429; Retry-After tells client when to retry.
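On the client side, honoring Retry-After might look like this sketch (it assumes the header carries seconds, as in the example above; the header can also be an HTTP date, which is not handled here):

```python
def wait_if_limited(status: int, headers: dict) -> float:
    """Return how long to sleep before retrying; 0.0 if the request was not limited."""
    if status != 429:
        return 0.0
    # Retry-After may be absent; fall back to a modest default backoff.
    return float(headers.get("Retry-After", 1))
```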
Core Mechanism / Behavior
- Fixed window: Reset the counter each window; simple, but prone to boundary spikes.
- Sliding window: Moving average or timestamp list; more accurate.
- Token bucket: Refill rate and capacity; consume on request; reject when empty.
- Leaky bucket: Fixed drain rate; queue or drop excess.
Key Rules
- Atomicity: Use Lua or Redis transactions for distributed rate limiting to avoid races.
- Response: Return 429 or queue; include Retry-After when appropriate.
- Monitor: Track limit hit counts and which users/APIs are being limited; use this for tuning and debugging.
What's Next
See Redis Rate Limiting and the High Concurrency Toolkit. See API Gateway for centralized rate limiting.