Rate Limiter Design

A rate limiter caps requests per time window or limits concurrency to protect a system from overload. Common algorithms: fixed window, sliding window, token bucket, and leaky bucket. This article explains each algorithm, with implementation notes and a comparison table.

Overview

  • Fixed window: Count in fixed intervals (e.g. 1 second). Simple; boundary can allow 2× traffic (100 at end of window, 100 at start of next).
  • Sliding window: Count in a window that slides with current time. Smoother; slightly more complex (store multiple time slices or sliding structure).
  • Token bucket: Bucket holds tokens, refilled at a fixed rate; each request consumes a token; reject when empty. Allows bursts up to bucket capacity when the bucket is full, so capacity doubles as the burst cap.
  • Leaky bucket: Requests enter bucket; drain at fixed rate; if input exceeds drain, queue or reject. Smooth output; no burst.
  • Choice: Simple counting → fixed/sliding window. Need burst → token bucket. Need strict smoothness → leaky bucket.
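The sliding-window bullet above can be sketched with a timestamp log: keep the times of recent requests and count only those still inside the window. This is a minimal illustration, not a production implementation; the class and method names are assumptions.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests in any trailing `window` seconds."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()  # times of admitted requests

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Evict timestamps that have slid out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

Storing every timestamp is exact but costs memory proportional to the limit; the time-slice variant mentioned above trades a little accuracy for O(1) space per window slice.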

Example

Example 1: Algorithm comparison

Algorithm      | Pros          | Cons              | Best for
---------------|---------------|-------------------|-----------------------------
Fixed window   | Simple        | 2× at boundary    | Loose limits
Sliding window | Smooth        | More complex      | Precise limiting
Token bucket   | Allows burst  | Token bookkeeping | API, QPS limit
Leaky bucket   | Smooth output | No burst          | Message queue, flow control

Example 2: Redis implementation (token bucket)

  • Key: rate:user:123. Value: remaining tokens and last update time. A Lua script atomically refills, deducts if enough tokens remain, and returns allow/deny.
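In production the refill-and-deduct step runs as a single Lua script inside Redis so it is atomic. As a sketch, the per-call arithmetic that script would perform can be modeled in plain Python (the function name and parameters are assumptions, not from the article):

```python
def token_bucket_check(tokens, last_refill, now, rate, capacity, cost=1):
    """Model of the atomic token-bucket step: refill, then try to deduct.

    Returns (allowed, new_tokens, new_last_refill) -- the values the Lua
    script would write back to the Redis key.
    """
    elapsed = max(0.0, now - last_refill)
    # Refill at the fixed rate, capped at bucket capacity.
    tokens = min(capacity, tokens + elapsed * rate)
    if tokens >= cost:
        return True, tokens - cost, now
    return False, tokens, now
```

The whole read-refill-deduct-write sequence must happen in one atomic step; doing it with separate GET/SET calls from the application would race under concurrency.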

Example 3: Layered limiting

  • Global (total QPS) → per user → per API. Multiple layers protect against overload.
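The layering above can be sketched by composing simple limiters: a request is admitted only if every layer (global, per user, per API) has quota. The classes and counter-based windows here are illustrative assumptions, not the article's design.

```python
class FixedWindowLimiter:
    """Minimal per-window counter (reset logic omitted for brevity)."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

def allow_request(layers):
    # Check every layer first so a deny in one layer does not
    # consume quota in another.
    if all(layer.count < layer.limit for layer in layers):
        for layer in layers:
            layer.count += 1
        return True
    return False
```

Checking all layers before consuming any quota is a deliberate choice; consuming layer by layer would let denied requests burn global quota.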

Example 4: Response on limit

HTTP/1.1 429 Too Many Requests
Retry-After: 60
  • Return 429; Retry-After tells the client when to retry.
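For window-based limiters, Retry-After can be computed from the time remaining in the current window. A minimal sketch (the helper name and response shape are hypothetical):

```python
import math

def rate_limited_response(now, window_start, window_seconds):
    """Build a 429 response whose Retry-After is the seconds left
    in the current fixed window (rounded up, at least 1)."""
    retry_after = max(1, math.ceil(window_start + window_seconds - now))
    return {
        "status": 429,
        "headers": {"Retry-After": str(retry_after)},
    }
```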

Core Mechanism / Behavior

  • Fixed window: Reset counter each window; simple, but allows boundary spikes.
  • Sliding window: Moving average or timestamp list; more accurate.
  • Token bucket: Refill rate and capacity; consume on request; reject when empty.
  • Leaky bucket: Fixed drain rate; queue or drop excess.
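The leaky-bucket behavior above can be sketched with a lazily drained counter: arrivals fill the bucket up to capacity, and the fixed drain rate is applied from elapsed time on each call. Names are assumptions; excess requests are rejected here rather than queued.

```python
class LeakyBucket:
    """Admit requests while the bucket has room; drain at a fixed rate."""

    def __init__(self, drain_rate, capacity):
        self.drain_rate = drain_rate  # requests drained per second
        self.capacity = capacity      # max requests held in the bucket
        self.level = 0.0
        self.last = 0.0

    def allow(self, now):
        # Apply the drain that happened since the last call.
        self.level = max(0.0, self.level - (now - self.last) * self.drain_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False  # bucket full: drop (or queue) the request
```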

Key Rules

  • Atomicity: Use Lua or Redis transactions for distributed rate limiting to avoid races.
  • Response: Return 429 or queue; include Retry-After when appropriate.
  • Monitor: Limit hit count, which users/APIs are limited; use for tuning and debugging.
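The monitoring rule above amounts to counting limit hits per user and per API so limits can be tuned later. A trivial in-process sketch (counter keys are illustrative; real systems would export these to a metrics backend):

```python
from collections import Counter

limit_hits = Counter()

def record_limit_hit(user, api):
    """Record one rejected request, keyed by (user, api) and in total."""
    limit_hits[(user, api)] += 1
    limit_hits["total"] += 1
```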

What's Next

See Redis Rate Limiting and the High Concurrency Toolkit; see API Gateway for centralized rate limiting.