Rate Limiting with Redis (Token Bucket)

Rate limiting restricts how many requests a client can make in a time window. The token bucket model: a bucket holds tokens, refilled at a fixed rate; each request consumes one (or N) tokens; if no tokens are left, the request is rejected. This article shows how to implement a token-bucket (or a close variant) in Redis with Lua for atomicity and gives a simple table of alternatives.

Overview

Token bucket: Bucket has a capacity C and refill rate R (tokens per second). Each request takes 1 token; if tokens ≥ 1, allow and decrement; otherwise deny. Refill: current_tokens = min(C, last_tokens + (now - last_time) * R).
Redis role: Store per-client state (tokens left, last update time) in a key (e.g. rate:user:1001). Use Lua to compute new token count and allow/deny in one atomic step.
Sliding window: Alternative is a sliding window (e.g. count of requests in last N seconds using ZSet or sorted list). Simpler but different semantics (fixed window can allow 2× at boundary; sliding is smoother but more state).

Example

Example 1: Token bucket in Lua (simplified)

Stored value: tokens,last_ts (e.g. "5,1707123456.123").
Logic in Lua:
- Parse tokens and last_ts.
- Refill: now - last_ts seconds passed → add (now - last_ts) * rate tokens, cap at capacity.
- If tokens ≥ 1: allow, set tokens = tokens - 1, last_ts = now, return 1. Else return 0.
Set key TTL to e.g. capacity/rate seconds so unused buckets expire.

Example 2: Fixed window counter (simpler, not token bucket)

Redis
INCR rate:user:1001
EXPIRE rate:user:1001 60
-- If INCR result > 100, reject. Else allow.

Key per (client, window). Window = 60 s; limit 100. Problem: at window boundary two windows can each allow 100 (burst of 200). Token bucket or sliding window avoids that.

Example 3: Sliding window with ZSet

Redis
ZADD rate:user:1001 now request_id
ZREMRANGEBYSCORE rate:user:1001 0 now-60
ZCARD rate:user:1001
EXPIRE rate:user:1001 61
-- If ZCARD <= 100, allow; else reject. request_id = unique per request (e.g. UUID).

Score = timestamp; remove entries older than 60 s; count = requests in last 60 s. Gives a true sliding window; more keys and operations than a simple counter.

Core Mechanism / Behavior

Atomicity: Use a single Lua script so refill, check, and consume (and optional TTL) happen in one call. Otherwise two requests can both see the same token count and both be allowed.
Key design: One key per (client, limit scope), e.g. rate:ip:1.2.3.4 or rate:user:1001. TTL the key so inactive clients don’t keep state forever.
Refill: Token bucket refill is continuous (time-based). So burst is bounded by capacity; sustained rate is bounded by refill rate. Good for APIs that want both burst and steady limit.

Method	Pros	Cons
Token bucket (Lua)	Smooth rate, burst control	More state and math
Fixed window counter	Simple, few ops	Burst at window boundary
Sliding window (ZSet)	Accurate sliding window	More memory and ZREMRANGEBYSCORE

Key Rules

Implement allow/deny and state update in one Lua script so there is no race between concurrent requests.
Set a TTL on the rate-limit key so keys for inactive clients expire; choose TTL ≥ window length (e.g. 2× refill period for token bucket).
Choose token bucket when you want burst + sustained limit; use sliding window when you need strict “last N seconds” semantics. Fixed window is acceptable when double burst at boundary is acceptable.

What's Next

See Distributed Lock with Redis for Lua and key design. See Data Types for ZSet and String. See Rate Limiter Design (system design) for high-level design and multi-dc considerations.