URL Shortener Design
A URL shortener maps long URLs to short codes (e.g. bit.ly/abc123); visiting the short URL redirects to the original. The core pieces are short code generation, storage, and redirection. This article walks through the main design points and includes a comparison table of code generation methods.
Overview
- Short code: Convert an auto-increment ID to base-62 (0-9, a-z, A-Z), generate a random string, or hash the URL and handle collisions. Codes must be short (6–8 characters), unique, and cheap to generate at scale.
- Storage: A mapping from short code to long URL in a key-value store (Redis) or a database. Records can also carry a TTL, click stats, creator, etc.
- Redirect: GET short URL → look up the mapping → redirect to the long URL. Use 301 (permanent, cached by browsers) or 302 (temporary; every visit reaches the server, which allows stats and target changes).
- High concurrency: The workload is read-heavy; cache hot links in Redis and keep the DB for durability and backup.
Example
Example 1: Flow
- Create: long URL → generate short code → store short→long → return short URL
- Visit: short URL → lookup short→long → 302 redirect
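The two flows above can be sketched in memory as follows. This is a minimal illustration, not a production design: the `Shortener` class, the `short.example` domain, and the counter-based placeholder generator are all assumptions made for the example, and a plain dict stands in for real storage.

```python
import itertools
from typing import Callable, Dict, Optional, Tuple


class Shortener:
    """In-memory sketch of the create and visit flows (hypothetical class)."""

    def __init__(self, base_url: str, next_code: Callable[[], str]) -> None:
        self.base_url = base_url.rstrip("/")
        self.next_code = next_code            # pluggable short code generator
        self.mapping: Dict[str, str] = {}     # short code -> long URL

    def create(self, long_url: str) -> str:
        """Create: generate a code, store short -> long, return the short URL."""
        code = self.next_code()
        self.mapping[code] = long_url
        return f"{self.base_url}/{code}"

    def visit(self, code: str) -> Tuple[int, Optional[str]]:
        """Visit: look up the code; answer (302, target) or (404, None)."""
        long_url = self.mapping.get(code)
        if long_url is None:
            return 404, None
        return 302, long_url


# Placeholder generator (assumption): a plain counter rendered as text.
_counter = itertools.count(1)
service = Shortener("https://short.example", lambda: str(next(_counter)))

short_url = service.create("https://example.com/some/very/long/path?x=1")
print(short_url)                 # https://short.example/1
print(service.visit("1"))        # (302, 'https://example.com/some/very/long/path?x=1')
print(service.visit("missing"))  # (404, None)
```

Passing the generator in as a callable keeps the flow independent of whichever code generation method is chosen.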
Example 2: Short code generation
| Method | Pros | Cons |
|---|---|---|
| Auto-increment + base-62 | No collisions, ordered | Needs a distributed ID generator |
| Random | Simple | Collisions require a retry |
| Hash | Same URL → same code | Collisions possible; codes are predictable |
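The random and hash rows of the table might look roughly like this. It is a sketch under stated assumptions: the function names, the 7-character default, the retry limit, and the salt-and-rehash scheme for hash collisions are illustrative choices, and a dict stands in for the store.

```python
import hashlib
import secrets
import string
from typing import Dict

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols


def random_code(store: Dict[str, str], long_url: str, length: int = 7,
                max_attempts: int = 5) -> str:
    """Random strategy: draw a code and retry if it is already taken."""
    for _ in range(max_attempts):
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # A real store would rely on a unique constraint or insert-if-absent
        # instead of this check-then-set, which is not atomic across processes.
        if code not in store:
            store[code] = long_url
            return code
    raise RuntimeError("no free code found; consider a longer code length")


def hash_code(store: Dict[str, str], long_url: str, length: int = 7,
              max_attempts: int = 5) -> str:
    """Hash strategy: same URL -> same code; salt and rehash on collision."""
    for salt in range(max_attempts):
        data = long_url if salt == 0 else f"{long_url}#{salt}"
        digest = hashlib.sha256(data.encode("utf-8")).digest()
        number = int.from_bytes(digest[:8], "big")   # 64 bits is enough for <= 10 chars
        chars = []
        for _ in range(length):
            number, rem = divmod(number, 62)
            chars.append(ALPHABET[rem])
        code = "".join(chars)
        existing = store.get(code)
        if existing is None:            # free slot: claim it
            store[code] = long_url
            return code
        if existing == long_url:        # same URL seen before: reuse its code
            return code
        # Otherwise a different URL owns this code: try the next salt.
    raise RuntimeError("too many hash collisions")
```

With the hash strategy, repeated calls for the same URL walk the same salt sequence, so they land on the same code even after an earlier collision.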
Example 3: Storage and cache
- DB: short code (PK), long URL, created_at, TTL, stats.
- Redis: short code → long URL, with a cache TTL that does not outlive the DB record's TTL; on a cache miss, load from the DB and populate the cache.
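The read path described here is the cache-aside pattern. The sketch below assumes a tiny in-process `TTLCache` in place of Redis and a dict in place of the DB, with each DB row holding `(long_url, expires_at)`; all of these names are hypothetical, and the expiry timestamps are assumed to use the same clock as the cache.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Stand-in for Redis (assumption): key -> (value, absolute expiry time)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.items: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.items[key]          # lazily evict expired entries
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self.items[key] = (value, self.clock() + ttl_seconds)


# DB row: (long_url, expires_at or None for links that never expire).
Row = Tuple[str, Optional[float]]


def resolve(code: str, cache: TTLCache, db: Dict[str, Row],
            cache_ttl: float = 300.0) -> Optional[str]:
    """Cache-aside lookup: try the cache, fall back to the DB, then populate."""
    cached = cache.get(code)
    if cached is not None:
        return cached
    row = db.get(code)
    if row is None:
        return None                      # unknown code
    long_url, expires_at = row
    if expires_at is not None:
        remaining = expires_at - cache.clock()
        if remaining <= 0:
            return None                  # link has expired in the DB
        cache_ttl = min(cache_ttl, remaining)   # never outlive the DB record
    cache.set(code, long_url, cache_ttl)
    return long_url
```

Capping the cache TTL at the record's remaining lifetime is one way to keep the two TTLs aligned, so an expired link is not served from cache.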
Example 4: Scale considerations
- Use a distributed ID generator (Snowflake, segment allocation) for unique codes; with random or hash codes, retry or switch to a different algorithm on collision.
- Put a CDN in front of the short URL endpoints if traffic is high.
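Segment allocation, mentioned above, can be sketched like this: each node reserves a block of IDs from a central counter and hands them out locally. The class names and segment size are assumptions for illustration, and the lock-protected counter stands in for what would be a database row in a real deployment.

```python
import threading
from typing import Tuple


class SegmentStore:
    """Stand-in for the central counter (assumption: a DB row in practice)."""

    def __init__(self, start: int = 1) -> None:
        self._next = start
        self._lock = threading.Lock()

    def reserve(self, size: int) -> Tuple[int, int]:
        """Atomically hand out the half-open range [lo, hi)."""
        with self._lock:
            lo = self._next
            self._next += size
            return lo, lo + size


class SegmentIdGenerator:
    """Per-node generator: serves IDs locally, refills when the segment is used up."""

    def __init__(self, store: SegmentStore, segment_size: int = 1000) -> None:
        self._store = store
        self._size = segment_size
        self._lock = threading.Lock()
        self._cur = 0
        self._end = 0          # empty segment forces a reserve on first use

    def next_id(self) -> int:
        with self._lock:
            if self._cur >= self._end:
                self._cur, self._end = self._store.reserve(self._size)
            value = self._cur
            self._cur += 1
            return value


store = SegmentStore()
node_a = SegmentIdGenerator(store, segment_size=3)
node_b = SegmentIdGenerator(store, segment_size=3)

ids_a = [node_a.next_id() for _ in range(2)]    # node A reserves [1, 4)
ids_b = [node_b.next_id() for _ in range(2)]    # node B reserves [4, 7)
ids_a += [node_a.next_id() for _ in range(2)]   # A finishes its segment, then reserves [7, 10)
print(ids_a)  # [1, 2, 3, 7]
print(ids_b)  # [4, 5]
```

IDs are unique across nodes but not globally ordered, and any IDs left in a segment when a node restarts are simply skipped.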
Core Mechanism / Behavior
- Base-62: Encode numeric ID with 0-9, a-z, A-Z for compact representation.
- Redirect: A 301 is cached by the browser, so repeat visits may skip the server; a 302 allows per-request handling and stats.
- Duplicate URLs: The same long URL can map to the same short code (hash) or to different codes (random/increment); choose based on product needs.
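Base-62 encoding of a numeric ID can be sketched as below. The function names are illustrative, and the digit order (0-9, then a-z, then A-Z) follows the alphabet given in this article; other orderings work as long as encode and decode agree.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)                                   # 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}       # reverse lookup for decoding


def encode(number: int) -> str:
    """Encode a non-negative integer as a base-62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, rem = divmod(number, BASE)
        digits.append(ALPHABET[rem])       # least significant digit first
    return "".join(reversed(digits))


def decode(code: str) -> int:
    """Decode a base-62 string back to its integer; raises KeyError on bad characters."""
    number = 0
    for ch in code:
        number = number * BASE + INDEX[ch]
    return number


print(encode(0))            # 0
print(encode(61))           # Z
print(encode(62))           # 10
print(encode(125))          # 21
print(encode(62**6 - 1))    # ZZZZZZ
print(decode(encode(987654321)) == 987654321)  # True
```

Six characters cover 62^6 = 56,800,235,584 distinct IDs, which is why codes of 6 to 8 characters are usually enough.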
Key Rules
- Short codes must be unique; use Snowflake, segments, etc. for distributed generation, and retry or change the algorithm on collision.
- 302 allows stats and target changes; 301 is cached by the browser, which makes visits harder to track.
- Abuse control: Rate limit by IP and by user; filter malicious or sensitive URLs.
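The abuse-control rule could be sketched as a per-key sliding-window limiter plus a basic URL check. Everything here is an assumption for illustration: the class and function names, the limits, and the idea of a static blocked-host set; real filtering would typically consult reputation or malware lists.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Set
from urllib.parse import urlsplit


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within the last `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """`key` is whatever identifies the caller, e.g. an IP or a user ID."""
        now = self.clock()
        hits = self.hits[key]
        while hits and now - hits[0] >= self.window:
            hits.popleft()                 # drop requests that left the window
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def is_acceptable_url(long_url: str, blocked_hosts: Set[str]) -> bool:
    """Accept only http(s) URLs with a host that is not on the block list."""
    try:
        parts = urlsplit(long_url)
        host = parts.hostname              # lowercased by urlsplit, None if absent
    except ValueError:
        return False                       # malformed URL
    if parts.scheme not in ("http", "https"):
        return False
    if not host:
        return False
    return host not in blocked_hosts
```

A create endpoint would call `allow` with the caller's IP or user ID and `is_acceptable_url` with the submitted URL before generating a code.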
What's Next
See Pagination (for lists), Cache Strategy, Rate Limiting. See Distributed ID for code generation.