Circuit Breaker, Retry & Timeout
Microservice calls need a timeout to avoid blocking, retries to ride out transient failures, and a circuit breaker to fail fast when a downstream service stays broken, which prevents cascading failures. This article explains how the three work together and includes a reference table.
Overview
- Timeout: The call fails if it does not complete within the limit. This prevents a slow or stuck downstream from holding connections and threads. Every remote call should have one.
- Retry: Automatically repeats a failed call. Use it for idempotent calls and retryable errors (timeouts, 5xx). Avoid it for non-idempotent calls.
- Circuit breaker: When the error rate or slow-call ratio exceeds a threshold, the breaker opens; subsequent calls fail (or degrade) immediately without reaching downstream. After a wait period it moves to half-open and lets a few probe calls through; if they succeed, the breaker closes.
- Relationship: Timeout and retry apply to calls the breaker lets through; while the breaker is open, calls are rejected before either comes into play. This spares a failing downstream the extra load of useless retries.
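The retry rule above (idempotent calls only, retryable errors only) can be written as a small decision function. This is a minimal sketch, not a library API: `should_retry` and its parameters are hypothetical names, and it assumes the call is an HTTP request whose idempotency can be judged from the method alone.

```python
# HTTP methods defined as idempotent by HTTP semantics (RFC 9110).
# Assumption: the service honours those semantics.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}


def should_retry(method, status=None, timed_out=False):
    """Return True if a failed call is a reasonable retry candidate.

    status is the HTTP status code, or None when no response arrived.
    """
    if method.upper() not in IDEMPOTENT_METHODS:
        return False  # repeating a non-idempotent call may duplicate side effects
    server_error = status is not None and 500 <= status <= 599
    return timed_out or server_error


print(should_retry("GET", status=503))   # retryable error on an idempotent call
print(should_retry("POST", status=503))  # non-idempotent, so not retried
print(should_retry("GET", status=404))   # client error, retrying will not help
```

A `POST` that carries an idempotency key could arguably be retried as well; that case is left out to keep the sketch small.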
Example
Example 1: Circuit breaker states
| State | Behavior |
|---|---|
| Closed | Normal calls |
| Open | Fail or degrade immediately; no downstream calls |
| Half-open | Allow a few probes; success → close; failure → reopen |
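The transitions in the table can be sketched as a small state machine. This is an illustrative sketch under stated assumptions, not how any particular library implements it: the class and parameter names are hypothetical, the breaker opens after a fixed number of consecutive failures (a simplification of the failure-rate threshold), and the clock is injectable so the wait period can be simulated.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class CircuitBreaker:
    def __init__(self, failure_threshold=3, open_seconds=30.0,
                 probe_count=2, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.open_seconds = open_seconds            # wait before probing
        self.probe_count = probe_count              # probes allowed while half-open
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0
        self.probes_started = 0
        self.probe_successes = 0

    def allow_request(self):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.open_seconds:
                return False  # fail fast, no downstream call
            # Wait period elapsed: start probing.
            self.state = State.HALF_OPEN
            self.probes_started = 0
            self.probe_successes = 0
        if self.state is State.HALF_OPEN:
            if self.probes_started >= self.probe_count:
                return False  # probe budget used up
            self.probes_started += 1
        return True

    def record_success(self):
        if self.state is State.HALF_OPEN:
            self.probe_successes += 1
            if self.probe_successes >= self.probe_count:
                self.state = State.CLOSED  # all probes succeeded
                self.failures = 0
        elif self.state is State.CLOSED:
            self.failures = 0

    def record_failure(self):
        if self.state is State.OPEN:
            return  # late result from a call started before opening
        if self.state is State.HALF_OPEN:
            self._open()  # a failed probe reopens the breaker
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self):
        self.state = State.OPEN
        self.opened_at = self.clock()
```

A caller would check `allow_request()` before each downstream call, then report the outcome with `record_success()` or `record_failure()`.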
Example 2: Common implementations
- Resilience4j, Sentinel, and Hystrix (in maintenance mode; no longer actively developed). Typical settings include the sliding window, failure threshold, slow-call threshold, and half-open probe count.
Example 3: Combined strategy
```text
Call
  → timeout 3s
  → on failure, retry 2 times (only timeout, 5xx)
  → if failure rate > 50% or slow > 80%, open circuit for 30s
  → while open, use fallback (default value, read cache)
```
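One way to wire this flow together is sketched below. It rests on several assumptions: `ResilientCaller` and `DownstreamError` are hypothetical names, the wrapped function is expected to enforce the 3 s timeout itself and raise `TimeoutError`, every attempt (not every logical call) is counted by the breaker, and the slow-call ratio, backoff sleeps, and half-open probing are left out for brevity.

```python
import time
from collections import deque


class DownstreamError(Exception):
    """Hypothetical error carrying an HTTP-like status code."""

    def __init__(self, status):
        super().__init__(f"status {status}")
        self.status = status


class ResilientCaller:
    def __init__(self, fallback, retries=2, failure_rate=0.5, window=10,
                 open_seconds=30.0, clock=time.monotonic):
        self.fallback = fallback
        self.retries = retries
        self.failure_rate = failure_rate
        self.open_seconds = open_seconds
        self.clock = clock
        self.outcomes = deque(maxlen=window)  # True marks a failed attempt
        self.opened_at = None                 # None means the breaker is closed

    def _is_open(self):
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.open_seconds:
            # Simplification: close directly instead of probing in half-open.
            self.opened_at = None
            self.outcomes.clear()
            return False
        return True

    def _record(self, failed):
        self.outcomes.append(failed)
        full = len(self.outcomes) == self.outcomes.maxlen
        if full and sum(self.outcomes) / len(self.outcomes) > self.failure_rate:
            self.opened_at = self.clock()

    def call(self, fn):
        if self._is_open():
            return self.fallback()            # open: no downstream call at all
        for _ in range(1 + self.retries):     # first attempt plus retries
            try:
                result = fn()                 # fn applies its own timeout
            except TimeoutError:
                pass                          # retryable
            except DownstreamError as exc:
                if exc.status < 500:
                    raise                     # not retryable, surface to caller
            else:
                self._record(failed=False)
                return result
            self._record(failed=True)
            if self._is_open():
                break                         # breaker tripped, stop retrying
        return self.fallback()
```

Checking the breaker before the retry loop is what keeps retries from piling onto a downstream that is already failing.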
Example 4: Resilience4j config (conceptual)
```yaml
resilience4j:
  circuitbreaker:
    failureRateThreshold: 50
    waitDurationInOpenState: 30s
    slidingWindowSize: 10
```
Core Mechanism / Behavior
- Timeout: Can be applied at the connect, read, or total-request level. When it fires, the call is either retried or reported as failed.
- Retry: Applies only to retryable errors; space the attempts with backoff and jitter.
- Circuit breaker: Counts failures in a sliding window and opens when the threshold is exceeded; in the half-open state it admits a limited number of probes to test recovery.
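The sliding-window bookkeeping can be illustrated with a count-based window over the most recent calls. This is a sketch under assumptions: `SlidingWindow`, `slow_seconds`, and the default thresholds are hypothetical, and the breaker is only evaluated once the window is full.

```python
from collections import deque


class SlidingWindow:
    """Count-based window over the most recent call outcomes."""

    def __init__(self, size=10, slow_seconds=1.0):
        self.calls = deque(maxlen=size)   # (failed, slow) per call; oldest drops out
        self.slow_seconds = slow_seconds  # calls at least this long count as slow

    def record(self, failed, duration_seconds):
        self.calls.append((failed, duration_seconds >= self.slow_seconds))

    def rates(self):
        """Return (failure_rate, slow_rate) as fractions of calls in the window."""
        n = len(self.calls)
        if n == 0:
            return 0.0, 0.0
        failures = sum(1 for failed, _ in self.calls if failed)
        slow = sum(1 for _, is_slow in self.calls if is_slow)
        return failures / n, slow / n

    def should_open(self, max_failure_rate=0.5, max_slow_rate=0.8):
        if len(self.calls) < self.calls.maxlen:
            return False  # too few samples to judge
        failure_rate, slow_rate = self.rates()
        return failure_rate > max_failure_rate or slow_rate > max_slow_rate


window = SlidingWindow(size=4)
for failed in (True, True, False, True):
    window.record(failed, duration_seconds=0.1)
print(window.rates())        # (0.75, 0.0)
print(window.should_open())  # True
```

Waiting for a full window avoids opening the breaker on the first one or two failures after startup.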
Key Rules
- Always set a timeout; use 2–3× the observed P99 latency, or the longest wait the business can accept.
- Retry only idempotent calls; limit the retry count and use backoff with jitter so that clients do not retry in lockstep (thundering herd).
- Give the circuit breaker sensible thresholds and a half-open policy; provide fallback logic and monitoring for when it is open.
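As a worked illustration of the first two rules, the sketch below derives a timeout from latency samples and generates capped, jittered backoff delays. The function names, the nearest-rank percentile, the 2.5× multiplier, and the "full jitter" scheme are assumptions chosen for the example; treating the business maximum as a cap is one reading of the rule.

```python
import random


def suggested_timeout(latencies_seconds, multiplier=2.5, business_max=None):
    """Return multiplier × P99, capped at business_max when one is given."""
    if not latencies_seconds:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_seconds)
    rank = (99 * len(ordered) + 99) // 100  # nearest-rank P99, integer ceiling
    p99 = ordered[rank - 1]
    timeout = multiplier * p99
    return timeout if business_max is None else min(timeout, business_max)


def backoff_delays(retries, base=0.1, cap=2.0, rng=random):
    """Full jitter: each delay is uniform in [0, min(cap, base * 2**attempt)]."""
    return [rng.uniform(0.0, min(cap, base * 2 ** attempt))
            for attempt in range(retries)]


samples = [0.1] * 98 + [0.4, 1.0]                 # 100 samples, P99 is 0.4 s
print(suggested_timeout(samples))                  # about 1.0 s
print(suggested_timeout(samples, business_max=0.8))  # capped at 0.8 s
print(backoff_delays(3))                           # three random delays, growing ceilings
```

Randomising the whole delay rather than adding a small offset spreads retries from many clients more evenly, at the cost of some attempts firing almost immediately.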
What's Next
See Timeout/Retry/Fallback, High Availability. See Idempotency for safe retry.