Circuit Breaker, Retry & Timeout

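Calls between microservices need three protections: a timeout so that a slow dependency cannot block the caller, retries to absorb transient failures, and a circuit breaker to fail fast when the downstream stays broken, which prevents cascading failure. This article explains how the three work together and includes a reference table of circuit breaker states.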

Overview

  • Timeout: The call fails if it does not complete within the limit. This prevents a slow or stuck downstream from holding connections and threads. Every remote call must have one.
  • Retry: Automatically repeat the call on failure. Use it for idempotent calls and retryable errors (timeout, 5xx). Avoid it for non-idempotent calls, where a repeat can apply the same side effect twice.
  • Circuit breaker: When the error rate or slow-call ratio exceeds a threshold, the breaker opens; subsequent calls fail (or degrade) immediately without calling downstream. After a wait period the breaker goes half-open and lets a few probe calls through; if they succeed, it closes.
  • Relationship: Timeout and retry govern calls that the breaker lets through; once the breaker is open, calls are rejected before either applies. This keeps useless retries from adding load to a downstream that is already failing.

Example

Example 1: Circuit breaker states

State      Behavior
Closed     Normal calls; failures are counted
Open       Fail or degrade immediately; no downstream calls
Half-open  Allow a few probes; success → close; failure → reopen
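
The transitions in the table can be sketched as a small state machine. This is a minimal, single-threaded illustration and not how any particular library implements it: the class name, the parameters (`failure_rate`, `window`, `open_seconds`, `probes`), and the count-based window are all assumptions made for the example. It also does not cap how many probes run at once in the half-open state, which a production breaker would.

```python
import time
from collections import deque


class CircuitBreaker:
    """Hypothetical count-based breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_rate=0.5, window=10, open_seconds=30.0,
                 probes=3, clock=time.monotonic):
        self.failure_rate = failure_rate      # open when the failure ratio exceeds this
        self.open_seconds = open_seconds      # how long to stay open before probing
        self.probes = probes                  # successful probes needed to close again
        self.clock = clock                    # injectable so tests can control time
        self.results = deque(maxlen=window)   # True = failure, for the most recent calls
        self.state = "closed"
        self.opened_at = 0.0
        self.probe_successes = 0

    def allow(self):
        """Return True if the call may go downstream."""
        if self.state == "open":
            if self.clock() - self.opened_at < self.open_seconds:
                return False                  # still open: fail fast
            self.state = "half-open"          # wait elapsed: start probing
            self.probe_successes = 0
        return True

    def record(self, ok):
        """Report the outcome of a call that was allowed through."""
        if self.state == "open":
            return                            # late result from before the trip: ignore
        if self.state == "half-open":
            if not ok:
                self._open()                  # a failed probe reopens the breaker
            else:
                self.probe_successes += 1
                if self.probe_successes >= self.probes:
                    self.state = "closed"
                    self.results.clear()
            return
        self.results.append(not ok)
        window_full = len(self.results) == self.results.maxlen
        if window_full and sum(self.results) / len(self.results) > self.failure_rate:
            self._open()

    def _open(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.results.clear()
```

A caller would check `allow()` before each downstream call and pass the outcome to `record()` afterwards. The failure ratio is only evaluated once the window is full, so a single early failure cannot trip the breaker.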

Example 2: Common implementations

  • Resilience4j, Sentinel, and Hystrix (in maintenance mode and no longer actively developed). Typical settings include the sliding window, failure threshold, slow-call threshold, and half-open probe count.

Example 3: Combined strategy

Plain text
Call → timeout 3s
     → on failure, retry up to 2 times (only on timeout or 5xx)
     → if failure rate > 50% or slow-call rate > 80%, open the circuit for 30s
     → while open, use a fallback (default value or cached read)
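
The ordering of these steps can be sketched in Python. Everything here is hypothetical: `call_with_policy`, the `op(timeout_s)` convention (the operation is assumed to enforce its own timeout and raise `TimeoutError`, as most HTTP clients can), and the duck-typed `breaker` object with `allow()` and `record(ok)` methods. `ConnectionError` stands in for a 5xx response, and backoff between attempts is left out to keep the flow visible.

```python
RETRYABLE = (TimeoutError, ConnectionError)  # assumed stand-ins for "timeout, 5xx"


def call_with_policy(op, breaker, fallback, timeout_s=3.0, retries=2):
    """Check the breaker, run op under a timeout, retry retryable errors."""
    last_error = None
    for _ in range(1 + retries):
        if not breaker.allow():
            return fallback()          # open: fail fast, no downstream call
        try:
            result = op(timeout_s)     # op is expected to enforce the timeout itself
        except RETRYABLE as exc:
            breaker.record(False)      # each failed attempt counts toward the failure rate
            last_error = exc
            continue                   # other exception types propagate unchanged
        breaker.record(True)
        return result
    raise last_error                   # retries exhausted while the breaker stayed closed
```

The breaker is consulted before every attempt, not only the first, so a breaker that opens in the middle of a retry sequence stops the remaining retries. Raising after the retries are exhausted is one possible choice; returning the fallback there would be equally defensible.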

Example 4: Resilience4j config (conceptual)

YAML
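# Spring Boot starter layout: properties sit under a named config or instance
resilience4j:
  circuitbreaker:
    configs:
      default:
        failureRateThreshold: 50        # failure rate (percent) at which the breaker opens
        waitDurationInOpenState: 30s    # time spent open before moving to half-open
        slidingWindowSize: 10           # number of recent calls used to compute the rate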

Core Mechanism / Behavior

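  • Timeout: Can be applied at the connect, read, or whole-request level. When it fires, the call is either retried or failed.
  • Retry: Only for retryable errors; use exponential backoff with jitter so that clients do not retry in lockstep.
  • Circuit breaker: Counts failures in a sliding window and opens when the threshold is exceeded; after the wait period, the half-open state admits a limited number of probes to test recovery.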
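
Backoff with jitter can be sketched as follows. This uses the "full jitter" variant (a uniformly random delay up to an exponentially growing bound), which is one common choice among several; the function names and the `base`, `cap`, `rng`, and `sleep` parameters are assumptions made for the example.

```python
import random
import time


def backoff_delays(retries, base=0.1, cap=2.0, rng=random.random):
    """Yield full-jitter delays: rng() scaled by min(cap, base * 2**attempt)."""
    for attempt in range(retries):
        yield rng() * min(cap, base * 2 ** attempt)


def retry(op, retries=2, retryable=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call op; on a retryable error, wait a jittered delay and try again."""
    delays = backoff_delays(retries)
    while True:
        try:
            return op()
        except retryable:
            delay = next(delays, None)
            if delay is None:
                raise              # retries exhausted: surface the last error
            sleep(delay)
```

The randomness matters as much as the exponential growth: without it, clients that failed at the same moment would all retry at the same moment too.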

Key Rules

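  • Always set a timeout; a common starting point is 2–3× the observed P99 latency or the maximum wait the business can accept.
  • Retry only idempotent calls; limit the retry count and use backoff so that retries do not create a thundering herd.
  • Give the circuit breaker sensible thresholds and a half-open policy; have fallback logic in place, and monitoring that alerts when the breaker opens.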
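
As a worked example of the timeout rule, the sketch below derives a timeout from latency samples. The function names, the nearest-rank percentile method, the default factor of 2.5, and the 3000 ms business limit are all assumptions for illustration. Treating the business limit as a cap on the P99-based value is also an interpretation of the rule, not the only reading.

```python
def p99(samples_ms):
    """Nearest-rank 99th percentile of a non-empty list of latency samples."""
    ordered = sorted(samples_ms)
    rank = (99 * len(ordered) + 99) // 100   # ceil(0.99 * n) in integer arithmetic
    return ordered[rank - 1]


def suggest_timeout_ms(samples_ms, factor=2.5, business_max_ms=3000):
    """Timeout = factor x P99, capped at the business-acceptable maximum."""
    return min(factor * p99(samples_ms), business_max_ms)
```

With 100 samples of 1 ms through 100 ms, the P99 is 99 ms and the suggested timeout is 247.5 ms. If every sample were 2000 ms, the 2.5× value would be 5000 ms and the cap of 3000 ms would apply instead.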

What's Next

See Timeout/Retry/Fallback and High Availability for related topics, and Idempotency for making retries safe.