Timeout, Retry & Fallback

RPC calls need a timeout to avoid blocking for too long, retries to improve the success rate on transient failures, and a fallback to provide alternative logic when a call fails. This article covers usage, configuration, and caveats, and summarizes the three mechanisms in a strategy table.

Overview

  • Timeout: If a call does not complete within the given time, the caller stops waiting and the call fails with a timeout error. This prevents one slow or stuck downstream service from blocking its callers. Typically set both a connect timeout and a read timeout; the read timeout is the more critical of the two.
  • Retry: Automatically reissue a failed call a limited number of times. Use it for idempotent interfaces and transient network issues. Avoid or strictly limit retries for non-idempotent writes.
  • Fallback: On failure, run alternative logic (a default value, a read from cache, a friendly message). It can be combined with a circuit breaker to fail fast when the downstream service is unavailable.

Examples

Example 1: Timeout configuration

XML
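<!-- Default timeout in ms; id and interface are placeholder values -->
<dubbo:reference id="userService" interface="com.example.UserService" timeout="3000">
    <!-- Or per method: overrides the reference-level timeout -->
    <dubbo:method name="getUser" timeout="1000"/>
</dubbo:reference>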
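  • After the timeout elapses, a TimeoutException is thrown (on the consumer side it typically arrives wrapped in an RpcException); the caller must handle it. Different methods can have different timeouts, for example a short one for fast DB reads and a longer one for slow external calls.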
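
To make the caller-side behavior concrete, here is a minimal plain-JDK sketch of a bounded wait. It is not how Dubbo implements timeouts internally; the class name TimeoutSketch, the method callWithTimeout, and the onTimeout parameter are all hypothetical names chosen for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: a generic per-call timeout built on java.util.concurrent.
final class TimeoutSketch {

    // Runs the call on a worker thread and waits at most timeoutMs for its result.
    // Returns onTimeout if the deadline passes first.
    static <T> T callWithTimeout(Callable<T> call, long timeoutMs, T onTimeout) {
        // A pool per call keeps the sketch short; real code would reuse a shared pool.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = pool.submit(call);
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return onTimeout; // the caller decides what a timeout means
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("call failed", e.getCause());
        } finally {
            pool.shutdownNow(); // interrupts the worker if it is still running
        }
    }
}
```

With this sketch, a call that finishes in time returns its own result, while a call that sleeps past the deadline returns the onTimeout value instead of blocking the caller.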

Example 2: Retry policy

XML
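<!-- retries does not count the first call; id and interface are placeholder values -->
<dubbo:reference id="userService" interface="com.example.UserService" retries="2"/>
<!-- Or retries="0" for no retry -->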
  • By default, a failed call (network error, timeout) is retried 2 times, so one invocation can result in up to 3 calls in total. For writes and other non-idempotent operations, prefer retries="0".
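
The "retries plus one" arithmetic can be sketched in plain Java. This is a simplified illustration of the counting rule only, not Dubbo's cluster logic; RetrySketch and invokeWithRetries are hypothetical names.

```java
import java.util.function.Supplier;

// Illustrative only: shows why retries=2 means up to 3 total calls.
final class RetrySketch {

    // Makes one initial attempt plus at most `retries` further attempts.
    // Rethrows the last failure if every attempt fails.
    static <T> T invokeWithRetries(Supplier<T> call, int retries) {
        if (retries < 0) {
            throw new IllegalArgumentException("retries must be >= 0");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt <= retries; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again if attempts remain
            }
        }
        throw last;
    }
}
```

With retries set to 0 the call runs exactly once, which is the safe choice for non-idempotent writes. If every attempt can also wait for the full timeout, the worst-case wait is roughly (retries + 1) times the timeout, so the two settings are best chosen together.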

Example 3: Fallback example

Java
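import org.apache.dubbo.config.annotation.DubboReference;

public class UserClient { // hypothetical holder class for the injected reference
    @DubboReference(mock = "return null")
    private UserService userService;
    // Or mock = "fail:throw" for fast fail
    // Or mock = "true" to use the local mock implementation (by convention, UserServiceMock)
}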
  • On failure, return null or use a mock implementation instead of propagating the exception. You can also implement fallback in business code with try-catch.
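
The try-catch variant mentioned above might look like the following sketch. FallbackSketch and callOrFallback are hypothetical names, and the sketch catches RuntimeException on the assumption that the RPC failure is unchecked (Dubbo's RpcException is).

```java
import java.util.function.Supplier;

// Illustrative only: imperative fallback in business code.
final class FallbackSketch {

    // Returns the primary result, or the fallback result if the primary call throws.
    static <T> T callOrFallback(Supplier<T> primary, Supplier<T> fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            // Log so the failure stays visible even though the caller gets a value.
            System.err.println("primary call failed, using fallback: " + e);
            return fallback.get();
        }
    }
}
```

Compared with mock = "return null", this form lets the fallback be a real default object or a cached value, so callers do not need a null check.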

Example 4: Strategy summary

Mechanism | Role                                     | Notes
Timeout   | Prevent long blocking                    | Set per method, sized to each method's expected latency
Retry     | Improve success on transient failures    | Idempotent calls only; limit the count and use backoff
Fallback  | Provide an alternative result on failure | Combine with a circuit breaker; avoid cascading failures

Core Mechanism / Behavior

  • Timeout: Can be applied at the connection, read, or total-call level. If it is too large, threads and connections are held for too long; if it is too small, healthy calls fail prematurely.
  • Retry: Typically uses exponential backoff or a fixed delay between attempts. Retry only on retryable errors (network errors, timeouts, 5xx responses), not on 4xx responses or business errors.
  • Fallback: Can be declarative (mock) or imperative (try-catch). It should return a value the caller can handle, and it should log or alert so the failure stays visible.
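
As a rough sketch of the retry behavior described above, the following plain-Java helpers compute a capped exponential delay and classify errors as retryable or not. The names BackoffSketch, delayMs, isRetryableStatus, and isRetryableError are hypothetical, and the choice of IOException and TimeoutException as "retryable" is an assumption made for illustration.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

// Illustrative only: capped exponential backoff and a retryable-error check.
final class BackoffSketch {

    // Delay before the given retry attempt (1-based): base, 2*base, 4*base, ... capped at maxMs.
    static long delayMs(int attempt, long baseMs, long maxMs) {
        long delay = baseMs;
        for (int i = 1; i < attempt; i++) {
            if (delay > maxMs / 2) {
                return maxMs; // doubling again would exceed the cap
            }
            delay *= 2;
        }
        return Math.min(delay, maxMs);
    }

    // 5xx responses are treated as transient; 4xx responses are not retried.
    static boolean isRetryableStatus(int status) {
        return status >= 500 && status <= 599;
    }

    // Network and timeout errors are retried; anything else is treated as a business error.
    static boolean isRetryableError(Throwable error) {
        return error instanceof IOException || error instanceof TimeoutException;
    }
}
```

With a 100 ms base and a 2000 ms cap, the delays grow as 100, 200, 400, 800, 1600 and then stay at 2000. A production version would usually add random jitter so that many clients do not retry at the same instant.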

Key Rules

  • Always set a timeout; a missing or very large value can exhaust thread pools and connection pools.
  • Retry only idempotent calls; for writes and money operations, set retries="0" or otherwise disable retries explicitly.
  • A fallback must return a business-acceptable default and log or alert, so that failures stay visible and recoverable.

What's Next

See Circuit Breaker for how it works with timeout and retry. See Idempotency for idempotent design. See Common Production Pitfalls for common issues.