Rate Limit & Degrade in RPC

RPC systems need rate limiting to keep callers from overloading providers, and degradation to keep responding when providers are overloaded or failing. This article covers common implementations and strategies, with a reference table of rate-limit dimensions.

Overview

  • Rate limit: Cap the number of calls per unit of time or the number of concurrent calls, either per caller (consumer, IP, userId) or per service/method. When the limit is exceeded, fail fast or queue the request to avoid overloading downstream.
  • Degrade: When error rate, slow-call ratio, or load exceeds a threshold, reduce calls to a service/method (return a default, use a cache, open the circuit). This protects both downstream and the caller.
  • Implementation: Rate limiting can use a token bucket, a sliding window, or a counter, and it can run at the gateway, the provider, or the consumer. Degradation often works with a circuit breaker and a config center for dynamic control.

Examples

Example 1: Rate limit dimensions

Dimension     Description
Service       Total QPS limit per service
Method        QPS limit per method
Caller        Per consumer, IP, or userId
Concurrency   Limit on in-flight calls (semaphore)
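
The concurrency dimension in the table can be sketched with a counting semaphore. This is a minimal illustration, not a production limiter: the class name ConcurrencyLimiter and its call method are hypothetical, and it assumes rejected calls should fail fast to a fallback rather than queue.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper: caps in-flight calls with a counting semaphore.
final class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Runs the call if a permit is free; otherwise returns the rejection fallback.
    <T> T call(Supplier<T> rpc, Supplier<T> onRejected) {
        if (!permits.tryAcquire()) {
            return onRejected.get(); // no queueing: reject immediately
        }
        try {
            return rpc.get();
        } finally {
            permits.release(); // always return the permit, even if rpc throws
        }
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        ConcurrencyLimiter limiter = new ConcurrencyLimiter(1);
        // The outer call holds the only permit, so the nested call is rejected.
        String result = limiter.call(
            () -> "outer+" + limiter.call(() -> "inner", () -> "rejected"),
            () -> "rejected");
        System.out.println(result); // outer+rejected
    }
}
```

Keeping one limiter per service, method, or caller key (for example in a map) would extend the same idea to the other dimensions.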

Example 2: Degrade strategies

  • Return default: Return null, an empty list, or a default config on failure.
  • Silent fail: Log the failure and do not throw; the business logic treats the result as "no data."
  • Read cache: On failure, read from a local or remote cache.
  • Circuit breaker: When the error rate exceeds a threshold, stop calling for a period and degrade; afterwards a half-open state lets probe calls through to test for recovery.
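
The strategies above can be combined in a small circuit breaker that returns a fallback (a default or a cached value) while open. The following is a minimal sketch under stated assumptions: SimpleCircuitBreaker is a hypothetical class, not the API of any real library; the error rate is counted over all calls since the breaker last closed rather than over a rolling window; and every call made in the half-open state is treated as a probe.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical minimal breaker; not the API of any real library.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int minCalls;          // calls required before the error rate is trusted
    private final double errorRateLimit; // e.g. 0.5 trips at 50% failures
    private final long openMillis;       // cooldown before probing
    private final LongSupplier clock;    // injectable millisecond clock

    private State state = State.CLOSED;
    private int calls;
    private int failures;
    private long openedAt;

    SimpleCircuitBreaker(int minCalls, double errorRateLimit, long openMillis, LongSupplier clock) {
        this.minCalls = minCalls;
        this.errorRateLimit = errorRateLimit;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    // Current state; an open breaker becomes half-open once the cooldown has elapsed.
    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    // Calls rpc unless the breaker is open; a failure or rejection yields the fallback.
    <T> T call(Supplier<T> rpc, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // degrade: return immediately, no downstream call
        }
        try {
            T result = rpc.get();
            record(true);
            return result;
        } catch (RuntimeException e) {
            record(false);
            return fallback.get(); // e.g. a default value or a cached copy
        }
    }

    private synchronized void record(boolean success) {
        if (state == State.OPEN) {
            return; // late result from a call admitted before the trip
        }
        if (state == State.HALF_OPEN) {
            if (success) close(); else trip(); // the probe decides recovery
            return;
        }
        calls++;
        if (!success) {
            failures++;
            if (calls >= minCalls && (double) failures / calls >= errorRateLimit) {
                trip();
            }
        }
    }

    private void close() { state = State.CLOSED; calls = 0; failures = 0; }

    private void trip() { state = State.OPEN; openedAt = clock.getAsLong(); calls = 0; failures = 0; }

    public static void main(String[] args) {
        long[] now = {0}; // fake clock so the demo is deterministic
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(2, 0.5, 1_000, () -> now[0]);
        for (int i = 0; i < 2; i++) {
            breaker.call(() -> { throw new IllegalStateException("provider down"); }, () -> "default");
        }
        System.out.println(breaker.state()); // OPEN
        now[0] = 1_000;                      // cooldown elapsed
        System.out.println(breaker.call(() -> "live", () -> "default")); // live
        System.out.println(breaker.state()); // CLOSED
    }
}
```

A production breaker would normally use a rolling window and limit the number of concurrent probes in the half-open state.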

Example 3: Relation to timeout and retry

  • Rate limit: Excess calls are rejected and not retried (or retried only a limited number of times).
  • Degrade: Once triggered, return immediately without calling downstream.
  • Timeout and retry govern calls that are admitted; rate limiting and degradation decide whether a call is admitted at all.
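
The ordering described above can be sketched as a single guarded call: the degrade check and the rate-limit check run first, and timeout and retry apply only to calls that pass both. Everything here is an assumption for illustration: GuardedCall and its parameters are hypothetical names, the two checks are passed in as plain BooleanSupplier hooks, and admission is checked once per logical call rather than once per attempt.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical wrapper showing where each protection applies.
final class GuardedCall {
    static <T> T invoke(BooleanSupplier degraded, BooleanSupplier admit,
                        Supplier<T> rpc, long timeoutMillis, int maxAttempts, T fallback) {
        if (degraded.getAsBoolean()) {
            return fallback; // degrade: skip downstream entirely
        }
        if (!admit.getAsBoolean()) {
            return fallback; // rate limit: rejected, and not retried
        }
        // Only admitted calls reach the timeout/retry logic.
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return CompletableFuture.supplyAsync(rpc)
                        .get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException e) {
                // Timed out or failed: loop to retry. The timed-out task is not
                // cancelled here and may still finish in the background.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Degraded: the RPC is never invoked.
        System.out.println(invoke(() -> true, () -> true, () -> "live", 1_000, 2, "fallback"));
        // Admitted and healthy: the RPC result is returned.
        System.out.println(invoke(() -> false, () -> true, () -> "live", 1_000, 2, "fallback"));
    }
}
```

A stricter design would take a rate-limit permit for every attempt, since each retry adds load on the provider.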

Example 4: Token bucket (conceptual)

Java
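import com.google.common.util.concurrent.RateLimiter; // Guava

// Allow about 100 permits per second; permits are replenished at a fixed rate.
RateLimiter limiter = RateLimiter.create(100.0);

// tryAcquire() does not block: false means no permit is available right now.
if (!limiter.tryAcquire()) {
    // RateLimitExceededException is an application-defined unchecked exception, not part of Guava.
    throw new RateLimitExceededException();
}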
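
For readers who want to see the mechanism without a library, the following is a minimal hand-rolled token bucket. It is a sketch under assumptions: TokenBucket is a hypothetical class, the bucket starts full, each request costs one token, and the nanosecond clock is injected so the behavior can be shown deterministically.

```java
import java.util.function.LongSupplier;

// Hypothetical dependency-free token bucket.
final class TokenBucket {
    private final double capacity;        // maximum burst size
    private final double tokensPerSecond; // steady refill rate
    private final LongSupplier nanoClock; // e.g. System::nanoTime
    private double tokens;
    private long lastRefill;

    TokenBucket(double capacity, double tokensPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.tokensPerSecond = tokensPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full
        this.lastRefill = nanoClock.getAsLong();
    }

    // Refills lazily based on elapsed time, then tries to take one token.
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double refill = (now - lastRefill) / 1e9 * tokensPerSecond;
        tokens = Math.min(capacity, tokens + refill); // never exceed capacity
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long[] nanos = {0}; // fake clock so the demo is deterministic
        TokenBucket bucket = new TokenBucket(2, 1.0, () -> nanos[0]);
        System.out.println(bucket.tryAcquire()); // true  (burst of 2 allowed)
        System.out.println(bucket.tryAcquire()); // true
        System.out.println(bucket.tryAcquire()); // false (bucket empty)
        nanos[0] += 1_000_000_000L;              // one second later: one token refilled
        System.out.println(bucket.tryAcquire()); // true
    }
}
```

In real use the clock would be System::nanoTime; the capacity sets how large a burst is tolerated, and the refill rate sets the long-run average.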

Core Mechanism / Behavior

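  • Token bucket: Tokens are added at a fixed rate and each request consumes one. Short bursts are allowed up to the bucket capacity, while the average rate stays bounded by the refill rate.
  • Sliding window: Count requests in a moving time window. This avoids the boundary problem of a fixed window, where a burst straddling two adjacent windows can briefly pass up to twice the limit.
  • Degrade: Usually triggered by error rate or latency percentiles. Once triggered, calls go to the fallback for a cooldown period.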
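
The sliding-window behavior can be illustrated with a timestamp log. This is a minimal sketch, assuming a hypothetical SlidingWindowLimiter class that stores one timestamp per admitted request (memory proportional to the limit) and takes an injected millisecond clock.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

// Hypothetical sliding-window-log limiter.
final class SlidingWindowLimiter {
    private final int limit;          // max requests per window
    private final long windowMillis;  // window length
    private final LongSupplier clock; // injectable millisecond clock
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLimiter(int limit, long windowMillis, LongSupplier clock) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        // Drop requests that have slid out of the window (now - windowMillis, now].
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= now - windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= limit) {
            return false; // window is full
        }
        timestamps.addLast(now);
        return true;
    }

    public static void main(String[] args) {
        long[] now = {900}; // fake clock so the demo is deterministic
        SlidingWindowLimiter limiter = new SlidingWindowLimiter(2, 1_000, () -> now[0]);
        System.out.println(limiter.tryAcquire()); // true  (t=900)
        now[0] = 950;
        System.out.println(limiter.tryAcquire()); // true  (t=950)
        now[0] = 1_050;
        // A fixed window that reset at t=1000 would admit this call.
        System.out.println(limiter.tryAcquire()); // false (t=1050)
        now[0] = 1_900;
        System.out.println(limiter.tryAcquire()); // true  (t=900 has expired)
    }
}
```

When limits are large, a sliding-window counter that weights the previous fixed window is a common lower-memory approximation.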

Key Rules

  • Rate limiting needs clear thresholds and a clear reject policy; give core services priority.
  • Degradation should be configurable and gradable (applied in levels); add monitoring and alerts so that recovery is noticed.
  • Combine rate limit and degrade with circuit breaker and load balancing for a full protection chain.

What's Next

See Circuit Breaker and Timeout/Retry/Fallback. See Rate Limiter Design (system design). See Redis Rate Limiting for a Redis-based implementation.