Idempotency in Message Consumers
With at-least-once delivery, the same message can be processed more than once (e.g. the consumer crashes after processing but before committing the offset). Idempotency means that processing the same message again produces the same outcome and does not duplicate side effects (double charge, double email). This article explains why idempotency matters and how to implement it with message IDs and dedup storage.
Overview
- Why: At-least-once is common (commit after process); redelivery on failure or rebalance can cause the same message to be processed again. Without idempotency you get duplicate orders, duplicate emails, or double deductions.
- Principle: Make the consumer’s effect idempotent: either the operation is naturally idempotent (e.g. set status to X), or you detect “already processed” and skip (e.g. by message id in a processed table).
- Implementation: Use a stable message identifier (e.g. topic + partition + offset in Kafka, or a broker-provided message id). Before doing side effects, check whether this id was already processed; if yes, skip (or re-apply the same effect); if no, process and then record the id as processed, in the same transaction if possible.
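A minimal sketch of this flow, assuming a Kafka consumer and a hypothetical `ProcessedStore` backed by a DB or cache:

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class IdempotentConsumerLoop {

    // Placeholder for your own dedup storage (DB table, Redis, cache, ...)
    interface ProcessedStore {
        boolean alreadyProcessed(String id);
        void markProcessed(String id);
    }

    public static void run(KafkaConsumer<String, String> consumer, ProcessedStore processedStore) {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                // Stable id for this logical message across redeliveries
                String id = record.topic() + "-" + record.partition() + "-" + record.offset();
                if (processedStore.alreadyProcessed(id)) {
                    continue; // duplicate delivery: skip side effects
                }
                process(record);                  // business side effect
                processedStore.markProcessed(id); // ideally in the same transaction as process()
            }
            consumer.commitSync(); // commit offsets only after processing
        }
    }

    static void process(ConsumerRecord<String, String> record) { /* business logic */ }
}
```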
Example
Example 1: Natural idempotency
```sql
-- Setting a field is idempotent: same message twice → same state
UPDATE orders
SET status = 'PAID', paid_at = ?
WHERE id = ? AND status = 'PENDING';
-- Run twice, the outcome is the same: the second run matches 0 rows
-- because status is no longer 'PENDING'.
```
- Prefer business operations that are idempotent (e.g. set status, upsert by key) so that duplicate delivery does not change the outcome.
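For instance, an upsert keyed by a business id is naturally idempotent: replaying the message writes the same state again. A sketch using JDBC with PostgreSQL-style `ON CONFLICT` (table and column names are illustrative):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PaymentUpsert {

    // Upsert keyed by payment_id: replaying the same message ends in the same row state
    private static final String UPSERT_SQL =
        "INSERT INTO payments (payment_id, status, amount) VALUES (?, 'PAID', ?) " +
        "ON CONFLICT (payment_id) DO UPDATE SET status = 'PAID'";

    public static void apply(Connection conn, String paymentId, long amount) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UPSERT_SQL)) {
            ps.setString(1, paymentId);
            ps.setLong(2, amount);
            ps.executeUpdate(); // running this twice for the same paymentId is harmless
        }
    }
}
```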
Example 2: Deduplication by message id
```java
String idempotencyKey = message.getKey(); // or message id from broker
if (processedCache.contains(idempotencyKey)) {
    return; // already processed
}
process(message);
processedCache.put(idempotencyKey, true); // or insert into DB
```
- Store processed message ids (in DB or cache). If the same id is seen again, skip processing. Use a bounded cache with TTL or a DB table with unique constraint on id; set TTL or retention so storage does not grow forever.
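One way to bound an in-process dedup cache is a size limit plus TTL, e.g. with the Caffeine library (the 7-day retention and size cap below are illustrative; an in-memory cache is not durable across restarts, so pair it with a DB table if you need durability):

```java
import java.time.Duration;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class DedupCache {

    private final Cache<String, Boolean> processed = Caffeine.newBuilder()
        .maximumSize(1_000_000)               // cap memory use
        .expireAfterWrite(Duration.ofDays(7)) // retention so the cache does not grow forever
        .build();

    /** Returns true only the first time this key is seen (within the retention window). */
    public boolean markIfFirst(String idempotencyKey) {
        // asMap().putIfAbsent is atomic: only the first caller sees null
        return processed.asMap().putIfAbsent(idempotencyKey, Boolean.TRUE) == null;
    }
}
```

If `markIfFirst` returns false, the consumer skips the message as already processed.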
Example 3: DB-based idempotency with unique constraint
```sql
CREATE TABLE processed_messages (
    idempotency_key VARCHAR(255) PRIMARY KEY,
    processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- In consumer:
BEGIN;
INSERT INTO processed_messages (idempotency_key) VALUES (?);
-- If duplicate key → skip (already processed)
UPDATE orders SET status = 'PAID' WHERE id = ?;
COMMIT;
```
- Insert the message id first; duplicate key means another instance already processed it → skip or return. Do the business update in the same transaction so you never commit the update without recording the id (or vice versa).
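A JDBC sketch of this pattern, assuming the `processed_messages` table above; detecting the duplicate key via `SQLIntegrityConstraintViolationException` depends on the JDBC driver, so treat that part as an assumption:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class IdempotentPaymentHandler {

    public static void handle(Connection conn, String idempotencyKey, long orderId) throws SQLException {
        conn.setAutoCommit(false);
        try {
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO processed_messages (idempotency_key) VALUES (?)")) {
                insert.setString(1, idempotencyKey);
                insert.executeUpdate();
            } catch (SQLIntegrityConstraintViolationException duplicate) {
                conn.rollback();
                return; // already processed (possibly by another instance): skip
            }
            try (PreparedStatement update = conn.prepareStatement(
                    "UPDATE orders SET status = 'PAID' WHERE id = ?")) {
                update.setLong(1, orderId);
                update.executeUpdate();
            }
            conn.commit();       // key and business update commit together
        } catch (SQLException e) {
            conn.rollback();     // neither the key nor the update is persisted
            throw e;             // let the broker redeliver
        }
    }
}
```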
Core Mechanism / Behavior
- Stable key: You need a key that is the same for the same logical message across redeliveries. Use broker-provided message id, or (topic, partition, offset) in Kafka, or a business id in the payload if the producer sends it and guarantees uniqueness.
- Storage: An in-memory set is simple but lost on restart; use a DB or shared cache (e.g. Redis, see the sketch after this list) for durability. Set retention (e.g. delete keys older than 7 days) so the store does not grow without bound.
- Atomicity: do the “already processed” check/record and the business update in one transaction, so that you cannot mark the message as processed and then fail before updating (or update and then fail before marking).
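A shared-store variant of the dedup check, sketched with the Jedis client for Redis (API details vary by client and version; the connection, key prefix, and 7-day TTL are illustrative):

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class RedisDedup {

    private final Jedis jedis = new Jedis("localhost", 6379); // illustrative connection

    /** Returns true only the first time this key is seen within the retention window. */
    public boolean markIfFirst(String idempotencyKey) {
        String result = jedis.set(
            "processed:" + idempotencyKey, "1",
            SetParams.setParams().nx().ex(7 * 24 * 3600)); // NX = only if absent, EX = 7-day TTL
        return "OK".equals(result); // null means the key already existed
    }
}
```

`SET ... NX EX` writes the key and its TTL atomically, so two consumer instances racing on the same key cannot both see “first time”.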
| Approach | Pros | Cons |
|---|---|---|
| Natural idempotency (e.g. set status) | No extra state | Not all operations are naturally idempotent |
| Dedup table/cache by message id | Works for any operation | Need retention; DB/cache size |
| Idempotency key in payload (from producer) | Same key for same business request | Producer must generate and send unique key |
Key Rules
- Assume at-least-once: design every consumer so that processing the same message twice does not cause wrong or duplicate side effects.
- Use a stable, unique message identifier and store it when you process; on redelivery, skip or re-apply the same effect. Prefer storing the id and doing the side effect in one transaction.
- Prefer business-level idempotency (e.g. “set paid” instead of “add 10”) and use dedup storage when the operation is not naturally idempotent.
What's Next
See Kafka Delivery Semantics for at-least-once and exactly-once delivery. See Retry and DLQ Strategy for retries and dead-letter queues. See Distributed - Idempotency Design for producer-side and API idempotency.