Kafka Core Concepts
Apache Kafka is a distributed event streaming platform. Producers append messages to topics; topics are split into partitions for parallelism and ordering; consumers read in consumer groups. This article explains topics, partitions, brokers, offsets, consumer groups, and replication with examples and practical guidance.
Overview
Kafka is built around a few core abstractions:
- Topic: A named stream of messages (e.g. `orders`, `user-events`). Messages are appended and retained for a configurable time or size. Topics are divided into partitions.
- Partition: An ordered, immutable log. Each message in a partition has a sequential offset. Ordering is guaranteed only within a partition. Partitions enable parallelism: different partitions can be processed by different consumers.
- Broker: A Kafka server. Topics are distributed across brokers; each partition has a leader and zero or more followers. Producers and consumers talk to the leader for a given partition.
- Producer: Sends messages to a topic. Can specify a key for partitioning (same key → same partition); without a key, the producer's partitioner spreads messages across partitions (round-robin, or sticky batching in newer clients).
- Consumer / Consumer group: Consumers read from partitions. Within a group, each partition is consumed by at most one consumer. Adding more consumers than partitions leaves some idle until you add more partitions.
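The relationship between topics, partitions, and offsets can be sketched as a tiny in-memory model (illustrative only; `ToyTopic` and its methods are invented names, not Kafka's API): a topic is a fixed set of append-only partition logs, and each append returns the next sequential offset in that partition.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a topic: a fixed set of append-only partition logs.
// Class and method names are invented for illustration, not Kafka's API.
class ToyTopic {
    private final List<List<String>> partitions = new ArrayList<>();

    ToyTopic(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) partitions.add(new ArrayList<>());
    }

    // Appending to a partition returns the message's offset: 0, 1, 2, ...
    long append(int partition, String message) {
        List<String> log = partitions.get(partition);
        log.add(message);
        return log.size() - 1;
    }

    String read(int partition, long offset) {
        return partitions.get(partition).get((int) offset);
    }
}
```

Note that offsets are per partition: two partitions can both contain an offset 0, and only the order within a single partition is meaningful.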
Example
Topic with partitions
```text
Topic: orders (3 partitions)

partition 0: [msg0, msg3, msg6, ...]   (offsets 0, 1, 2, ...)
partition 1: [msg1, msg4, msg7, ...]
partition 2: [msg2, msg5, msg8, ...]
```
Producers choose the partition by key (a hash of the key modulo the number of partitions; Kafka's default partitioner uses murmur2 on the serialized key) or spread messages across partitions when no key is provided. The same key always goes to the same partition, preserving order for that key.
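The key-based routing rule can be sketched like this (a simplification: Kafka's default partitioner hashes the serialized key with murmur2, and `partitionFor` is an invented helper; Java's `hashCode()` stands in for the real hash, but the shape is the same):

```java
// Simplified key → partition mapping. Kafka's default partitioner uses
// murmur2 on the serialized key bytes; hashCode() stands in for it here.
class PartitionDemo {
    static int partitionFor(String key, int numPartitions) {
        // Mask off the sign bit so negative hash codes still map to a valid partition
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

Because the function is deterministic, every message with the same key lands in the same partition, which is exactly what gives you per-key ordering.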
Producer with key (ordering per key)
```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);

// Same userId always goes to the same partition → order preserved per user
producer.send(new ProducerRecord<>("orders", userId, orderJson));

producer.flush();
producer.close();
```
Consumer group
```text
Group: order-processors (3 consumers, 3 partitions)

Consumer A → partition 0
Consumer B → partition 1
Consumer C → partition 2
```
Each consumer reads from its assigned partitions. If you add a 4th consumer, it stays idle until you add more partitions. If one consumer leaves, Kafka rebalances and reassigns partitions to the remaining consumers.
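The assignment logic can be sketched as simple round-robin (Kafka's actual assignors — range, round-robin, cooperative-sticky — are configurable and more involved; `assign` is an invented helper showing only the idea):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Round-robin sketch: partition p goes to consumer p % consumerCount.
// Every partition gets exactly one consumer; surplus consumers get nothing.
class AssignDemo {
    static Map<String, List<Integer>> assign(List<String> consumers, int partitions) {
        Map<String, List<Integer>> assignment = new HashMap<>();
        for (String c : consumers) assignment.put(c, new ArrayList<>());
        for (int p = 0; p < partitions; p++) {
            assignment.get(consumers.get(p % consumers.size())).add(p);
        }
        return assignment;
    }
}
```

Running this with three partitions and four consumers shows why the fourth consumer idles: there is no partition left to give it.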
Consumer code
```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "order-processors");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("orders"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        process(record);
    }
    consumer.commitSync(); // or commitAsync() for lower latency
}
```
Partitions and Ordering
- Ordering is per partition. If you need global order, use one partition (and one consumer). If you need order per key (e.g. per user), use a key and enough partitions so that keys are distributed.
- Adding partitions increases parallelism but can change key-to-partition mapping (if you use hash). Plan partition count upfront when ordering by key matters.
- Partition count can be increased after creation but never decreased. Choose it based on expected throughput and consumer count.
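The remapping hazard from growing a topic can be seen concretely with the simplified `hash % partitions` rule (Java's `hashCode()` stands in for Kafka's murmur2 here, and `partitionFor` is an invented helper, but the effect is the same with any fixed hash):

```java
// Simplified key → partition rule; hashCode() stands in for murmur2.
class RepartitionDemo {
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// "c".hashCode() == 99, and 99 % 3 == 0 while 99 % 4 == 3:
// growing the topic from 3 to 4 partitions moves key "c" to a different
// partition, so its per-key ordering is not preserved across the change.
```

Some keys happen to stay put, but you cannot rely on it, which is why the partition count should be planned upfront when per-key ordering matters.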
Offsets and Consumer Groups
- Offset: Sequential position within a partition. Consumers track how far they have read (committed offset). On restart, they resume from the last committed offset.
- Commit: `commitSync()` or `commitAsync()` tells Kafka "I have processed up to this offset." If you commit after processing (at-least-once), a crash after processing but before the commit causes reprocessing. If you commit before processing (at-most-once), a crash can cause message loss.
- Consumer group: All consumers with the same `group.id` form a group. Partitions are assigned to group members; changing membership triggers a rebalance.
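The effect of the commit order can be simulated with a toy consumer that crashes once between the two steps and then restarts from the last committed offset (everything here is invented demo code, not Kafka's API; `commitFirst` switches at-most-once vs at-least-once):

```java
import java.util.ArrayList;
import java.util.List;

// Toy consumer: crashes once at offset `crashAt`, between its two steps
// (process and commit), then restarts from the last committed offset.
class CommitOrderDemo {
    static List<Integer> runWithCrash(List<Integer> log, int crashAt, boolean commitFirst) {
        List<Integer> processed = new ArrayList<>();
        int committed = 0;       // where a restart resumes reading
        boolean crashed = false;
        for (int attempt = 0; attempt < 2; attempt++) { // first run + one restart
            int off = committed;
            while (off < log.size()) {
                if (commitFirst) {
                    committed = off + 1;                 // at-most-once: commit first
                    if (!crashed && off == crashAt) { crashed = true; break; } // crash before processing → message lost
                    processed.add(log.get(off));
                } else {
                    processed.add(log.get(off));         // at-least-once: process first
                    if (!crashed && off == crashAt) { crashed = true; break; } // crash before commit → reprocessed
                    committed = off + 1;
                }
                off++;
            }
        }
        return processed;
    }
}
```

With a crash at offset 1, process-then-commit handles message 20 twice, while commit-then-process skips it entirely — the duplication-vs-loss trade-off in miniature.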
Replication and Durability
- Each partition has a leader and followers. Producers write to the leader; followers replicate. When the leader fails, a follower is promoted.
- acks: The producer can request different durability levels:
  - `acks=0`: fire and forget
  - `acks=1`: the leader acknowledges
  - `acks=all`: the leader and all in-sync replicas acknowledge (strongest)
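A strong-durability producer configuration might look like this (a sketch: the config keys are real Kafka producer settings, but the chosen values are one reasonable combination, not the only one):

```java
import java.util.Properties;

// Producer settings biased toward durability: wait for all in-sync
// replicas, retry aggressively, and enable idempotence so retries
// cannot introduce duplicates within a partition.
class DurableProducerConfig {
    static Properties props() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "all");                 // all in-sync replicas must acknowledge
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        props.put("enable.idempotence", "true");  // safe retries: no broker-side duplicates
        return props;
    }
}
```

The trade-off is latency: `acks=all` waits on replication before the send completes, which `acks=0` and `acks=1` do not.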
- Retention: Messages are kept for a configured time (e.g. 7 days) or size. After that they are deleted. Consumers can re-read from any offset within retention.
| Concept | Meaning |
|---|---|
| Topic | Named log stream |
| Partition | Ordered log shard; unit of parallelism and ordering |
| Offset | Position in partition; consumer commits offset |
| Consumer group | Set of consumers; each partition → at most one consumer |
| Rebalance | Reassigning partitions when group membership changes |
| Replication | Leader + followers for durability |
Key Rules
- Use message keys when you need ordering per key; same key → same partition.
- Scale consumption by adding partitions and consumers. Consumers ≤ partitions for full utilization in one group.
- Choose `acks` and retention based on durability and replay needs. See Kafka Delivery Semantics for at-least-once vs exactly-once.
- Design consumers to be idempotent; at-least-once delivery can cause duplicates.
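One common way to make a consumer idempotent is deduplication on a stable message id (a sketch; `DedupProcessor` is an invented name, and in production the seen-set would live in durable shared storage such as a database keyed by message id, not an in-memory `HashSet`):

```java
import java.util.HashSet;
import java.util.Set;

// Idempotent processing by deduplicating on a stable message id.
// A HashSet illustrates the idea; real systems persist the seen ids.
class DedupProcessor {
    private final Set<String> seen = new HashSet<>();
    private int applied = 0;

    // Returns true only the first time a given messageId is processed;
    // redeliveries of the same id become no-ops.
    boolean process(String messageId) {
        if (!seen.add(messageId)) return false; // duplicate: skip side effects
        applied++;
        return true;
    }

    int appliedCount() { return applied; }
}
```

With this in place, an at-least-once redelivery after a crash-before-commit simply hits the dedup check and causes no second side effect.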
What's Next
See Kafka Delivery Semantics for at-least-once, at-most-once, exactly-once. See Ordering vs Throughput for when to favor partitioning vs batching. See Idempotency in Message Consumers for safe consumption.