Kafka Core Concepts
Apache Kafka is a distributed event streaming platform. Producers append messages to topics; topics are split into partitions for parallelism and ordering; consumers read in consumer groups. This article explains topics, partitions, brokers, offsets, consumer groups, and replication with examples and practical guidance.
Overview
Kafka is built around a few core abstractions:
- Topic: A named stream of messages (e.g. `orders`, `user-events`). Messages are appended and retained for a configurable time or size. Topics are divided into partitions.
- Partition: An ordered, immutable log. Each message in a partition has a sequential offset. Ordering is guaranteed only within a partition. Partitions enable parallelism: different partitions can be processed by different consumers.
- Broker: A Kafka server. Topics are distributed across brokers; each partition has a leader and zero or more followers. Producers and consumers talk to the leader for a given partition.
- Producer: Sends messages to a topic. Can specify a key for partitioning (same key → same partition); without a key, the producer's partitioner spreads messages across partitions (round-robin, or sticky batching in newer clients).
- Consumer / Consumer group: Consumers read from partitions. Within a group, each partition is consumed by at most one consumer. Adding more consumers than partitions leaves some idle until you add more partitions.
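The relationship between topics, partitions, and offsets can be sketched as a tiny in-memory model (illustrative only; `ToyTopic` and its methods are invented names, not Kafka's API): a topic is a fixed set of append-only partition logs, and each append returns the next sequential offset in that partition.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a topic: a fixed set of append-only partition logs.
// Class and method names are invented for illustration, not Kafka's API.
class ToyTopic {
    private final List<List<String>> partitions = new ArrayList<>();

    ToyTopic(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) partitions.add(new ArrayList<>());
    }

    // Appending to a partition returns the message's offset: 0, 1, 2, ...
    long append(int partition, String message) {
        List<String> log = partitions.get(partition);
        log.add(message);
        return log.size() - 1;
    }

    String read(int partition, long offset) {
        return partitions.get(partition).get((int) offset);
    }
}
```

Note that offsets are per partition: two partitions can both contain an offset 0, and only the order within a single partition is meaningful.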
Example
Topic with partitions
```text
Topic: orders (3 partitions)

partition 0: [msg0, msg3, msg6, ...]   (offsets 0, 1, 2, ...)
partition 1: [msg1, msg4, msg7, ...]
partition 2: [msg2, msg5, msg8, ...]
```
Producers choose the partition by key (a hash of the key modulo the number of partitions; Kafka's default partitioner uses murmur2 on the serialized key) or spread messages across partitions when no key is provided. The same key always goes to the same partition, preserving order for that key.
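The key-based routing rule can be sketched like this (a simplification: Kafka's default partitioner hashes the serialized key with murmur2, and `partitionFor` is an invented helper; Java's `hashCode()` stands in for the real hash, but the shape is the same):

```java
// Simplified key → partition mapping. Kafka's default partitioner uses
// murmur2 on the serialized key bytes; hashCode() stands in for it here.
class PartitionDemo {
    static int partitionFor(String key, int numPartitions) {
        // Mask off the sign bit so negative hash codes still map to a valid partition
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

Because the function is deterministic, every message with the same key lands in the same partition, which is exactly what gives you per-key ordering.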
Producer with key (ordering per key)
```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);

// Same userId always goes to the same partition → order preserved per user
producer.send(new ProducerRecord<>("orders", userId, orderJson));

producer.flush();
producer.close();
```
Consumer group
```text
Group: order-processors (3 consumers, 3 partitions)

Consumer A → partition 0
Consumer B → partition 1
Consumer C → partition 2
```
Each consumer reads from its assigned partitions. If you add a 4th consumer, it stays idle until you add more partitions. If one consumer leaves, Kafka rebalances and reassigns partitions to the remaining consumers.
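The assignment logic can be sketched as simple round-robin (Kafka's actual assignors — range, round-robin, cooperative-sticky — are configurable and more involved; `assign` is an invented helper showing only the idea):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Round-robin sketch: partition p goes to consumer p % consumerCount.
// Every partition gets exactly one consumer; surplus consumers get nothing.
class AssignDemo {
    static Map<String, List<Integer>> assign(List<String> consumers, int partitions) {
        Map<String, List<Integer>> assignment = new HashMap<>();
        for (String c : consumers) assignment.put(c, new ArrayList<>());
        for (int p = 0; p < partitions; p++) {
            assignment.get(consumers.get(p % consumers.size())).add(p);
        }
        return assignment;
    }
}
```

Running this with three partitions and four consumers shows why the fourth consumer idles: there is no partition left to give it.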
Consumer code
```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "order-processors");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("orders"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        process(record);
    }
    consumer.commitSync(); // or commitAsync() for lower latency
}
```
Partitions and Ordering
- Ordering is per partition. If you need global order, use one partition (and one consumer). If you need order per key (e.g. per user), use a key and enough partitions so that keys are distributed.
- Adding partitions increases parallelism but can change key-to-partition mapping (if you use hash). Plan partition count upfront when ordering by key matters.
- Partition count can be increased after creation but never decreased. Choose it based on expected throughput and consumer count.
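The remapping hazard from growing a topic can be seen concretely with the simplified `hash % partitions` rule (Java's `hashCode()` stands in for Kafka's murmur2 here, and `partitionFor` is an invented helper, but the effect is the same with any fixed hash):

```java
// Simplified key → partition rule; hashCode() stands in for murmur2.
class RepartitionDemo {
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// "c".hashCode() == 99, and 99 % 3 == 0 while 99 % 4 == 3:
// growing the topic from 3 to 4 partitions moves key "c" to a different
// partition, so its per-key ordering is not preserved across the change.
```

Some keys happen to stay put, but you cannot rely on it, which is why the partition count should be planned upfront when per-key ordering matters.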
Offsets and Consumer Groups
- Offset: Sequential position within a partition. Consumers track how far they have read (committed offset). On restart, they resume from the last committed offset.
- Commit: `commitSync()` or `commitAsync()` tells Kafka "I have processed up to this offset." If you commit after processing (at-least-once), a crash after processing but before the commit causes reprocessing. If you commit before processing (at-most-once), a crash can cause message loss.
- Consumer group: All consumers with the same `group.id` form a group. Partitions are assigned to group members; changing membership triggers a rebalance.
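The effect of the commit order can be simulated with a toy consumer that crashes once between the two steps and then restarts from the last committed offset (everything here is invented demo code, not Kafka's API; `commitFirst` switches at-most-once vs at-least-once):

```java
import java.util.ArrayList;
import java.util.List;

// Toy consumer: crashes once at offset `crashAt`, between its two steps
// (process and commit), then restarts from the last committed offset.
class CommitOrderDemo {
    static List<Integer> runWithCrash(List<Integer> log, int crashAt, boolean commitFirst) {
        List<Integer> processed = new ArrayList<>();
        int committed = 0;       // where a restart resumes reading
        boolean crashed = false;
        for (int attempt = 0; attempt < 2; attempt++) { // first run + one restart
            int off = committed;
            while (off < log.size()) {
                if (commitFirst) {
                    committed = off + 1;                 // at-most-once: commit first
                    if (!crashed && off == crashAt) { crashed = true; break; } // crash before processing → message lost
                    processed.add(log.get(off));
                } else {
                    processed.add(log.get(off));         // at-least-once: process first
                    if (!crashed && off == crashAt) { crashed = true; break; } // crash before commit → reprocessed
                    committed = off + 1;
                }
                off++;
            }
        }
        return processed;
    }
}
```

With a crash at offset 1, process-then-commit handles message 20 twice, while commit-then-process skips it entirely — the duplication-vs-loss trade-off in miniature.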
Replication and Durability
- Each partition has a leader and followers. Producers write to the leader; followers replicate. When the leader fails, a follower is promoted.
- acks: The producer can request different durability levels:
  - `acks=0`: fire and forget
  - `acks=1`: the leader acknowledges
  - `acks=all`: the leader and all in-sync replicas acknowledge (strongest)
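A strong-durability producer configuration might look like this (a sketch: the config keys are real Kafka producer settings, but the chosen values are one reasonable combination, not the only one):

```java
import java.util.Properties;

// Producer settings biased toward durability: wait for all in-sync
// replicas, retry aggressively, and enable idempotence so retries
// cannot introduce duplicates within a partition.
class DurableProducerConfig {
    static Properties props() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "all");                 // all in-sync replicas must acknowledge
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        props.put("enable.idempotence", "true");  // safe retries: no broker-side duplicates
        return props;
    }
}
```

The trade-off is latency: `acks=all` waits on replication before the send completes, which `acks=0` and `acks=1` do not.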
- Retention: Messages are kept for a configured time (e.g. 7 days) or size. After that they are deleted. Consumers can re-read from any offset within retention.
| Concept | Meaning |
|---|---|
| Topic | Named log stream |
| Partition | Ordered log shard; unit of parallelism and ordering |
| Offset | Position in partition; consumer commits offset |
| Consumer group | Set of consumers; each partition → at most one consumer |
| Rebalance | Reassigning partitions when group membership changes |
| Replication | Leader + followers for durability |
Key Rules
- Use message keys when you need ordering per key; same key → same partition.
- Scale consumption by adding partitions and consumers. Consumers ≤ partitions for full utilization in one group.
- Choose `acks` and retention based on durability and replay needs. See Kafka Delivery Semantics for at-least-once vs exactly-once.
- Design consumers to be idempotent; at-least-once delivery can cause duplicates.
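One common way to make a consumer idempotent is deduplication on a stable message id (a sketch; `DedupProcessor` is an invented name, and in production the seen-set would live in durable shared storage such as a database keyed by message id, not an in-memory `HashSet`):

```java
import java.util.HashSet;
import java.util.Set;

// Idempotent processing by deduplicating on a stable message id.
// A HashSet illustrates the idea; real systems persist the seen ids.
class DedupProcessor {
    private final Set<String> seen = new HashSet<>();
    private int applied = 0;

    // Returns true only the first time a given messageId is processed;
    // redeliveries of the same id become no-ops.
    boolean process(String messageId) {
        if (!seen.add(messageId)) return false; // duplicate: skip side effects
        applied++;
        return true;
    }

    int appliedCount() { return applied; }
}
```

With this in place, an at-least-once redelivery after a crash-before-commit simply hits the dedup check and causes no second side effect.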
What's Next
See Kafka Delivery Semantics for at-least-once, at-most-once, exactly-once. See Ordering vs Throughput for when to favor partitioning vs batching. See Idempotency in Message Consumers for safe consumption.