Kafka Core Concepts

Apache Kafka is a distributed event streaming platform. Producers append messages to topics; topics are split into partitions for parallelism and ordering; consumers read in consumer groups. This article explains topics, partitions, brokers, offsets, consumer groups, and replication with examples and practical guidance.

Overview

Kafka is built around a few core abstractions:

  • Topic: A named stream of messages (e.g. orders, user-events). Messages are appended and retained for a configurable time or size. Topics are divided into partitions.
  • Partition: An ordered, immutable log. Each message in a partition has a sequential offset. Ordering is guaranteed only within a partition. Partitions enable parallelism: different partitions can be processed by different consumers.
  • Broker: A Kafka server. Topics are distributed across brokers; each partition has a leader and zero or more followers. Producers and consumers talk to the leader for a given partition.
  • Producer: Sends messages to a topic. Can specify a key for partitioning (same key → same partition) or let the producer's partitioner spread keyless messages across partitions (round-robin, or sticky batching in newer clients). Partition assignment happens client-side, not on the broker.
  • Consumer / Consumer group: Consumers read from partitions. Within a group, each partition is consumed by at most one consumer. Adding more consumers than partitions leaves some idle until you add more partitions.

Example

Topic with partitions

Plain text
Topic: orders (3 partitions)
  partition 0: [msg0, msg3, msg6, ...]  (offset 0, 1, 2, ...)
  partition 1: [msg1, msg4, msg7, ...]
  partition 2: [msg2, msg5, msg8, ...]

Producers choose the partition by key (a hash of the serialized key modulo the partition count; Kafka's default partitioner uses murmur2) or distribute keyless messages across partitions. The same key always goes to the same partition, preserving order for that key.

Producer with key (ordering per key)

Java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);

// Same user_id always goes to same partition → order preserved per user
producer.send(new ProducerRecord<>("orders", userId, orderJson));
producer.flush();
producer.close();

Consumer group

Plain text
Group: order-processors (3 consumers, 3 partitions)
  Consumer A → partition 0
  Consumer B → partition 1
  Consumer C → partition 2

Each consumer reads from its assigned partitions. If you add a 4th consumer, it stays idle until you add more partitions. If one consumer leaves, Kafka rebalances and reassigns partitions to the remaining consumers.

Consumer code

Java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "order-processors");
props.put("enable.auto.commit", "false");  // commit manually after processing
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("orders"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        process(record);
    }
    consumer.commitSync();  // or commitAsync
}

Partitions and Ordering

  • Ordering is per partition. If you need global order, use one partition (and one consumer). If you need order per key (e.g. per user), use a key and enough partitions so that keys are distributed.
  • Adding partitions increases parallelism but can change key-to-partition mapping (if you use hash). Plan partition count upfront when ordering by key matters.
  • Partition count can be increased after creation but never decreased. Choose it based on expected throughput and consumer count.
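
The key-to-partition mapping described above can be sketched with a toy partitioner. This is illustrative only: it uses String.hashCode for brevity, while Kafka's default partitioner hashes the serialized key with murmur2.

```java
public class PartitionMapping {
    // Toy hash partitioner: same key and partition count → same partition.
    // (Kafka's real default partitioner uses murmur2 on the serialized key.)
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the result is always a valid partition index.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Same key, same count: stable assignment.
        System.out.println(partitionFor("user-42", 3));
        // Growing the topic from 3 to 4 partitions can move the key,
        // which is why ordering-by-key needs the partition count planned upfront.
        System.out.println(partitionFor("user-42", 4));
    }
}
```

If order per key matters, either fix the partition count upfront or plan a controlled migration when expanding the topic.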

Offsets and Consumer Groups

  • Offset: Sequential position within a partition. Consumers track how far they have read (committed offset). On restart, they resume from the last committed offset.
  • Commit: commitSync() or commitAsync() tells Kafka "I have processed up to this offset." If you commit after processing (at-least-once), a crash after process but before commit can cause reprocessing. If you commit before processing (at-most-once), a crash can cause message loss.
  • Consumer group: All consumers with the same group.id form a group. Partitions are assigned to group members. Changing membership triggers a rebalance.
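
The commit-timing trade-off above can be simulated without a broker. The sketch below models a crash partway through a partition and shows what each commit order yields on restart; the method names and crash point are illustrative, not Kafka APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitOrderDemo {
    // Commit AFTER processing (at-least-once): a crash between processing
    // offset `crashAt` and committing it causes that offset to be reprocessed.
    static List<Long> atLeastOnce(int numMessages, long crashAt) {
        List<Long> processed = new ArrayList<>();
        long committed = 0;
        for (long off = 0; off < numMessages; off++) {
            processed.add(off);            // process first
            if (off == crashAt) break;     // crash before the commit
            committed = off + 1;           // then commit
        }
        for (long off = committed; off < numMessages; off++) {
            processed.add(off);            // restart resumes from committed offset
        }
        return processed;
    }

    // Commit BEFORE processing (at-most-once): a crash between committing
    // offset `crashAt` and processing it silently drops that offset.
    static List<Long> atMostOnce(int numMessages, long crashAt) {
        List<Long> processed = new ArrayList<>();
        long committed = 0;
        for (long off = 0; off < numMessages; off++) {
            committed = off + 1;           // commit first
            if (off == crashAt) break;     // crash before processing
            processed.add(off);
        }
        for (long off = committed; off < numMessages; off++) {
            processed.add(off);
        }
        return processed;
    }
}
```

At-least-once plus an idempotent handler is the common production choice; exactly-once semantics additionally require Kafka transactions.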

Replication and Durability

  • Each partition has a leader and followers. Producers write to the leader; followers replicate its log. If the leader fails, an in-sync follower is promoted.
  • acks: Producer can request different durability:
    • acks=0: fire and forget
    • acks=1: leader acknowledges
    • acks=all: leader and all in-sync replicas acknowledge (strongest)
  • Retention: Messages are kept for a configured time (e.g. 7 days) or size. After that they are deleted. Consumers can re-read from any offset within retention.
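
A durability-oriented producer configuration, in the style of the earlier snippets, might look like the sketch below. The property names are real Kafka producer settings; the values are illustrative choices, and acks=all is only as strong as the topic's min.insync.replicas allows.

```java
import java.util.Properties;

// Sketch of durability-focused producer settings (illustrative values).
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");                 // wait for all in-sync replicas
props.put("enable.idempotence", "true");  // broker deduplicates producer retries
```

With acks=all, also set min.insync.replicas on the topic or broker (e.g. 2 with replication factor 3) so a write is never acknowledged by a lone surviving replica.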

Summary

Concept           Meaning
Topic             Named log stream
Partition         Ordered log shard; unit of parallelism and ordering
Offset            Position in a partition; consumers commit offsets
Consumer group    Set of consumers; each partition → at most one consumer
Rebalance         Reassigning partitions when group membership changes
Replication       Leader + followers for durability

Key Rules

  • Use message keys when you need ordering per key; same key → same partition.
  • Scale consumption by adding partitions and consumers. Consumers ≤ partitions for full utilization in one group.
  • Choose acks and retention based on durability and replay needs. See Kafka Delivery Semantics for at-least-once vs exactly-once.
  • Design consumers to be idempotent; at-least-once delivery can cause duplicates.
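
The idempotency rule can be sketched as a handler that remembers processed event ids. The in-memory set here is illustrative only; a real consumer would persist the ids durably (e.g. a unique key written in the same database transaction as the side effect).

```java
import java.util.HashSet;
import java.util.Set;

public class IdempotentHandler {
    // Illustrative in-memory dedup store; production code needs durable storage.
    private final Set<String> seen = new HashSet<>();
    private int applied = 0;

    // Applies the event once; redeliveries of the same id are no-ops.
    boolean handle(String eventId) {
        if (!seen.add(eventId)) {
            return false;   // duplicate from an at-least-once redelivery
        }
        applied++;          // the real side effect would happen here
        return true;
    }

    int appliedCount() {
        return applied;
    }
}
```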

What's Next

See Kafka Delivery Semantics for at-least-once, at-most-once, exactly-once. See Ordering vs Throughput for when to favor partitioning vs batching. See Idempotency in Message Consumers for safe consumption.