HashMap Deep Dive - Resize, Collisions, Treeify

Understand how Java's HashMap works internally: its array-and-bucket structure, hash distribution, collision handling with linked lists and red-black trees, resize behavior, and treeification conditions. This article explains the mechanisms behind HashMap and how to use it effectively in real systems.

Overview

HashMap stores key-value pairs in an array of buckets. Each bucket can hold a linked list of nodes (or a red-black tree when the list grows long). The structure enables average O(1) get and put operations when the hash function distributes keys well, but understanding internals helps you avoid performance pitfalls and choose the right data structure for concurrent access.

Key concepts:

  • Hash and bucket index: hash = (h = key.hashCode()) ^ (h >>> 16); then index = (n - 1) & hash where n is the table length (a power of 2). XORing the high 16 bits into the low 16 bits lets keys whose hash codes differ only in their upper bits still spread across buckets.
  • Collisions: Keys that map to the same bucket are chained in a linked list. In Java 8+, when a bucket's list exceeds a threshold, it is converted to a red-black tree to avoid O(n) lookups.
  • Resize: When size > threshold (threshold = capacity × loadFactor, default 0.75), the table doubles in size and all entries are rehashed.

Example

Basic put and get

Java
Map<String, Integer> map = new HashMap<>();
map.put("a", 1);
map.put("b", 2);
map.put("c", 3);
int v = map.get("a");  // 1
// Initial capacity is 16, threshold is 12; the 13th put triggers resize

Collision and chaining

When multiple keys produce the same bucket index, they form a chain. In Java 8+, new entries are appended at the tail of the list (earlier versions inserted at the head), or placed into the tree if the bucket has been treeified. A poor hashCode() that clusters keys into a few buckets degrades performance.

Java
// Keys with similar hashCodes may land in the same bucket
Map<BadKey, String> map = new HashMap<>();
// If BadKey.hashCode() always returns the same value, all entries chain in one bucket
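
To make the degradation concrete, here is a minimal sketch with a hypothetical BadKey class (not part of the JDK or the snippets above) whose hashCode() is constant, so every entry lands in the same bucket:

Java
// Hypothetical key with a deliberately broken hash function
class BadKey {
    final String name;
    BadKey(String name) { this.name = name; }
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).name.equals(name);
    }
    @Override public int hashCode() { return 42; }  // every instance collides
}

Map<BadKey, String> map = new HashMap<>();
map.put(new BadKey("a"), "1");
map.put(new BadKey("b"), "2");
map.put(new BadKey("c"), "3");
// All three entries chain in one bucket; get() must walk the chain and call equals()
String v = map.get(new BadKey("b"));  // "2"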

Treeification

When an insertion lands in a bucket whose list already holds 8 nodes and the table has at least 64 buckets, that bucket is converted to a red-black tree; if the table is smaller, HashMap resizes instead of treeifying. This avoids O(n) lookups when many keys collide (e.g. due to a weak hash function or a deliberate hash-flooding attack). When the tree shrinks to 6 or fewer nodes, it is converted back to a list.

Java
// Treeification also requires a table of at least 64 buckets; with the default
// 16-bucket table, reaching the threshold triggers a resize instead of a tree.
Map<Integer, String> map = new HashMap<>(64);  // 64 buckets, resize threshold 48
// Multiples of 64 all land in bucket 0 because their low 6 bits are zero
for (int i = 0; i < 10; i++) {
    map.put(i * 64, "value" + i);  // the 9th colliding entry converts the bucket to a tree
}

Hash and Bucket Index

The hash is computed as:

Java
int h = key.hashCode();       // a null key hashes to 0 instead
int hash = h ^ (h >>> 16);    // fold the high 16 bits into the low 16 bits
int index = (n - 1) & hash;   // n is the table length, always a power of 2

  • The high 16 bits are XORed with the low 16 bits so that variations in the high bits affect the lower bits used for indexing. This improves distribution when n is small.
  • The table length n is always a power of 2, so (n - 1) & hash keeps only the low bits of the hash, mapping it into [0, n) like a modulo by n but cheaper, and without the negative results that % can produce.
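
As a quick illustration, the following sketch reproduces the same spreading and masking steps outside of HashMap (the helper name bucketIndex is ours, not part of the JDK):

Java
// Mirrors HashMap's hash spreading and index masking for a table of the given length
static int bucketIndex(Object key, int tableLength) {
    int h = key.hashCode();
    int hash = h ^ (h >>> 16);        // fold the high 16 bits into the low 16 bits
    return (tableLength - 1) & hash;  // tableLength is assumed to be a power of 2
}

int a = bucketIndex(0x10000, 16);  // 1 -- without the XOR, 0x10000 & 15 would be 0
int b = bucketIndex(0x20000, 16);  // 2 -- without the XOR, 0x20000 & 15 would also be 0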

Resize Behavior

When size > threshold:

  1. Allocate a new table with twice the capacity.
  2. Rehash every entry: each node either stays at the same index or moves to index + oldCapacity. Because the capacity doubles, exactly one additional hash bit decides the new bucket, so each old bucket splits into at most two new ones (see the sketch after this list).
  3. If the old bucket was a tree and splits into two, each half may be converted back to a list if it has 6 or fewer nodes.
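
A minimal sketch of the split rule, using illustrative hash values rather than real keys:

Java
// During a resize from 16 to 32 buckets, only the hash bit with value oldCap matters
int oldCap = 16;
int hashA = 37;  // binary 100101: the bit with value 16 is 0
int hashB = 53;  // binary 110101: the bit with value 16 is 1
int idxA = (oldCap - 1) & hashA;         // 5
int idxB = (oldCap - 1) & hashB;         // 5 -- same old bucket as hashA
int newIdxA = (2 * oldCap - 1) & hashA;  // 5: stays at the same index (hashA & 16 == 0)
int newIdxB = (2 * oldCap - 1) & hashB;  // 21 = 5 + 16: moves up by oldCap (hashB & 16 != 0)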

Resize is expensive (O(n)), so if you know the expected number of entries, choose an initial capacity large enough that the expected size stays below the resulting threshold; passing the expected size directly (new HashMap<>(expectedSize)) reduces resizes but can still leave one final resize once three quarters of that capacity is filled. See the sketch below. The default load factor of 0.75 balances space overhead against collision probability.
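
For example, a short presizing sketch for an expected 1,000 entries; dividing by the load factor keeps the expected size under the threshold (the variable names are ours):

Java
int expectedSize = 1_000;
// Pick a capacity large enough that expectedSize entries never cross the 0.75 threshold
Map<String, Integer> presized = new HashMap<>((int) (expectedSize / 0.75f) + 1);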

Treeification and Untreeify

  Condition                                         Action
  Bucket list length ≥ 8 and table length ≥ 64      Convert bucket to red-black tree
  Bucket tree size ≤ 6 (after split or removal)     Convert back to linked list

Treeification avoids worst-case O(n) lookups when many keys collide. The threshold of 8 comes from the Poisson analysis in the JDK source comments: with well-distributed hash codes, a bucket of 8 nodes is expected fewer than once in ten million, so trees are reserved for pathological cases. The lower untreeify threshold of 6 adds hysteresis, so a bucket whose size fluctuates around 8 does not convert back and forth repeatedly.

Key Rules

  • Implement hashCode() and equals() correctly — equals implies hashCode equality; unequal objects should ideally have different hash codes.
  • Prefer immutable keys — mutable keys changed after insertion can break the map (wrong bucket, lost entries).
  • Initial capacity — use new HashMap<>(expectedSize) to avoid repeated resizes when the size is known.
  • Thread safety — HashMap is not thread-safe. Use ConcurrentHashMap for concurrent access.
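
To illustrate the mutable-key pitfall, a minimal sketch with a hypothetical MutableKey class; the entry stays in the bucket chosen for the old hash and becomes effectively unreachable after mutation:

Java
// Hypothetical mutable key, used only to demonstrate the pitfall
class MutableKey {
    int id;
    MutableKey(int id) { this.id = id; }
    @Override public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).id == id;
    }
    @Override public int hashCode() { return Integer.hashCode(id); }
}

Map<MutableKey, String> map = new HashMap<>();
MutableKey k = new MutableKey(1);
map.put(k, "value");
k.id = 2;                              // hash changes, but the node stays in the bucket for hash 1
String a = map.get(k);                 // null: probes the bucket for hash 2, which is empty
String b = map.get(new MutableKey(1)); // null: right bucket, but equals() no longer matches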

What's Next

For the concurrent version, see ConcurrentHashMap Internals. For collection overview, see Java Collections in Practice.