HashMap Deep Dive - Resize, Collisions, Treeify
Understand how Java's HashMap works internally: its array-and-bucket structure, hash distribution, collision handling with linked lists and red-black trees, resize behavior, and treeification conditions. This article explains the mechanisms behind HashMap and how to use it effectively in real systems.
Overview
HashMap stores key-value pairs in an array of buckets. Each bucket can hold a linked list of nodes (or a red-black tree when the list grows long). The structure enables average O(1) get and put operations when the hash function distributes keys well, but understanding internals helps you avoid performance pitfalls and choose the right data structure for concurrent access.
Key concepts:
- Hash and bucket index: `hash = (h = key.hashCode()) ^ (h >>> 16)`, then `index = (n - 1) & hash`, where `n` is the table length (a power of 2). XORing the high 16 bits into the low bits reduces collisions when hash codes differ mainly in their upper bits.
- Collisions: Keys that map to the same bucket are chained in a linked list. In Java 8+, when a bucket's list exceeds a threshold, it is converted to a red-black tree to avoid O(n) lookups.
- Resize: When `size > threshold` (threshold = capacity × loadFactor, default 0.75), the table doubles in size and all entries are rehashed.
Example
Basic put and get
```java
Map<String, Integer> map = new HashMap<>();
map.put("a", 1);
map.put("b", 2);
map.put("c", 3);
int v = map.get("a"); // 1
// Initial capacity is 16, threshold is 12; the 13th put triggers a resize
```
Collision and chaining
When multiple keys produce the same bucket index, they form a chain. In Java 8, new entries are appended at the end of the list (or inserted into the tree). A poor hashCode() that clusters keys in few buckets degrades performance.
```java
// Keys with similar hashCodes may land in the same bucket
Map<BadKey, String> map = new HashMap<>();
// If BadKey.hashCode() always returns the same value, all entries chain in one bucket
```
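To make the degenerate case concrete, here is a minimal, runnable sketch. `BadKey` is a hypothetical key type (not from the original article's code) whose `hashCode()` returns a constant, so every entry lands in the same bucket:

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // Hypothetical key type: a constant hashCode forces every entry into one bucket
    static final class BadKey {
        final String id;
        BadKey(String id) { this.id = id; }
        @Override public int hashCode() { return 42; } // all keys collide
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id.equals(id);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, String> map = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            map.put(new BadKey("k" + i), "v" + i); // all 100 entries share one bucket
        }
        // Lookups remain correct, but each get must walk the chain (or, after
        // treeification, descend the red-black tree) comparing with equals()
        System.out.println(map.get(new BadKey("k57"))); // v57
        System.out.println(map.size());                 // 100
    }
}
```

Correctness is preserved even in this worst case; only performance degrades, which is exactly what treeification (below) mitigates.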
Treeification
When a bucket's list reaches length 8 and the table has at least 64 buckets, that bucket is converted to a red-black tree. This avoids O(n) lookups when many keys collide (e.g. due to a weak hash or a deliberate attack). When the tree shrinks to 6 or fewer nodes, it is converted back to a list.
```java
// With 8+ colliding keys in one bucket, the bucket becomes a tree
Map<Integer, String> map = new HashMap<>();
// Use keys that deliberately collide to observe treeification (e.g. same lower bits)
for (int i = 0; i < 10; i++) {
    map.put(i * 16, "value" + i); // many of these may land in the same bucket
}
```
Hash and Bucket Index
The hash is computed as:
```java
int hash = (h = key.hashCode()) ^ (h >>> 16);
int index = (n - 1) & hash;
```
- The high 16 bits are XORed into the low 16 bits so that variation in the high bits affects the lower bits used for indexing. This improves distribution when `n` is small.
- The table length `n` is always a power of 2, so `(n - 1) & hash` is equivalent to `hash % n` (for non-negative hashes) but faster.
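The two lines above can be run standalone. The sketch below reproduces the same spreading step in an illustrative `hash` helper (the method name mirrors `HashMap.hash`, but this is a copy for demonstration, not the JDK's private method):

```java
public class IndexDemo {
    // Same spreading step HashMap applies before indexing
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;              // table length, always a power of 2
        int h = hash("a");       // "a".hashCode() is 97; high bits are zero, so spread is a no-op
        int index = (n - 1) & h; // masking keeps the index in [0, n)
        System.out.println(index); // 1

        // For a power-of-2 n, (n - 1) & h matches the non-negative remainder,
        // even when h is negative (Math.floorMod, not the % operator)
        System.out.println(((n - 1) & h) == Math.floorMod(h, n)); // true
    }
}
```

`Math.floorMod` is used in the comparison because Java's `%` can return a negative result for a negative hash, while the mask never does.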
Resize Behavior
When `size > threshold`:
- Allocate a new table with twice the capacity.
- Rehash every entry: each node either stays at its current index or moves to `index + oldCapacity`. Because the capacity doubles, exactly one extra hash bit decides the new bucket, so each bucket splits into two roughly even halves.
- If the old bucket was a tree and splits into two, each half is converted back to a list if it has 6 or fewer nodes.
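The "stay or move by oldCapacity" rule can be checked directly: the deciding bit is `hash & oldCapacity`. A small sketch (the `hash` helper again mirrors HashMap's spreading step for illustration):

```java
public class ResizeSplitDemo {
    // Same spreading step HashMap applies before indexing
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = 32; // a doubling resize
        for (String key : new String[]{"a", "b", "treeify", "resize"}) {
            int h = hash(key);
            int oldIdx = (oldCap - 1) & h;
            int newIdx = (newCap - 1) & h;
            // The one extra bit (h & oldCap) decides:
            // bit clear -> node stays at oldIdx; bit set -> node moves to oldIdx + oldCap
            boolean stays = (h & oldCap) == 0;
            System.out.printf("%s: old=%d new=%d (%s)%n",
                    key, oldIdx, newIdx, stays ? "stays" : "moves +" + oldCap);
        }
    }
}
```

This is why resize never needs to recompute full modulo positions: one bit test partitions every bucket's chain into a "low" list and a "high" list.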
Resize is expensive (O(n)), so if you know the expected size, pass an initial capacity up front — large enough that the expected size stays below the resulting threshold (roughly `expectedSize / 0.75`). The default load factor 0.75 balances space and collision probability.
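A minimal presizing sketch — `capacityFor` is an illustrative helper (not a JDK method) that picks a capacity whose 0.75 threshold exceeds the expected size:

```java
import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    // Illustrative helper: capacity large enough that expectedSize
    // stays under the 0.75 threshold, so no resize occurs while filling
    static int capacityFor(int expectedSize) {
        return (int) (expectedSize / 0.75f) + 1;
    }

    public static void main(String[] args) {
        int expected = 1000;
        // The constructor rounds the hint up to the next power of 2 internally
        Map<String, Integer> map = new HashMap<>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            map.put("key" + i, i); // fills without triggering a resize
        }
        System.out.println(map.size()); // 1000
    }
}
```

On Java 19+, `HashMap.newHashMap(int numMappings)` performs this capacity computation for you.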
Treeification and Untreeify
| Condition | Action |
|---|---|
| Bucket list length ≥ 8 and table length ≥ 64 | Convert bucket to red-black tree |
| Bucket tree size ≤ 6 (after split or removal) | Convert back to linked list |
Treeification avoids worst-case O(n) lookups when many keys collide. The threshold of 8 is chosen based on statistical analysis of hash distributions; the 6-node untreeify threshold avoids frequent convert-back-and-forth when the size fluctuates around 8.
Key Rules
- Implement `hashCode()` and `equals()` correctly — equal objects must have equal hash codes; unequal objects should ideally have different hash codes.
- Prefer immutable keys — mutable keys changed after insertion can break the map (wrong bucket, lost entries).
- Initial capacity — use `new HashMap<>(expectedSize)` to avoid repeated resizes when the size is known.
- Thread safety — `HashMap` is not thread-safe. Use `ConcurrentHashMap` for concurrent access.
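The mutable-key rule is worth seeing fail once. `Point` below is a hypothetical mutable key type; mutating it after insertion changes its hash, so lookups probe the wrong bucket and the entry becomes unreachable:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MutableKeyDemo {
    // Hypothetical mutable key: hashCode depends on mutable state
    static final class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public int hashCode() { return Objects.hash(x, y); }
        @Override public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }
    }

    public static void main(String[] args) {
        Map<Point, String> map = new HashMap<>();
        Point p = new Point(1, 2);
        map.put(p, "origin-ish");
        p.x = 99;                       // mutating the key changes its hash
        System.out.println(map.get(p)); // null: lookup probes the wrong bucket
        System.out.println(map.size()); // 1: the entry is still stored, just unreachable
    }
}
```

The entry is not lost from memory — iteration would still find it — but it can no longer be retrieved or removed by key, which is exactly the "lost entries" failure mode above.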
What's Next
For the concurrent version, see ConcurrentHashMap Internals. For a collections overview, see Java Collections in Practice.