Before reading this article, if you want to cover the basics of HashMap, please check article_link.
In this article, we will understand the internal concepts used in HashMap.

Important internal concepts
- Hash function, hashing and index calculation
- Bucket assignment, Linked List and Tree
- Hash collision
- hashCode() and equals() methods
- Resizing and rehashing
- Initial capacity
- Load factor
Hash function, Hashing and index calculation
- Hashing is a technique that takes a variable-size input and converts it into a fixed-size output using a mathematical formula. This output is called the hashcode.
- In HashMap, hashing is used to find the index at which a key-value pair is stored in, and read from, the bucket array.
- The key is passed as input to the hash function to calculate the hashcode, and this hashcode is further used to calculate the index in the bucket array.
- In the HashMap class, the hash of a key is calculated using the `hash()` method. In Java 8 its logic is:
  - `(key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16)`
  - It first checks whether the key is null; in that case it returns 0 as the hash.
  - If the key is not null, the key's `hashCode()` is calculated, and the result is XORed with an unsigned right shift of itself by 16 bits, i.e. `(h >>> 16)`.
  - This shift-and-XOR operation spreads the higher bits into the lower bits, helps distribute keys evenly over the entire bucket array, and lowers the chance of more than one element landing in a single bucket, so collisions become less likely.
  - Also, if the key's `hashCode()` implementation is weak, this 16-bit shift helps to compensate for it.
- Using this hash, the index is calculated with the logic below; a small sketch follows this list.
  - `index = (Bucket Array Capacity - 1) & hash`
  - This means that if the bucket array capacity increases or decreases, the index for a key also changes even though the hashcode of the key remains the same. So during the resizing process, a key's index in the bucket array can change.
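To see how the hash spreading and the index formula above fit together, here is a minimal, self-contained sketch (the class and the example key are made up for illustration; the `hash()` helper mirrors the Java 8 logic quoted above):

```java
public class HashIndexDemo {

    // Same spreading idea as described above: XOR the hashcode with its upper 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // index = (capacity - 1) & hash, which works because the capacity is always a power of two.
    static int indexFor(int hash, int capacity) {
        return (capacity - 1) & hash;
    }

    public static void main(String[] args) {
        String key = "orderId"; // hypothetical key
        int h = hash(key);
        System.out.println("hash      = " + h);
        // With the default capacity of 16, only the lowest 4 bits of the spread hash are used.
        System.out.println("index(16) = " + indexFor(h, 16));
        // After a resize to 32, the same key may land at a different index.
        System.out.println("index(32) = " + indexFor(h, 32));
    }
}
```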
Bucket Assignment, Linked List and Tree
- Once the index is calculated from the hash, HashMap puts the entry into that index of the bucket array.
- More than one element may end up in the same bucket; in that case HashMap uses a linked list to store them.
- If the number of elements in a bucket grows beyond 8 (`TREEIFY_THRESHOLD`), the map converts the linked list into a tree. This tree is a self-balancing binary search tree (a red-black tree).
  - Why a tree? It makes the search operation faster: the time complexity of a search in the tree is O(log n), which is faster than the O(n) of a linked list.
- Related to the tree in HashMap there are some constants; their details are below.
- What are `TREEIFY_THRESHOLD` and `UNTREEIFY_THRESHOLD` in HashMap?
  - In the HashMap class there are two constants, `TREEIFY_THRESHOLD` and `UNTREEIFY_THRESHOLD`.
  - `TREEIFY_THRESHOLD`: a constant; if the number of elements in a bucket grows beyond this threshold value (8), the linked list is converted into a tree. Since the conversion happens as elements are added to a bucket, we can call this the grow phase.
  - `UNTREEIFY_THRESHOLD`: a constant; if the number of elements in a bucket drops to this threshold (6) or below, the tree is converted back into a linked list. This happens when the number of elements in a bucket decreases because of `resize()` or removal, so we can call this the shrink phase.
- There is one more constant here, `MIN_TREEIFY_CAPACITY`.
  - Conversion of a linked list to a tree is only allowed if the capacity of the bucket array is greater than or equal to `MIN_TREEIFY_CAPACITY`; otherwise only a resize operation is performed.
  - So it is important to note that a linked list is converted to a tree only if the condition mentioned in the point above is met.
- Also, please note that the values of these constants (`TREEIFY_THRESHOLD`, `UNTREEIFY_THRESHOLD`, `MIN_TREEIFY_CAPACITY`) cannot be configured or changed, unlike the load factor or initial capacity, because they are declared as `static final int`.
- Just to avoid any confusion between `TREEIFY_THRESHOLD` and `MIN_TREEIFY_CAPACITY`: `MIN_TREEIFY_CAPACITY` is checked against the capacity of the entire bucket array, not against a single bucket. Only when the bucket array capacity is greater than or equal to `MIN_TREEIFY_CAPACITY` is the linked-list-to-tree conversion allowed; once allowed, each bucket is checked individually, and only a bucket with `>= TREEIFY_THRESHOLD` elements has its linked list converted into a tree, otherwise nothing is done. A sketch of this decision logic follows below.
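As a rough sketch of how these constants interact (a simplified illustration, not the JDK implementation; in Java 8 their values are 8, 6 and 64 respectively, and the method names below are made up):

```java
public class TreeifyDecisionDemo {

    static final int TREEIFY_THRESHOLD = 8;     // bucket size at which a linked list may become a tree
    static final int UNTREEIFY_THRESHOLD = 6;   // bucket size at which a tree turns back into a linked list
    static final int MIN_TREEIFY_CAPACITY = 64; // minimum bucket array capacity before treeifying is allowed

    // Grow phase: what happens when a bucket has become large.
    static String onBucketGrowth(int tableCapacity, int bucketSize) {
        if (bucketSize >= TREEIFY_THRESHOLD) {
            // If the table is still small, HashMap prefers to resize instead of treeifying.
            return (tableCapacity >= MIN_TREEIFY_CAPACITY) ? "treeify bucket" : "resize table";
        }
        return "keep linked list";
    }

    // Shrink phase: what happens when a treeified bucket has become small (resize/removal).
    static String onBucketShrink(int bucketSize) {
        return (bucketSize <= UNTREEIFY_THRESHOLD) ? "untreeify back to linked list" : "keep tree";
    }

    public static void main(String[] args) {
        System.out.println(onBucketGrowth(16, 9)); // resize table
        System.out.println(onBucketGrowth(64, 9)); // treeify bucket
        System.out.println(onBucketShrink(5));     // untreeify back to linked list
    }
}
```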
Hash Collision
- A collision in HashMap happens when an element is put into a bucket where an element is already present.
- A collision can happen when:
  - Two keys are different, but their calculated hashcodes are the same.
    - This is possible because the hashcode is of type int, which holds only a 32-bit value, so there are far more possible keys than possible hashcodes and two different keys can end up with the same hashcode.
  - Two keys are different and their hashcodes are also different, but the index allocated to store them in the bucket array is the same.
- If collisions happen frequently, this impacts the time complexity of HashMap.
  - As more elements pile up inside a bucket, `get()` has to compare the input key against the elements in that bucket using `equals()`, which results in a time complexity of O(n) for a linked list or O(log n) for a tree, instead of O(1).
- To keep collisions in check, the HashMap class already has the functionality below, and we can tune some of it according to our requirements. A small demonstration of a real collision follows this list.
  - Good hashing: the hash of a key is calculated in such a way that keys distribute evenly over the entire bucket array.
  - Resizing and rehashing: this increases the capacity of the HashMap and decreases the likelihood of collisions.
  - When a collision does happen, the key-value pairs in a bucket are managed using a data structure such as a linked list or a tree.
  - Initial capacity: through this parameter we can change the initial size of the bucket array. A larger size means fewer collisions, but we should keep the space cost in mind.
  - Load factor: this also plays an important role in preventing collisions. By default its value is 0.75, but we can change it according to our requirements. Whenever the number of key-value pairs reaches 75% of the capacity, resizing and rehashing happen, which redistributes all keys over the entire bucket array.
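To make the first collision case concrete, here is a small demonstration (the class name is made up) using two String keys that are well known to share the same hashcode:

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" are different strings but produce the same hashCode (2112),
        // so they would normally land in the same bucket.
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112

        // HashMap still keeps both entries, because equals() distinguishes
        // the keys inside the shared bucket.
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}
```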
hashCode() and equals() methods
- HashMap uses the key's `hashCode()` method to calculate the hash, and this hash is then used when reading and writing data in the map. The `equals()` method is also used while reading and writing data into the map.
- Let's understand this for `get(K key)`: HashMap takes the key as input, calculates the hash from the key's hashcode and then the index. Suppose that at the calculated index in the bucket array there are multiple key-value pairs. Then, using `equals()`, HashMap checks which stored key matches the input key, and if one is found, it returns the value of that key.
- Now let's understand these methods in the case of `put(K key, V value)`: when we use `put(K key, V value)` to insert a key-value pair into the HashMap, the `hashCode()` method is used to find the hash of the key, and from that the index is calculated. Suppose the calculated index in the bucket array already has some key-value pairs; in that case the `equals()` method is used. If the key being inserted matches an existing key according to `equals()`, that key's value is updated; otherwise a new key-value pair is inserted into the bucket.
- It is important to note that in HashMap, keys are unique.
- If two objects are equal according to `equals()`, then their `hashCode()` results must also be equal, but the reverse is not necessarily true. That is why, even after calculating the hash, HashMap uses `equals()` to check which key actually matches when a bucket holds more than one element. A small sketch follows below.
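The following sketch shows why both methods matter for a key type; the `EmployeeId` class is a made-up example of a key that overrides `hashCode()` and `equals()` consistently:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class KeyContractDemo {

    // Hypothetical key type that follows the hashCode()/equals() contract.
    static final class EmployeeId {
        private final int id;

        EmployeeId(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof EmployeeId)) return false;
            return this.id == ((EmployeeId) o).id;
        }

        @Override
        public int hashCode() {
            // Equal objects must produce equal hashcodes.
            return Objects.hash(id);
        }
    }

    public static void main(String[] args) {
        Map<EmployeeId, String> map = new HashMap<>();
        map.put(new EmployeeId(42), "Alice");

        // A different instance with the same id counts as the same key:
        // hashCode() routes the lookup to the same bucket, equals() confirms the match,
        // so get() finds the value and put() would update instead of duplicating.
        System.out.println(map.get(new EmployeeId(42))); // Alice
    }
}
```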
Resizing and rehashing
- Resizing and rehashing expand the size of the bucket array and redistribute the key-value pairs evenly over the entire array.
- It is also important to note that resizing only ever increases the size of the bucket array; there is no option to shrink it when elements are removed.
- When we try to put an element, if the insertion makes the condition below true, resizing and rehashing are performed:
  - (number of elements in the bucket array) > (capacity * load factor)
- As we already understand, resizing expands the size of the HashMap, but we also need to redistribute the existing elements; for this we need rehashing.
  - In rehashing, the index is recalculated for every key: `index = (Bucket Array Capacity - 1) & hash`. A small sketch of the resize trigger follows below.
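A small sketch of this trigger condition (illustrative only; the method name is made up, and the real HashMap precomputes the threshold as capacity * load factor once per resize):

```java
public class ResizeTriggerDemo {

    // A resize is needed once the number of stored entries exceeds capacity * loadFactor.
    static boolean shouldResize(int size, int capacity, float loadFactor) {
        return size > capacity * loadFactor;
    }

    public static void main(String[] args) {
        int capacity = 16;        // default capacity
        float loadFactor = 0.75f; // default load factor, so the threshold is 12

        System.out.println(shouldResize(12, capacity, loadFactor)); // false: exactly at threshold
        System.out.println(shouldResize(13, capacity, loadFactor)); // true: resize, capacity doubles to 32

        // After resizing, the threshold grows with the new capacity.
        System.out.println(shouldResize(13, 32, loadFactor));       // false: new threshold is 24
    }
}
```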
Initial capacity
- Initial capacity is an important parameter of HashMap, and we can change its value when initializing the HashMap object.
- It is used to initialize the bucket array: if the initial capacity is 16, HashMap creates a bucket array of size 16.
- By default its value is 16.
- We can also change its value using the constructor, for example:
  - `new HashMap(3)`
- When we change its value, say to 3:
  - HashMap finds the nearest power of 2 that is greater than or equal to it: for 3 it is 4, for 6 it would be 8.
  - It then creates an array of that power-of-2 size.
  - So for an initial capacity of 3, the bucket array size will be 4. A minimal sketch of this rounding follows below.
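A minimal sketch of this rounding, just to see the power-of-two behaviour (the real HashMap does this internally with an equivalent bit-twiddling helper):

```java
public class CapacityRoundingDemo {

    // Smallest power of two that is >= the requested capacity (illustrative version).
    static int roundUpToPowerOfTwo(int requestedCapacity) {
        int capacity = 1;
        while (capacity < requestedCapacity) {
            capacity <<= 1; // double until we reach or pass the requested value
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(3));  // 4
        System.out.println(roundUpToPowerOfTwo(6));  // 8
        System.out.println(roundUpToPowerOfTwo(16)); // 16
    }
}
```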
Load factor
- Load factor is an important parameter in HashMap that helps to maintain a balance between space and time complexity.
- We can change its value when initializing the HashMap object, for example:
  - `new HashMap(initialCapacity, loadFactor)`
- The default load factor in Java 8 is 0.75.
- It means that once the number of elements in the bucket array crosses 75% of the capacity, a resize happens, which keeps the average time complexity of put and get operations constant.
- The load factor is of type float and must be > 0.0. It is also better not to set its value greater than 1.0.
- If you try to set the load factor to 0, you will get the runtime exception below:
  - `java.lang.IllegalArgumentException: Illegal load factor: 0.0`
- If, instead of 0.0, you try a load factor of 0.01:
  - This will increase the time complexity of the `put()` method, because a resize happens for almost every put. These resizes also keep increasing the size of the bucket array even when it is not required.
- If you apply a load factor of 1.0, the resize condition looks like this:
  - (number of elements in the bucket array) > (capacity * 1.0), i.e. (number of elements in the bucket array) > capacity
  - So resizing and rehashing happen only once the number of elements exceeds the capacity.
- We can even apply a load factor > 1, as Java 8 has no check or validation against it, but in that case the time complexity will suffer.
  - Let's say we use a load factor of 2.0; then the condition is (number of elements in the bucket array) > (capacity * 2.0).
  - It means that if the capacity is 16, resizing happens only once the number of elements in the bucket array exceeds 32.
  - This increases the number of elements in a single bucket within the bucket array, so the frequency of collisions will be high.
  - Whenever `get()` is used, it has to search through all the elements within a single bucket, which increases the time complexity of the `get()` method.
- A short sketch of constructing a HashMap with a custom load factor, including the illegal value case, follows this list.
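A quick sketch showing the load-factor constructor in action, including the illegal value case described above:

```java
import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // The defaults stated explicitly: capacity 16, load factor 0.75.
        Map<String, Integer> defaults = new HashMap<>(16, 0.75f);
        defaults.put("a", 1);
        System.out.println(defaults);

        // A load factor of 0.0 (or a negative/NaN value) is rejected at construction time.
        try {
            new HashMap<String, Integer>(16, 0.0f);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Illegal load factor: 0.0
        }
    }
}
```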
Note: Please comment below with any improvements, suggestions, or corrections regarding any wrong information in this article. We will improve it so that only correct information is available here.