Every data structure has its own special characteristics. For example, a balanced BST is used when an element must be searched for quickly, in O(log n) time. A heap, or priority queue, is used when the minimum or maximum element needs to be fetched in constant time. Similarly, a hash table is used to fetch, add and remove an element in constant time on average. It is important to understand how a hash table works before moving on to the implementation, so here is a brief background. Note that we will use the terms Hash Map and Hash Table interchangeably, though in Java a Hashtable is synchronized (thread-safe) while a HashMap is not.
The code we are going to implement is available at Link 1 and Link 2.
However, it is strongly recommended that you read this blog completely, try to decipher the nitty-gritty of what goes into implementing a hash map, and then try to write the code yourself.
Background: Every hash table stores its data as (key, value) pairs. Every key in a hash table is unique, but values can repeat, which means that different keys may hold the same value. In an array, we fetch a value by providing its position (index) in the array. In a hash table, instead of an index, we use a key to fetch the corresponding value. The entire process is described below.
Every time a key is supplied, it is passed to a hash function. Every hash function has two parts: a hash code and a compressor.
The hash code is an integer (random or non-random). In Java, every object has its own hash code. We will use the hash code generated by the JVM in our hash function, and to compress it we take the hash code modulo (%) the size of the hash table. So the modulo operator is the compressor in our implementation.
The entire process ensures that, for any key, we get an integer position within the size of the hash table at which to insert the corresponding value.
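The hash-code-then-compress step can be sketched as below. The names `BucketIndexSketch` and `bucketIndex` are hypothetical, and using `Math.floorMod` instead of a bare `%` is an assumption made here because `hashCode()` can return a negative number, in which case `%` would produce a negative index.

```java
import java.util.Objects;

final class BucketIndexSketch {
    // Maps any key to an index in the range [0, numBuckets).
    static int bucketIndex(Object key, int numBuckets) {
        // Hash code: 0 for a null key, otherwise key.hashCode().
        int hashCode = Objects.hashCode(key);
        // Compressor: floorMod keeps the result non-negative even
        // when the hash code itself is negative.
        return Math.floorMod(hashCode, numBuckets);
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex("this", 10));
        System.out.println(bucketIndex("coder", 10));
        System.out.println(bucketIndex(-7, 10)); // 3, whereas -7 % 10 would be -7
    }
}
```

Whatever the key type, the returned index is always a valid position in a table of `numBuckets` buckets.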
So the process is simple: the user gives a set of (key, value) pairs as input, and the hash function maps each key to the index where its value is stored. Whenever we need to fetch the value corresponding to a key, the lookup is therefore just O(1).
This picture stops being so rosy and perfect when the concept of a hash collision is introduced. Imagine that two different keys are assigned the same bucket of the hash table. Where does the value already stored there for the earlier key go? We certainly can't replace it. That would be disastrous! To resolve this issue we will use the separate chaining technique. Please note that there are also open addressing techniques, such as double hashing and linear probing, whose efficiency is almost the same as that of separate chaining, and you can read more about them at Link 1, Link 2 and Link 3.
What we do is keep a linked list for each bucket of the hash table, to accommodate all the (key, value) pairs whose keys map to that bucket.
Now there may be a scenario where all the keys get mapped to the same bucket, so that a single bucket holds a linked list of n nodes (n being the number of entries in the hash table) while all the other buckets are empty. This is the worst case, where the hash table acts as a linked list and searching is O(n).
So what do we do?
Load Factor: The load factor is the number of entries stored in the table divided by the number of buckets. If we start with, say, 10 buckets and 7 entries have been added so far, the load factor is 7/10 = 0.7.
In our implementation, whenever we add a key-value pair to the hash table we check the load factor, and if it is greater than 0.7 we double the size of our hash table.
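The load factor check can be sketched as plain arithmetic. The names `LoadFactorSketch`, `loadFactor` and `nextBucketCount` are hypothetical; only the 0.7 threshold and the doubling rule come from the text, and the load factor is assumed here to be entries divided by buckets.

```java
final class LoadFactorSketch {
    static final double MAX_LOAD_FACTOR = 0.7; // threshold used in the text

    // Load factor = number of stored entries / number of buckets.
    static double loadFactor(int size, int numBuckets) {
        return (1.0 * size) / numBuckets;
    }

    // Bucket count to use after an insertion: doubled when the load
    // factor is greater than the threshold, unchanged otherwise.
    static int nextBucketCount(int size, int numBuckets) {
        return loadFactor(size, numBuckets) > MAX_LOAD_FACTOR
                ? 2 * numBuckets
                : numBuckets;
    }

    public static void main(String[] args) {
        System.out.println(loadFactor(7, 10));      // 0.7
        System.out.println(nextBucketCount(7, 10)); // 10, since 0.7 is not greater than 0.7
        System.out.println(nextBucketCount(8, 10)); // 20
    }
}
```

With a strict "greater than" comparison, a table of 10 buckets grows when the eighth entry is added.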
We will try to make a generic map without putting any restrictions on the data types of the key and the value. Each entry is stored in a hash node, and every hash node needs to know the next node in its linked list, so a next pointer is also required.
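A minimal sketch of such a node, assuming it carries only a key, a value and the next pointer (the field names and the constructor are illustrative choices, not necessarily those of the linked code):

```java
// One entry in a bucket's chain. K and V are unrestricted type parameters.
final class HashNode<K, V> {
    final K key;          // fixed once the node is created
    V value;              // may be overwritten if the same key is added again
    HashNode<K, V> next;  // next node in the same bucket, or null at the end

    HashNode(K key, V value) {
        this.key = key;
        this.value = value;
    }
}
```

Two nodes are chained simply by assigning one to the other's `next` field.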
The functions we plan to keep in our hash map are as follows:
ArrayList<HashNode<K, V>> bucket = new ArrayList<>();
A helper function is implemented to get the index of a key, to avoid redundancy in other functions such as get, add and remove. This function uses Java's built-in hash code function to generate a hash code, and we compress the hash code by the size of the hash table so that the index falls within the range of the table.
get():
The get function takes a key as input and returns the corresponding value if the key is present in the table; otherwise it returns null. Steps are:
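A lookup along these lines can be sketched as a static method over the bucket list. `GetSketch`, `getBucketIndex` and the three-argument `HashNode` constructor are hypothetical names chosen for this illustration, and `Math.floorMod` is an assumption used to keep the index non-negative.

```java
import java.util.ArrayList;
import java.util.Objects;

final class GetSketch {
    static final class HashNode<K, V> {
        final K key;
        V value;
        HashNode<K, V> next;

        HashNode(K key, V value, HashNode<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    static int getBucketIndex(Object key, int numBuckets) {
        return Math.floorMod(Objects.hashCode(key), numBuckets);
    }

    // Returns the value stored for key, or null if the key is absent.
    static <K, V> V get(ArrayList<HashNode<K, V>> bucket, K key) {
        // Find the head of the chain that the key hashes to.
        HashNode<K, V> node = bucket.get(getBucketIndex(key, bucket.size()));
        // Walk the chain until a node with an equal key is found.
        while (node != null) {
            if (Objects.equals(node.key, key)) {
                return node.value;
            }
            node = node.next;
        }
        // The chain ended without a match.
        return null;
    }

    public static void main(String[] args) {
        ArrayList<HashNode<String, Integer>> bucket = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            bucket.add(null);
        }
        // "Aa" and "BB" have the same hashCode() in Java, so they share a chain.
        HashNode<String, Integer> tail = new HashNode<>("BB", 2, null);
        HashNode<String, Integer> head = new HashNode<>("Aa", 1, tail);
        bucket.set(getBucketIndex("Aa", 4), head);
        System.out.println(get(bucket, "BB"));     // 2
        System.out.println(get(bucket, "absent")); // null
    }
}
```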
remove():
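Removal under separate chaining can be sketched as unlinking a node from its bucket's chain. Everything named here (`RemoveSketch`, `getBucketIndex`, the `HashNode` constructor) is hypothetical, and returning the removed value, or null when the key is absent, is an assumption about the intended behaviour. A full map would also decrement its entry count at this point.

```java
import java.util.ArrayList;
import java.util.Objects;

final class RemoveSketch {
    static final class HashNode<K, V> {
        final K key;
        V value;
        HashNode<K, V> next;

        HashNode(K key, V value, HashNode<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    static int getBucketIndex(Object key, int numBuckets) {
        return Math.floorMod(Objects.hashCode(key), numBuckets);
    }

    // Unlinks the node for key and returns its value, or null if absent.
    static <K, V> V remove(ArrayList<HashNode<K, V>> bucket, K key) {
        int index = getBucketIndex(key, bucket.size());
        HashNode<K, V> prev = null;
        HashNode<K, V> node = bucket.get(index);
        // Advance until the key is found or the chain ends.
        while (node != null && !Objects.equals(node.key, key)) {
            prev = node;
            node = node.next;
        }
        if (node == null) {
            return null;                      // key not present
        }
        if (prev == null) {
            bucket.set(index, node.next);     // removing the head of the chain
        } else {
            prev.next = node.next;            // bypass a middle or tail node
        }
        return node.value;
    }
}
```

Keeping a `prev` reference is what allows a node in the middle of the chain to be bypassed in a single pass.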
add():
Now to the most interesting and challenging function of this entire implementation. It is interesting because we need to dynamically increase the size of our list when the load factor is above the value we specified.
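The following is a minimal sketch of insertion with resizing, not the code behind the links. The class name `AddSketch`, the fields `numBuckets` and `size`, and the helper `resize` are hypothetical; the initial capacity of 10, the 0.7 threshold and the doubling rule follow the text. Lookup and removal are left out to keep the focus on growth.

```java
import java.util.ArrayList;
import java.util.Objects;

final class AddSketch<K, V> {
    static final class HashNode<K, V> {
        final K key;
        V value;
        HashNode<K, V> next;

        HashNode(K key, V value, HashNode<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    ArrayList<HashNode<K, V>> bucket = new ArrayList<>();
    int numBuckets = 10; // current number of buckets
    int size = 0;        // number of stored key-value pairs

    AddSketch() {
        for (int i = 0; i < numBuckets; i++) {
            bucket.add(null); // every chain starts empty
        }
    }

    int size() { return size; }

    boolean isEmpty() { return size == 0; }

    private int getBucketIndex(K key) {
        return Math.floorMod(Objects.hashCode(key), numBuckets);
    }

    void add(K key, V value) {
        int index = getBucketIndex(key);
        // If the key already exists, overwrite its value and stop.
        for (HashNode<K, V> node = bucket.get(index); node != null; node = node.next) {
            if (Objects.equals(node.key, key)) {
                node.value = value;
                return;
            }
        }
        // Otherwise insert a new node at the head of the chain.
        bucket.set(index, new HashNode<>(key, value, bucket.get(index)));
        size++;
        // Grow once the load factor is greater than 0.7.
        if ((1.0 * size) / numBuckets > 0.7) {
            resize();
        }
    }

    // Doubles the bucket count and re-inserts every entry, because each
    // key's index depends on numBuckets.
    private void resize() {
        ArrayList<HashNode<K, V>> old = bucket;
        numBuckets = 2 * numBuckets;
        bucket = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            bucket.add(null);
        }
        size = 0;
        for (HashNode<K, V> head : old) {
            for (HashNode<K, V> node = head; node != null; node = node.next) {
                add(node.key, node.value);
            }
        }
    }

    public static void main(String[] args) {
        AddSketch<String, Integer> map = new AddSketch<>();
        map.add("this", 1);
        map.add("coder", 2);
        map.add("this", 4); // same key: value is overwritten, size unchanged
        map.add("hi", 5);
        System.out.println(map.size());    // 3
        System.out.println(map.isEmpty()); // false
    }
}
```

Re-inserting through `add` during a resize cannot trigger another resize, because the load factor right after doubling is at most about half the threshold.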
Java's own HashMap implementation converts the linked list of a bucket into a balanced binary search tree (a red-black tree) if that list grows too long.
Output:
3
4
null
2
false
This method adds a key-value pair to the hash table. Its time complexity is O(1) on average, provided the keys are spread evenly across the buckets; in the worst case, where every key lands in the same bucket, it is O(n). The space complexity is O(n) because the table grows with the number of items stored in it.
This method removes a given key from the hash table. Its time complexity is O(1) on average and O(n) in the worst case, when it has to walk one long chain. The space complexity is O(1) because it does not depend on the number of items stored in the hash table.
This method returns the value for a given key from the hash table. Its time complexity is O(1) on average and O(n) in the worst case, for the same reason. The space complexity is O(1) because it does not depend on the number of items stored in the hash table.
For the method that reports the size, the time complexity is constant because it simply returns the size of the hash table. The space complexity is constant because it does not require any additional space.