Hashing is the process of generating a small, fixed-size output (which can be used as an index in a table) from an input that is typically large and of variable size. The transformation is performed by mathematical formulas known as hash functions. The resulting value determines the index, or location, at which an item is stored in a data structure called a hash table.
The hash table is one of the most widely used data structures after arrays.
It mainly supports search, insert and delete in O(1) time on average, which is more efficient than other popular data structures such as arrays, linked lists and self-balancing BSTs.
We use hashing for dictionaries, frequency counting, maintaining data for quick access by key, etc.
Real-world applications include database indexing, cryptography, caches, symbol tables and dictionaries.
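Frequency counting, mentioned above, is a convenient way to see key-based access in practice. The following is a minimal sketch that assumes Python's built-in dict as the hash table; the function name count_frequencies and the sample words are made up for illustration.

```python
def count_frequencies(items):
    """Count how often each item occurs, using a dict (Python's built-in hash table)."""
    counts = {}
    for item in items:
        # Looking up and updating a key each take O(1) time on average.
        counts[item] = counts.get(item, 0) + 1
    return counts


words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
frequencies = count_frequencies(words)
print(frequencies)  # {'apple': 3, 'banana': 2, 'cherry': 1}
```

Each word is used directly as a key, so no scan over previously seen words is needed.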
Hashing is not the best choice in the following situations:
When we need to maintain sorted data along with search, insert and delete. We use a self-balancing BST in these cases.
When strings are keys and we need operations like prefix search along with search, insert and delete. We use a Trie in these cases.
When we need operations like floor and ceiling along with search, insert and/or delete. We use a self-balancing BST in these cases.
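To illustrate why order-based operations are a poor fit for hashing, the sketch below answers a floor query (largest key less than or equal to a target) in two ways. It is only an illustration under assumptions: Python has no built-in self-balancing BST, so a sorted list searched with the standard bisect module stands in for the ordered structure, and the function names and sample data are hypothetical.

```python
import bisect


def floor_with_hash_table(keys, target):
    """Largest key <= target. A hash table keeps no order, so every key is checked: O(n)."""
    candidates = [k for k in keys if k <= target]
    return max(candidates) if candidates else None


def floor_with_sorted_keys(sorted_keys, target):
    """Largest key <= target, found by binary search over sorted keys: O(log n)."""
    i = bisect.bisect_right(sorted_keys, target)
    return sorted_keys[i - 1] if i > 0 else None


prices = {40: "d", 10: "a", 30: "c", 20: "b"}  # dict keys are in no useful order
sorted_prices = sorted(prices)                  # [10, 20, 30, 40]

print(floor_with_hash_table(prices, 25))         # 20
print(floor_with_sorted_keys(sorted_prices, 25))  # 20
```

A sorted list gives fast floor lookups but slow inserts and deletes, which is why a self-balancing BST is the usual choice when all of these operations are needed together.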
Components of Hashing
There are three major components of hashing:
Key: A key can be anything, such as a string or an integer, that is fed as input to the hash function, which determines an index or location for storing the item in the data structure.
Hash Function: Receives the input key and returns an index into an array called a hash table. This index is known as the hash index.
Hash Table: A hash table is typically an array of lists. It stores the values corresponding to the keys in an associative manner: each key is mapped to an index of the array, and its value is stored at that index. Ideally every value gets its own index; when two keys map to the same index, both are kept in the list at that index.
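The three components can be put together in a small sketch. This is not a production implementation: the class name ChainedHashTable, its method names and the default size of 7 are assumptions made for illustration, and Python's built-in hash() stands in for the hash function.

```python
class ChainedHashTable:
    """A tiny hash table: an array of buckets, each bucket a list of (key, value) pairs."""

    def __init__(self, size=7):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _hash_index(self, key):
        # Hash function: map the key to a bucket index in the range [0, size).
        return hash(key) % self.size

    def insert(self, key, value):
        bucket = self.buckets[self._hash_index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: overwrite its value
                return
        bucket.append((key, value))

    def search(self, key):
        bucket = self.buckets[self._hash_index(key)]
        for existing_key, value in bucket:
            if existing_key == key:
                return value
        return None  # key not found

    def delete(self, key):
        bucket = self.buckets[self._hash_index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                return True
        return False  # nothing to delete


table = ChainedHashTable()
table.insert("ab", 1)
table.insert("cd", 2)
table.insert("ab", 10)     # same key again: value is replaced
print(table.search("ab"))  # 10
print(table.search("zz"))  # None
```

Only the one bucket selected by the hash index is scanned, so each operation stays fast on average as long as the buckets remain short.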
Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like to store it in a table.
Step 1: A hash function (some mathematical formula) is used to calculate the hash value, which acts as the index in the data structure where the value will be stored.
Step 2: So, let's assign “a” = 1, “b” = 2, and so on, to all alphabetical characters.
Step 3: The numerical value of each string is then the sum of the values of its characters:
“ab” = 1 + 2 = 3,
“cd” = 3 + 4 = 7,
“efg” = 5 + 6 + 7 = 18
Step 4: Now, assume that we have a table of size 7 to store these strings. The hash function used here is the sum of the characters in the key mod the table size, so the location of a string in the array is sum(string) mod 7. This gives “ab” → 3 mod 7 = 3, “cd” → 7 mod 7 = 0 and “efg” → 18 mod 7 = 4.
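The four steps above can be sketched in a few lines of code. The helper names char_sum and hash_index are hypothetical, and the sketch assumes keys made up of lowercase letters “a” to “z” only.

```python
def char_sum(key):
    # Steps 2 and 3: "a" = 1, "b" = 2, ..., "z" = 26, summed over the string.
    # Assumes the key contains lowercase ASCII letters only.
    return sum(ord(ch) - ord("a") + 1 for ch in key)


def hash_index(key, table_size=7):
    # Step 4: sum of the characters in the key, mod the table size.
    return char_sum(key) % table_size


# A table of size 7 where each slot is a list of the strings hashed to it.
table = [[] for _ in range(7)]
for key in ["ab", "cd", "efg"]:
    table[hash_index(key)].append(key)

print(table)  # [['cd'], [], [], ['ab'], ['efg'], [], []]
```

To look a string up again, the same hash_index is recomputed and only that one slot is inspected.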
The above technique enables us to calculate the location of a given string by using a simple hash function and rapidly find the value that is stored in that location. Therefore the idea of hashing seems like a great way to store (key, value) pairs of the data in a table.