URL: https://www.geeksforgeeks.org/dsa/string-hashing-using-polynomial-rolling-hash-function/

String hashing using Polynomial rolling hash function

Last Updated : 28 Apr, 2025

Given a string str of length n, your task is to find its hash value using polynomial rolling hash function.

Note: If two strings are equal, their hash values must also be equal. The converse need not be true: two different strings may share the same hash value.

Examples:

Input: str = "geeksforgeeks"
Output: 609871790

Input: str = "polynomial"
Output: 948934983

What is a Hash Function?

A Hash function is a function that maps any kind of data of arbitrary size to fixed-size values. The values returned by the function are called Hash Values or digests.

There are many popular hash functions, such as DJBX33A, MD5, and SHA-256. This article discusses the key features, implementation, advantages and drawbacks of the Polynomial Rolling Hash Function.

The Polynomial Rolling Hash Function

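The polynomial rolling hash function is a hash function that uses only multiplications and additions. The following is the function:

$$\text{hash}(s) = \left(s[0] + s[1] \cdot p + s[2] \cdot p^2 + \dots + s[n-1] \cdot p^{n-1}\right) \bmod m$$

or simply,

$$\text{hash}(s) = \left(\sum_{i=0}^{n-1} s[i] \cdot p^i\right) \bmod m$$

Where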

  • The input to the function is a string $s$ of length $n$.
  • $p$ and $m$ are some positive integers.
  • The choice of $p$ and $m$ affects the performance and the security of the hash function.
  • If the string $s$ consists of only lower-case letters, then $p = 31$ is a good choice.
    • Competitive Programmers prefer using a larger value for . Examples include .
  • $m$ should be a large prime, since the probability of two keys colliding (producing the same hash) is nearly $1/m$. $10^9 + 7$ and $10^9 + 9$ are widely used values for $m$.
  • The output of the function is the hash value of the string $s$, which ranges between $0$ and $m - 1$ inclusive.
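
As a small worked instance of the formula, assume the letters are mapped as $a \to 1, b \to 2, \dots, z \to 26$ (an assumption that is consistent with the sample outputs above, though the list does not state it) and take $p = 31$, $m = 10^9 + 7$. For the string "abc", the computation would look roughly like this:

```latex
\begin{aligned}
\text{hash}(\text{"abc"}) &= (1 \cdot 31^0 + 2 \cdot 31^1 + 3 \cdot 31^2) \bmod (10^9 + 7) \\
                          &= (1 + 62 + 2883) \bmod (10^9 + 7) \\
                          &= 2946
\end{aligned}
```

Mapping $a$ to $1$ rather than $0$ is a common choice because it keeps strings such as "a", "aa" and "aaa" from all hashing to $0$.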

Below is the implementation of the Polynomial Rolling Hash Function:
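
A minimal Python sketch of the function, following the definition above. The function name `poly_hash`, the default parameters $p = 31$ and $m = 10^9 + 7$, and the letter mapping a → 1, ..., z → 26 are assumptions made for this sketch; it expects lower-case input only.

```python
def poly_hash(s: str, p: int = 31, m: int = 10**9 + 7) -> int:
    """Polynomial rolling hash: sum of value(s[i]) * p**i, taken modulo m."""
    hash_value = 0
    p_pow = 1  # holds p**i mod m for the current index i
    for ch in s:
        # Assumed mapping: 'a' -> 1, ..., 'z' -> 26.
        value = ord(ch) - ord("a") + 1
        hash_value = (hash_value + value * p_pow) % m
        p_pow = (p_pow * p) % m
    return hash_value


if __name__ == "__main__":
    print(poly_hash("geeksforgeeks"))  # 609871790
```

Reducing modulo `m` after every step keeps the intermediate values small, which matters in languages with fixed-width integers even though Python itself would not overflow.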


Output
609871790

Time Complexity: O(n)
Auxiliary Space: O(1)

Collisions in Polynomial Rolling Hash

Since the output of the hash function is an integer in the range $[0, m - 1]$ while the set of possible strings is far larger, two different strings can produce the same hash value.

For instance, the strings "countermand" and "furnace" produce the same hash value for $p = 31$ and $m = 10^9 + 7$.

Also, the strings "answers" and "stead" produce the same hash value for $p = 37$ and $m = 10^9 + 9$.
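
Collisions like these can be checked by hashing both words directly. The sketch below assumes the mapping a → 1, ..., z → 26 and uses a hypothetical helper named `poly_hash`; the word pairs "countermand"/"furnace" and "answers"/"stead" are pairs whose hashes can be confirmed by hand under the stated parameters.

```python
def poly_hash(s: str, p: int, m: int) -> int:
    """Sum of value(s[i]) * p**i modulo m, with 'a' -> 1, ..., 'z' -> 26 (assumed mapping)."""
    h, p_pow = 0, 1
    for ch in s:
        h = (h + (ord(ch) - ord("a") + 1) * p_pow) % m
        p_pow = (p_pow * p) % m
    return h


M1, M2 = 10**9 + 7, 10**9 + 9

# Distinct words, identical hashes under (p, m) = (31, 10^9 + 7).
print(poly_hash("countermand", 31, M1) == poly_hash("furnace", 31, M1))  # True

# Distinct words, identical hashes under (p, m) = (37, 10^9 + 9).
print(poly_hash("answers", 37, M2) == poly_hash("stead", 37, M2))  # True

# The first pair no longer collides once the parameters change.
print(poly_hash("countermand", 37, M2) == poly_hash("furnace", 37, M2))  # False
```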

We can guarantee a collision within a very small domain. Consider a set of strings, $S$, consisting of only lower-case letters, such that the length of any string in $S$ doesn't exceed $7$.

We have $|S| = 26 + 26^2 + \dots + 26^7 = 8353082582 > 10^9 + 7$. Since the range of the hash function is $[0, m - 1]$, a one-to-one mapping is impossible. Hence, by the pigeonhole principle, at least two distinct strings of length at most $7$ must share a hash value.
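
The counting argument can be reproduced in a few lines. This sketch assumes a maximum length of 7 and $m = 10^9 + 7$; the variable names are illustrative.

```python
M = 10**9 + 7  # size of the hash range [0, M - 1]
MAX_LEN = 7    # assumed maximum string length

# Number of non-empty lower-case strings of length 1..MAX_LEN.
num_strings = sum(26**k for k in range(1, MAX_LEN + 1))

print(num_strings)      # 8353082582
print(num_strings > M)  # True: more strings than hash values, so some must collide
```

With a maximum length of 6 the count is only 321272406, which is below $10^9 + 7$, so 7 is the smallest length for which the pigeonhole argument applies.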

Collision Resolution

Note that the value of $m$ affects the chances of collision. We have seen that the probability of collision is about $1/m$. We can increase $m$ to reduce this probability, but that affects the speed of the algorithm: the larger the value of $m$, the slower the arithmetic. Also, some languages (C, C++, Java) have fixed-width integer types, and the product of two residues modulo $m$ must fit in one of them. Hence, we can't increase $m$ to an arbitrarily large value.

Then how can we minimise the chances of a collision?

Note that the hash of a string depends on two parameters: $p$ and $m$.

We have seen that the strings "countermand" and "furnace" produce the same hash value for $p = 31$ and $m = 10^9 + 7$. But for $p = 37$ and $m = 10^9 + 9$, they produce different hashes.

Observation

If two different strings produce the same hash value for a pair $(p_1, m_1)$, they will very likely produce different hashes for a different pair $(p_2, m_2)$.

Strategy

We cannot, however, nullify the chances of collision because there are infinitely many strings. But, surely, we can reduce the probability of two strings colliding.

We can reduce the probability of collision by generating a pair of hashes for a given string. The first hash is generated using $p = 31$ and $m = 10^9 + 7$, while the second hash is generated using $p = 37$ and $m = 10^9 + 9$.

Why will this work?

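We are generating two hashes using two different modulo values, $m_1$ and $m_2$. Assuming the two hashes behave independently, the probability of a collision is now about $\frac{1}{m_1} \times \frac{1}{m_2}$. Since both $m_1$ and $m_2$ are greater than $10^9$, the probability that a collision occurs is now less than $10^{-18}$, which is far better than the original probability of collision, about $10^{-9}$.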

Below is given the implementation:
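
A minimal Python sketch of the double-hash strategy. The names `poly_hash` and `double_hash` and the letter mapping a → 1, ..., z → 26 are assumptions made for this sketch; the parameter pairs $(31, 10^9 + 7)$ and $(37, 10^9 + 9)$ are the ones that reproduce the output shown below.

```python
def poly_hash(s: str, p: int, m: int) -> int:
    """Sum of value(s[i]) * p**i modulo m, with 'a' -> 1, ..., 'z' -> 26 (assumed mapping)."""
    h, p_pow = 0, 1
    for ch in s:
        h = (h + (ord(ch) - ord("a") + 1) * p_pow) % m
        p_pow = (p_pow * p) % m
    return h


def double_hash(s: str) -> tuple:
    """Pair of hashes computed with two independent (p, m) parameter pairs."""
    return (poly_hash(s, 31, 10**9 + 7), poly_hash(s, 37, 10**9 + 9))


if __name__ == "__main__":
    print(*double_hash("geeksforgeeks"))  # 609871790 642799661
```

Two strings are then treated as equal only when both components of the pair match.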


Output
609871790 642799661

Time Complexity: O(n)
Auxiliary Space: O(1)

Features of Polynomial rolling hash function

  • Calculation of Hashes of any substring of a given string in constant time

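Note that computing the hash of the string $S$ also computes the hashes of all of its prefixes. We just have to store the hash values of the prefixes while computing. Say $\text{hash}[i]$ denotes the hash of the prefix $S[0 \dots i]$; then we have

$$\text{hash}(S[l \dots r]) \cdot p^{l} \equiv \text{hash}[r] - \text{hash}[l-1] \pmod{m}$$

with $\text{hash}[-1]$ taken as $0$ when $l = 0$. This allows us to quickly compute the hash of the substring $S[l \dots r]$ in $O(1)$, provided we have the powers of $p$ ready.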
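
One way to turn this relation into code is sketched below. It stores `prefix[k]` as the hash of the first `k` characters (so `prefix[k]` plays the role of $\text{hash}[k-1]$, and `prefix[0] = 0`), and it removes the factor $p^l$ by multiplying with a precomputed inverse power of $p$, which relies on $m$ being prime. The names `build` and `substring_hash`, the parameters $p = 31$, $m = 10^9 + 7$, and the mapping a → 1, ..., z → 26 are assumptions of this sketch.

```python
P, M = 31, 10**9 + 7


def build(s):
    """Return (prefix, inv_pow): prefix[k] is the hash of s[:k]; inv_pow[k] is P**(-k) mod M."""
    n = len(s)
    prefix = [0] * (n + 1)
    inv_pow = [1] * (n + 1)
    inv_p = pow(P, M - 2, M)  # modular inverse of P via Fermat's little theorem (M is prime)
    p_pow = 1
    for i, ch in enumerate(s):
        prefix[i + 1] = (prefix[i] + (ord(ch) - ord("a") + 1) * p_pow) % M
        p_pow = p_pow * P % M
        inv_pow[i + 1] = inv_pow[i] * inv_p % M
    return prefix, inv_pow


def substring_hash(prefix, inv_pow, l, r):
    """Hash of s[l..r] (inclusive), normalised as if the substring started at index 0."""
    return (prefix[r + 1] - prefix[l]) * inv_pow[l] % M


s = "geeksforgeeks"
prefix, inv_pow = build(s)
# Both ranges spell "geeks", so their normalised hashes agree.
print(substring_hash(prefix, inv_pow, 0, 4) == substring_hash(prefix, inv_pow, 8, 12))  # True
```

An alternative that avoids modular inverses is to compare two substrings by multiplying each side by the other's power of $p$ instead of dividing.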

  • The behavior of the hash when a character is changed

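Recall that the hash of a string $s$ is given by

$$\text{hash}(s) = \left(\sum_{i=0}^{n-1} s[i] \cdot p^i\right) \bmod m$$

Say we change a character $ch_1$ at some index $i$ to some other character $ch_2$. How will the hash change?

If $\text{hash}_{old}$ denotes the hash value before the change and $\text{hash}_{new}$ is the hash value after the change, then the relation between them is given by

$$\text{hash}_{new} = \left(\text{hash}_{old} - ch_1 \cdot p^i + ch_2 \cdot p^i\right) \bmod m$$

Therefore, such update queries can be answered very quickly instead of recalculating the hash from the beginning, provided we have the powers of $p$ ready.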

Below is given the implementation:
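
A minimal Python sketch of a hash object that supports single-character updates in constant time. The class name `RollingHash`, its method `update`, the letter mapping a → 1, ..., z → 26, and the two parameter pairs are assumptions made for this sketch.

```python
class RollingHash:
    """Double polynomial hash of a lower-case string with O(1) single-character updates."""

    PARAMS = ((31, 10**9 + 7), (37, 10**9 + 9))  # assumed (p, m) pairs

    def __init__(self, s):
        self.chars = list(s)
        n = len(s)
        self.pows = []   # pows[k][i] = p_k**i mod m_k, precomputed once
        self.value = []  # value[k] = hash of the string under the k-th pair
        for p, m in self.PARAMS:
            pw = [1] * (n + 1)
            for i in range(n):
                pw[i + 1] = pw[i] * p % m
            self.pows.append(pw)
            self.value.append(sum(self._val(c) * pw[i] for i, c in enumerate(s)) % m)

    @staticmethod
    def _val(ch):
        return ord(ch) - ord("a") + 1  # assumed mapping 'a' -> 1, ..., 'z' -> 26

    def update(self, i, new_ch):
        """Replace the character at index i and patch both hashes without rescanning."""
        delta = self._val(new_ch) - self._val(self.chars[i])
        for k, (_, m) in enumerate(self.PARAMS):
            self.value[k] = (self.value[k] + delta * self.pows[k][i]) % m
        self.chars[i] = new_ch


h = RollingHash("geeksforgeeks")
print(*h.value)  # 609871790 642799661
```

Calling `h.update(8, "p")` would, under these assumptions, leave `h.value` equal to the hashes of "geeksforpeeks" computed from scratch.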


Output
609871790 642799661

Applications

Given a sequence S of N strings and Q queries, where each query gives two indices i and j, the task is to find the length of the longest common prefix of the strings S[i] and S[j].

Before getting into the approach to solve this problem, note that the constraints are:

Using hashing, the problem can be solved in $O(N + Q \log |S|_{max})$. The approach is to compute prefix hashes for all the strings in $O(N)$ time (taking $N$ here as the total length of the strings). Then, for each query, we binary search the length of the longest common prefix, comparing prefix hashes instead of characters.

Below is given the implementation:
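
A minimal Python sketch of the approach. The function names `prefix_hashes` and `lcp`, the parameters $p = 31$ and $m = 10^9 + 7$, the letter mapping a → 1, ..., z → 26, and the sample strings and queries are all assumptions made for this sketch.

```python
P, M = 31, 10**9 + 7


def prefix_hashes(s):
    """prefix[k] = hash of the first k characters of s (prefix[0] = 0)."""
    prefix = [0] * (len(s) + 1)
    p_pow = 1
    for i, ch in enumerate(s):
        prefix[i + 1] = (prefix[i] + (ord(ch) - ord("a") + 1) * p_pow) % M
        p_pow = p_pow * P % M
    return prefix


def lcp(ha, hb):
    """Longest common prefix length, found by binary search on prefix hashes."""
    lo, hi = 0, min(len(ha), len(hb)) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if ha[mid] == hb[mid]:  # prefixes of length mid match (up to hash collisions)
            lo = mid
        else:
            hi = mid - 1
    return lo


# Illustrative inputs (assumed for this sketch).
strings = ["geeksforgeeks", "geeks", "hell", "geeksforpeaks", "hello"]
queries = [(0, 1), (1, 2), (2, 4), (0, 3)]

hashes = [prefix_hashes(s) for s in strings]
print(*(lcp(hashes[i], hashes[j]) for i, j in queries))  # 5 0 4 8
```

The binary search is valid because if two prefixes of length $k$ are equal, all shorter prefixes are equal too. Since a single hash can report a false match, comparing a pair of hashes per prefix would make wrong answers far less likely.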


Output
5 0 4 8 