
URL: https://www.geeksforgeeks.org/dsa/suffix-arrays-for-competitive-programming/


Suffix Arrays for Competitive Programming

Last Updated : 23 Jul, 2025

A suffix array is a sorted array of all suffixes of a given string. More formally, given a string 'S' of length n, the suffix array of S contains the indices 0 to n-1, ordered so that the suffixes starting at these indices are sorted lexicographically.


Example:

Input: banana

Suffixes (by start index)      Sorted lexicographically
0 banana                       5 a
1 anana                        3 ana
2 nana                         1 anana
3 ana                          0 banana
4 na                           4 na
5 a                            2 nana

So the suffix array for "banana" is {5, 3, 1, 0, 4, 2}
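The definition above can be checked directly with a short Python sketch. The function name `build_suffix_array_naive` is illustrative and not from the article; it sorts the start indices by comparing the suffix strings themselves, which is simple but can cost O(n^2 log n) in the worst case.

```python
def build_suffix_array_naive(s: str) -> list[int]:
    """Return the start indices of all suffixes of s in lexicographic order."""
    # Each comparison may scan up to O(n) characters, so this is only
    # suitable for short strings or for checking a faster implementation.
    return sorted(range(len(s)), key=lambda i: s[i:])


print(build_suffix_array_naive("banana"))  # [5, 3, 1, 0, 4, 2]
```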

Ways to construct a suffix array:

  1. Naive way to construct suffix array
  2. Using Radix Sort to construct suffix array in O(n * Log(n))
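As a rough illustration of the faster construction, here is a minimal prefix-doubling sketch in Python. Note the assumption: it uses Python's built-in comparison sort in each round instead of radix sort, so it runs in O(n log^2 n) rather than the O(n log n) listed above. The function name `build_suffix_array` is illustrative.

```python
def build_suffix_array(s: str) -> list[int]:
    """Prefix doubling: sort suffixes by their first 2, 4, 8, ... characters."""
    n = len(s)
    if n == 0:
        return []
    sa = list(range(n))
    rank = [ord(c) for c in s]  # rank by first character
    k = 1
    while True:
        # Key for suffix i: (rank of its first k chars, rank of the next k chars).
        # -1 marks "past the end", so a shorter suffix sorts before a longer one.
        def key(i: int) -> tuple[int, int]:
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for t in range(1, n):
            # Suffixes with equal keys share a rank; otherwise the rank grows by 1.
            new_rank[sa[t]] = new_rank[sa[t - 1]] + (key(sa[t]) != key(sa[t - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: order is final
            break
        k *= 2
    return sa


print(build_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

Replacing the comparison sort with a two-pass radix (counting) sort on the rank pairs is what brings each round down to O(n).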

Searching for a substring in a string:

Problem: Given a string 'S' and a string 'T', determine whether T is a substring of S. If it is, return an index at which T occurs in S.

Input: S = "bannana" , T = "nan"
Output: 3

In O(|S| * |T|) we can iterate over each index of 'S' and check whether the substring of S starting at that index matches 'T'.

Notice that any substring is a prefix of some suffix. If we keep only the first |T| characters of each suffix in the suffix array of 'S', we get all the substrings of length at most |T| in sorted order. To find T, we can binary search over the suffix array, comparing the truncated suffix at the mid position with T.

  • If the truncated suffix at mid is lexicographically smaller than 'T', binary search on the right half.
  • If the truncated suffix at mid is lexicographically greater than 'T', binary search on the left half.
  • If the two strings match, return the starting index of that suffix as the result.

Time Complexity: O(|S| * log(|S|) + |T| * log(|S|)), where O(|S| * log(|S|)) is the cost of constructing the suffix array for string S and O(|T| * log(|S|)) is the cost of the binary search, since each of the O(log(|S|)) steps compares up to |T| characters.
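The binary search described above might look like the following Python sketch. The function name `find_substring` is illustrative, and the suffix array is built naively here only to keep the example self-contained. It returns the start index of some occurrence of T (not necessarily the leftmost one), or -1 if there is none.

```python
def find_substring(s: str, t: str, sa: list[int]) -> int:
    """Binary search the suffix array sa of s for a suffix that starts with t."""
    lo, hi = 0, len(sa) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        start = sa[mid]
        prefix = s[start:start + len(t)]  # first |T| characters of the suffix
        if prefix == t:
            return start
        if prefix < t:
            lo = mid + 1   # T can only be in the right half
        else:
            hi = mid - 1   # T can only be in the left half
    return -1


s = "bannana"
sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive construction for the demo
print(find_substring(s, "nan", sa))  # 3
```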

Longest common prefix of two suffixes:

Problem: Given a string 'S' and Q queries of the form {i, j}, find LCP(i, j), i.e. the length of the Longest Common Prefix (LCP) of the suffixes starting at indices i and j.

Example:

Input: S = "banana" , Query = {{0, 5}, {4, 2}, {1, 3}}
Output: 0 2 3
Explanation: Query[0] = {0, 5} = LCP (banana, a) = ' ' = 0
Query[1] = {4, 2} = LCP (na, nana) = 'na' = 2
Query[2] = {1, 3} = LCP (anana, ana) = 'ana' = 3

Naive approach: For each query, we can compare the suffixes starting at i and j character by character in O(|S|), giving a total time complexity of O(Q * |S|).

Efficient approach: Let our suffix array be Suffix[]. To solve the problem, construct an array lcp[] such that lcp[i] = LCP(Suffix[i], Suffix[i+1]). In other words, lcp[] stores the length of the longest common prefix of each pair of suffixes that are adjacent in the suffix array, as shown in the image below for the string S = "banana".

[Image: construction of the LCP array]

Now, to calculate LCP(i, j), find the positions p and q of the suffixes starting at i and j in the suffix array (assume p < q, swapping if necessary) and take the minimum value in the range lcp[p] to lcp[q-1].
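A small Python sketch of this query, using the article's "banana" example. The names `rank`, `lcp_query` and `common_prefix_len` are illustrative; `rank[i]` is assumed to hold the position of the suffix starting at i in the suffix array, and the arrays are built naively to keep the example short. The minimum is taken with a plain scan here, not in O(1).

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def lcp_query(i: int, j: int, rank: list[int], lcp: list[int], n: int) -> int:
    """LCP of the suffixes starting at i and j, via a range minimum over lcp[]."""
    if i == j:
        return n - i  # a suffix shares its whole length with itself
    p, q = sorted((rank[i], rank[j]))
    return min(lcp[p:q])  # lcp[p] .. lcp[q-1]


s = "banana"
n = len(s)
sa = sorted(range(n), key=lambda i: s[i:])          # [5, 3, 1, 0, 4, 2]
rank = [0] * n
for pos, start in enumerate(sa):
    rank[start] = pos                               # inverse of the suffix array
lcp = [common_prefix_len(s[sa[p]:], s[sa[p + 1]:]) for p in range(n - 1)]

print(lcp)                                                            # [1, 3, 0, 0, 2]
print([lcp_query(i, j, rank, lcp, n) for i, j in [(0, 5), (4, 2), (1, 3)]])  # [0, 2, 3]
```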


Proof: Let LCP(i, j) = k, and let p < q be the positions of these two suffixes in the suffix array. Since the suffixes are sorted in lexicographical order, every suffix between positions p and q shares its first k characters with both of them. So every lcp value from lcp[p] to lcp[q-1] is at least k, and therefore the minimum on this segment is at least k. On the other hand, the minimum cannot be greater than k: that would mean every adjacent pair of suffixes between positions p and q has more than k common characters, so the suffixes starting at i and j would also have more than k common characters.

Note: Interestingly, we can build a sparse table over lcp[] in order to answer each range-minimum query in O(1).
The lcp[] array itself can be constructed in O(N).
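One common way to build lcp[] in linear time is Kasai's algorithm; the article does not say which method it has in mind, so treat the following Python sketch as one possible choice. The function name `build_lcp_kasai` is illustrative. It walks the suffixes in text order and reuses the previous match length minus one as a starting point.

```python
def build_lcp_kasai(s: str, sa: list[int]) -> list[int]:
    """Return lcp where lcp[p] = LCP of the suffixes at positions p and p+1 in sa."""
    n = len(s)
    rank = [0] * n
    for pos, start in enumerate(sa):
        rank[start] = pos
    lcp = [0] * max(n - 1, 0)
    h = 0  # number of matched characters carried over from the previous suffix
    for i in range(n):
        if rank[i] == n - 1:
            h = 0  # last suffix in sorted order has no successor
            continue
        j = sa[rank[i] + 1]  # suffix that follows suffix i in sorted order
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # dropping the first character loses at most one match
    return lcp


s = "banana"
sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive construction for the demo
print(build_lcp_kasai(s, sa))  # [1, 3, 0, 0, 2]
```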

Time Complexity: O((|S| * log|S|) + Q)
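The O(1) per-query part of this bound relies on a sparse table over lcp[]. A minimal Python sketch follows; the names `build_sparse_table` and `range_min` are illustrative. Building the table takes O(n log n) time and memory, and each query combines two overlapping power-of-two blocks.

```python
def build_sparse_table(values: list[int]) -> list[list[int]]:
    """table[j][i] holds min(values[i : i + 2**j])."""
    n = len(values)
    table = [list(values)]
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        half = 1 << (j - 1)
        # A block of length 2**j is the min of its two halves of length 2**(j-1).
        table.append([min(prev[i], prev[i + half]) for i in range(n - (1 << j) + 1)])
        j += 1
    return table


def range_min(table: list[list[int]], lo: int, hi: int) -> int:
    """Minimum of values[lo:hi] (hi exclusive, requires lo < hi)."""
    j = (hi - lo).bit_length() - 1  # largest power of two that fits in the range
    return min(table[j][lo], table[j][hi - (1 << j)])


lcp = [1, 3, 0, 0, 2]            # lcp[] of "banana"
table = build_sparse_table(lcp)
print(range_min(table, 0, 3))    # 0  (LCP of suffixes at positions 0 and 3)
print(range_min(table, 1, 2))    # 3  (LCP of suffixes at positions 1 and 2)
```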

Number of distinct substrings:

Problem: Given a string 'S', the task is to find the total number of unique substrings of S.

Example:

Input: S='abab'
Output: 7
Explanation: Unique substrings of "abab" = {"abab","aba","ab","a","bab","ba","b"}

Approach: As noted earlier, any substring is a prefix of some suffix. To count the distinct substrings, we iterate over the suffix array (where the suffixes are sorted). Each suffix contributes as many prefixes as its length. To exclude the prefixes that have already occurred in previous suffixes, we subtract the LCP of this suffix with the previous one.

The below image shows how to calculate number of distinct substrings for the string "BANANA" using suffix and lcp array.

[Image: calculating the number of distinct substrings of "BANANA"]
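The counting rule can be sketched in Python as follows. The function name `count_distinct_substrings` is illustrative, and the suffix array and the LCP with the previous suffix are both computed naively here to keep the example self-contained; in a contest setting they would come from the faster constructions.

```python
def count_distinct_substrings(s: str) -> int:
    """Sum over sorted suffixes of (suffix length - LCP with the previous suffix)."""
    n = len(s)
    sa = sorted(range(n), key=lambda i: s[i:])  # naive suffix array
    total = 0
    prev = None
    for start in sa:
        suffix_len = n - start
        shared = 0
        if prev is not None:
            # Prefixes shared with the previous suffix were already counted.
            while (shared < suffix_len and shared < n - prev
                   and s[start + shared] == s[prev + shared]):
                shared += 1
        total += suffix_len - shared
        prev = start
    return total


print(count_distinct_substrings("abab"))    # 7
print(count_distinct_substrings("banana"))  # 15
```

Equivalently, the answer is n * (n + 1) / 2 minus the sum of all values in lcp[]; for "banana" that is 21 - (1 + 3 + 0 + 0 + 2) = 15.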

