
URL: https://www.geeksforgeeks.org/dsa/count-distinct-substrings-string-using-suffix-array/

Count of distinct substrings of a string using Suffix Array - GeeksforGeeks



Count of distinct substrings of a string using Suffix Array

Last Updated : 23 Jul, 2025

Given a string of length n consisting of lowercase letters, count the total number of distinct substrings of the string. The empty string is counted as one substring, as the example below shows.

Examples: 

Input : str = “ababa”
Output : 10
There are 10 distinct substrings:
"", "a", "b", "ab", "ba", "aba", "bab", "abab", "baba"
and "ababa"
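To check the example, a brute-force sketch can collect every slice s[i:j] (including the empty slice) into a set. The helper name `distinct_substrings_brute` is hypothetical. It takes roughly O(n³) time, so it is only useful for small inputs or for testing a faster method.

```python
def distinct_substrings_brute(s: str) -> set[str]:
    """Collect every substring s[i:j] with 0 <= i <= j <= len(s); includes ''."""
    n = len(s)
    return {s[i:j] for i in range(n + 1) for j in range(i, n + 1)}


subs = distinct_substrings_brute("ababa")
print(len(subs))  # 10
print(sorted(subs, key=lambda t: (len(t), t)))
# ['', 'a', 'b', 'ab', 'ba', 'aba', 'bab', 'abab', 'baba', 'ababa']
```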

A Suffix Trie based solution is discussed in the following post:
Count of distinct substrings of a string using Suffix Trie

This problem can also be solved using a suffix array together with the longest common prefix (LCP) array. A suffix array is the array of starting indices of all suffixes of a string, sorted in lexicographic order of the suffixes.
For the string “ababa”, the suffixes are “ababa”, “baba”, “aba”, “ba” and “a”, starting at indices 0 to 4. Sorting them gives “a”, “aba”, “ababa”, “ba”, “baba”, so the suffix array is [4, 2, 0, 3, 1].
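The definition translates directly into a naive sketch: sort the starting indices by the suffix each one begins. The function name `naive_suffix_array` is hypothetical, and because every comparison slices the string, this is O(n² log n) in the worst case, so it only illustrates the definition.

```python
def naive_suffix_array(s: str) -> list[int]:
    # Sort starting indices by the suffix s[i:] that begins at each index.
    # Slicing copies the suffix, so this is for illustration, not performance.
    return sorted(range(len(s)), key=lambda i: s[i:])


print(naive_suffix_array("ababa"))  # [4, 2, 0, 3, 1]
```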
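Next, we compute the LCP array using Kasai's algorithm, where lcp[i] is the length of the longest common prefix of the suffixes at positions i and i + 1 of the suffix array (the last entry is 0). For “ababa”, the LCP array is [1, 3, 0, 2, 0].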
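A sketch of Kasai's algorithm in Python, using the same convention (lcp[i] compares the suffixes at sa[i] and sa[i + 1]). It relies on the fact that if the suffix starting at i shares h > 0 characters with its sorted successor, the suffix starting at i + 1 shares at least h - 1 with its own successor, so the matched length k is reused rather than restarted, giving O(n) total work. The function name `kasai_lcp` is hypothetical, and the suffix array in the usage line is hard-coded from the example.

```python
def kasai_lcp(s: str, sa: list[int]) -> list[int]:
    n = len(s)
    rank = [0] * n  # rank[start] = position of suffix s[start:] in sa
    for pos, start in enumerate(sa):
        rank[start] = pos
    lcp = [0] * n
    k = 0  # current common-prefix length, carried between iterations
    for i in range(n):  # visit suffixes in text order: s[0:], s[1:], ...
        if rank[i] == n - 1:
            k = 0  # last suffix in sorted order has no successor
            continue
        j = sa[rank[i] + 1]  # start of the next suffix in sorted order
        while i + k < n and j + k < n and s[i + k] == s[j + k]:
            k += 1
        lcp[rank[i]] = k
        if k > 0:
            k -= 1  # the next text suffix keeps at least k - 1 matches
    return lcp


print(kasai_lcp("ababa", [4, 2, 0, 3, 1]))  # [1, 3, 0, 2, 0]
```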
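With both arrays built, we count distinct substrings using this fact: every substring of a string is a prefix of some suffix of that string, so going through the prefixes of all suffixes covers every substring. Adjacent suffixes in sorted order share their longest common prefix, and those shared prefixes must be counted only once.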
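A quick check of this fact, written as a small hypothetical helper: it collects every non-empty prefix of every suffix and compares the result with the set of all non-empty substrings.

```python
def substrings_from_suffix_prefixes(s: str) -> set[str]:
    # The substring s[i:j] is the prefix of length j - i of the suffix s[i:].
    result = set()
    for i in range(len(s)):
        suffix = s[i:]
        for length in range(1, len(suffix) + 1):
            result.add(suffix[:length])
    return result


s = "ababa"
all_nonempty = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
print(substrings_from_suffix_prefixes(s) == all_nonempty)  # True
print(len(all_nonempty))  # 9
```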

The procedure for the above example is as follows:

String = “ababa”
Suffixes in sorted order : “a”, “aba”, “ababa”, “ba”, “baba”
Initialize the distinct substring count with the
length of the first suffix:
Count = length(“a”) = 1
Substrings taken in consideration : “a”

Now consider each consecutive pair of suffixes.
lcp("a", "aba") = "a", of length 1.
Every prefix of “aba” that is longer than the
longest common prefix is a new distinct substring.
Here the characters after the common prefix are
'b' and 'a', giving the new substrings “ab” and
“aba”, so their number is added to Count.
Count += length(“aba”) - lcp(“a”, “aba”) 
Count = 3 
Substrings taken in consideration : “aba”, “ab”

Similarly, for the next pair:
Count += length(“ababa”) - lcp(“aba”, “ababa”)
Count = 5
Substrings taken in consideration : “ababa”, “abab”

Count += length(“ba”) - lcp(“ababa”, “ba”)
Count = 7
Substrings taken in consideration : “ba”, “b”

Count += length(“baba”) - lcp(“ba”, “baba”)
Count = 9
Substrings taken in consideration : “baba”, “bab”

Finally, add 1 for the empty string:
Count = 10
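Adding up the increments from this walkthrough gives a closed form. The lengths of all n suffixes sum to n(n+1)/2, and each adjacent pair in sorted order removes its shared prefix once. With lcp[i] taken as the LCP of the suffixes at sa[i] and sa[i + 1], the count (including the empty string) can be written as below, followed by the same arithmetic for “ababa”.

```latex
% General form: sa is the suffix array, lcp[i] = LCP of suffixes sa[i] and sa[i+1]
$$
\text{count} = 1 + \sum_{i=0}^{n-1} \bigl(n - \mathrm{sa}[i]\bigr) - \sum_{i=0}^{n-2} \mathrm{lcp}[i]
             = 1 + \frac{n(n+1)}{2} - \sum_{i=0}^{n-2} \mathrm{lcp}[i]
$$
% Example "ababa": n = 5, lcp = [1, 3, 0, 2, 0]
$$
1 + (3 - 1) + (5 - 3) + (2 - 0) + (4 - 2) + 1 = 1 + 15 - (1 + 3 + 0 + 2) = 10
$$
```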

Implementation:
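A minimal Python (3.9+) sketch of the whole pipeline, under stated assumptions: the function names are hypothetical, the suffix array is built by prefix doubling with Python's built-in sort (O(n log² n), slightly above the O(n log n) bound quoted for this approach), and the LCP array follows the convention lcp[i] = LCP of the suffixes at sa[i] and sa[i + 1]. The final count is n(n+1)/2 minus the sum of the LCP array, plus 1 for the empty string.

```python
def build_suffix_array(s: str) -> list[int]:
    """Prefix doubling: after the round with step k, suffixes are sorted by their first 2k chars."""
    n = len(s)
    sa = list(range(n))
    rank = [ord(c) for c in s]  # initial rank = character code
    k = 1
    while n > 0:
        # Pair key: rank of first k chars, then rank of next k chars (-1 if past the end).
        keys = [(rank[i], rank[i + k] if i + k < n else -1) for i in range(n)]
        sa.sort(key=lambda i: keys[i])
        new_rank = [0] * n
        for idx in range(1, n):
            prev, cur = sa[idx - 1], sa[idx]
            new_rank[cur] = new_rank[prev] + (keys[cur] != keys[prev])
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: order is final
            break
        k *= 2
    return sa


def kasai_lcp(s: str, sa: list[int]) -> list[int]:
    """lcp[i] = length of the longest common prefix of suffixes sa[i] and sa[i + 1]."""
    n = len(s)
    rank = [0] * n
    for pos, start in enumerate(sa):
        rank[start] = pos
    lcp = [0] * n
    k = 0
    for i in range(n):
        if rank[i] == n - 1:
            k = 0
            continue
        j = sa[rank[i] + 1]
        while i + k < n and j + k < n and s[i + k] == s[j + k]:
            k += 1
        lcp[rank[i]] = k
        if k > 0:
            k -= 1
    return lcp


def count_distinct_substrings(s: str) -> int:
    n = len(s)
    lcp = kasai_lcp(s, build_suffix_array(s))
    # Sum of suffix lengths, minus prefixes shared by sorted neighbours, plus the empty string.
    return n * (n + 1) // 2 - sum(lcp) + 1


print(count_distinct_substrings("ababa"))  # 10
```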


Output
10

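Time Complexity: O(n log n), where n is the length of the string. The cost is dominated by building the suffix array; Kasai's algorithm and the final summation each take O(n).
Auxiliary Space: O(n), where n is the length of the string, for the suffix, rank and LCP arrays.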

This article is contributed by Utkarsh Trivedi.
