![]() |
VOOZH | about |
Given a string of length n of lowercase alphabet characters, we need to count total number of distinct substrings of this string.
Examples:
Input : str = “ababa” Output : 10 Total number of distinct substring are 10, which are, "", "a", "b", "ab", "ba", "aba", "bab", "abab", "baba" and "ababa"
We have discussed a Suffix Trie based solution in below post :
Count of distinct substrings of a string using Suffix Trie
We can solve this problem using suffix array and longest common prefix concept. A suffix array is a sorted array of all suffixes of a given string.
For string “ababa” suffixes are : “ababa”, “baba”, “aba”, “ba”, “a”. After taking these suffixes in sorted form we get our suffix array as [4, 2, 0, 3, 1]
Then we calculate lcp array using kasai’s algorithm. For string “ababa”, lcp array is [1, 3, 0, 2, 0]
After constructing both arrays, we calculate total number of distinct substring by keeping this fact in mind : If we look through the prefixes of each suffix of a string, we cover all substrings of that string.
We will explain the procedure for above example,
String = “ababa”
Suffixes in sorted order : “a”, “aba”, “ababa”,
“ba”, “baba”
Initializing distinct substring count by length
of first suffix,
Count = length(“a”) = 1
Substrings taken in consideration : “a”
Now we consider each consecutive pair of suffix,
lcp("a", "aba") = "a".
All characters that are not part of the longest
common prefix contribute to a distinct substring.
In the above case, they are 'b' and ‘a'. So they
should be added to Count.
Count += length(“aba”) - lcp(“a”, “aba”)
Count = 3
Substrings taken in consideration : “aba”, “ab”
Similarly for next pair also,
Count += length(“ababa”) - lcp(“aba”, “ababa”)
Count = 5
Substrings taken in consideration : “ababa”, “abab”
Count += length(“ba”) - lcp(“ababa”, “ba”)
Count = 7
Substrings taken in consideration : “ba”, “b”
Count += length(“baba”) - lcp(“ba”, “baba”)
Count = 9
Substrings taken in consideration : “baba”, “bab”
We finally add 1 for empty string.
count = 10
Implementation:
10
Time Complexity : O(nlogn), where n is the length of string.
Auxiliary Space : O(n), where n is the length of string.
This article is contributed by Utkarsh Trivedi<.