You are given a string of length N. For every integer L from 1 to N, print the number of distinct substrings of length L.
Prerequisites to solve this problem: suffix arrays, LCP (longest common prefix) arrays
Examples:
Input: str="abab"
Output: 2 2 2 1
Explanation:
- for length n=1, distinct substrings: {a, b}
- for length n=2, distinct substrings: {ab, ba}
- for length n=3, distinct substrings: {aba, bab}
- for length n=4, distinct substrings: {abab}
Input: str="abc"
Output: 3 2 1
Explanation:
- for length n=1, distinct substrings: {a, b, c}
- for length n=2, distinct substrings: {ab, bc}
- for length n=3, distinct substrings: {abc}
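To check answers on small inputs, a brute-force baseline can collect every window of each length into a set. This is a sketch for testing only: slicing makes it roughly O(N³), and the function name `distinct_counts_bruteforce` is hypothetical.

```python
def distinct_counts_bruteforce(s: str) -> list[int]:
    """Hypothetical testing helper: count distinct substrings of each length naively."""
    n = len(s)
    counts = []
    for length in range(1, n + 1):
        # Every window of this length, deduplicated by the set.
        seen = {s[i:i + length] for i in range(n - length + 1)}
        counts.append(len(seen))
    return counts


print(*distinct_counts_bruteforce("abab"))  # 2 2 2 1
print(*distinct_counts_bruteforce("abc"))   # 3 2 1
```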
Suffix Arrays: A suffix array is a sorted array of all suffixes of a given string. The suffixes are usually represented as integers, which are the starting indices of the suffixes in the string.
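Longest Common Prefix (LCP) Array: The LCP array is an auxiliary data structure that stores the length of the longest common prefix between each pair of consecutive suffixes in the sorted suffix array. Here, LCP[i] is the length of the longest common prefix of the suffixes at positions i and i + 1 of the suffix array.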
The image below shows the suffix array and LCP array for the string "BANANA".
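Both arrays for "BANANA" can also be computed directly with a naive sketch that sorts the suffixes as strings and compares neighbours character by character. It is only suitable for small inputs, and the function names are illustrative. LCP[i] compares the suffix at position i with the one at position i + 1, and the last entry is 0 because that suffix has no successor.

```python
def naive_suffix_array(s: str) -> list[int]:
    # Sort starting indices by the suffix text itself.
    return sorted(range(len(s)), key=lambda i: s[i:])


def common_prefix_length(a: str, b: str) -> int:
    # Count matching characters from the front until the first mismatch.
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def naive_lcp_array(s: str, sa: list[int]) -> list[int]:
    # lcp[i] = common prefix length of the suffixes at sa[i] and sa[i + 1].
    n = len(sa)
    lcp = [0] * n
    for i in range(n - 1):
        lcp[i] = common_prefix_length(s[sa[i]:], s[sa[i + 1]:])
    return lcp


sa = naive_suffix_array("BANANA")
print(sa)                             # [5, 3, 1, 0, 4, 2]
print(naive_lcp_array("BANANA", sa))  # [1, 3, 0, 0, 2, 0]
```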
Key Observations:
- N - suffixArray[i] is the length of the suffix that starts at index suffixArray[i]. It is also the number of substrings that start at that index, one for each prefix of the suffix.
- LCP[i] is the number of those prefixes that the suffix shares with the next suffix in sorted order. Those shared substrings are left to be counted by a later suffix.
- Subtracting the two, N - suffixArray[i] - LCP[i] is the number of distinct substrings contributed by the suffix starting at index suffixArray[i], and their lengths are exactly LCP[i] + 1 through N - suffixArray[i].
So, for each suffix in the suffix array, you can compute the range of lengths of the unique substrings it contributes and update the count of substrings of those lengths accordingly.
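A short worked check of these observations on str = "abab" (N = 4): its sorted suffixes are ab, abab, b, bab, so suffixArray = [2, 0, 3, 1]. Comparing each suffix with the next one gives LCP = [2, 0, 1, 0], where the last entry is 0 because there is no next suffix.

```latex
\begin{aligned}
\text{new}_i &= N - \mathrm{suffixArray}[i] - \mathrm{LCP}[i] \quad (\text{lengths } \mathrm{LCP}[i]+1 \text{ to } N - \mathrm{suffixArray}[i])\\
i=0,\ \texttt{ab}: &\quad 4 - 2 - 2 = 0\\
i=1,\ \texttt{abab}: &\quad 4 - 0 - 0 = 4 \quad (\text{lengths } 1 \text{ to } 4)\\
i=2,\ \texttt{b}: &\quad 4 - 3 - 1 = 0\\
i=3,\ \texttt{bab}: &\quad 4 - 1 - 0 = 3 \quad (\text{lengths } 1 \text{ to } 3)\\
\textstyle\sum_i \text{new}_i &= 0 + 4 + 0 + 3 = 7
\end{aligned}
```

Lengths 1 to 3 are each covered twice and length 4 once, which gives the expected output 2 2 2 1. The total of 7 is the number of distinct non-empty substrings of "abab".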
Building Suffix Array: The buildSuffixArray function builds the suffix array for the input string by prefix doubling. It initializes the suffix array with every starting index and the position (rank) array with each suffix's first character. In each iteration it sorts the suffixes by the pair (position of the suffix, position of the suffix that starts gap characters later), which orders them by their first 2·gap characters. It then updates the position array from the new order. The gap is initially 1 and is doubled in each iteration until all positions are distinct.
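A minimal Python sketch of prefix doubling, assuming the simple variant that re-sorts with a tuple key each round. Because it uses comparison sorting, this sketch runs in O(N log² N); an O(N log N) bound would need a counting or radix sort per round. `build_suffix_array` is an illustrative stand-in for buildSuffixArray, not the article's code.

```python
def build_suffix_array(s: str) -> list[int]:
    """Prefix doubling: after the round with a given gap, suffixes are
    ordered by their first 2 * gap characters."""
    n = len(s)
    if n == 0:
        return []
    sa = list(range(n))              # suffix array: starting indices
    pos = [ord(c) for c in s]        # position (rank) array, seeded with characters
    gap = 1
    while True:
        def key(i: int) -> tuple[int, int]:
            # Rank of the first gap characters, then of the next gap characters;
            # -1 means the suffix ends first, so it sorts earlier.
            return (pos[i], pos[i + gap] if i + gap < n else -1)

        sa.sort(key=key)
        new_pos = [0] * n
        for j in range(1, n):
            # Same key as the previous suffix -> same rank, otherwise the next rank.
            new_pos[sa[j]] = new_pos[sa[j - 1]] + (key(sa[j - 1]) < key(sa[j]))
        pos = new_pos
        if pos[sa[-1]] == n - 1:     # all ranks distinct: the order is final
            return sa
        gap *= 2


print(build_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```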
Building LCP Array: The buildLCPArray function builds the LCP array from the input string and its suffix array. It iterates over the suffix array and computes the longest common prefix length of each pair of consecutive suffixes in sorted order.
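One common way to build the LCP array in O(N) is Kasai's algorithm, sketched below. The article does not name its method, so treat this as an assumed choice, and `build_lcp_array` as an illustrative name. The algorithm visits suffixes in text order: if the suffix at i shares k characters with its successor, the suffix at i + 1 shares at least k - 1 characters with its own successor, so the scan never restarts from zero. For "banana" the suffix array is [5, 3, 1, 0, 4, 2].

```python
def build_lcp_array(s: str, sa: list[int]) -> list[int]:
    """Kasai's algorithm: lcp[r] = longest common prefix of the suffixes at
    sa[r] and sa[r + 1]; the last entry stays 0 (no next suffix)."""
    n = len(s)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r                  # inverse of the suffix array
    lcp = [0] * n
    k = 0                                # current match length, carried over
    for i in range(n):                   # visit suffixes in text order
        if rank[i] == n - 1:             # last in sorted order: no next suffix
            k = 0
            continue
        j = sa[rank[i] + 1]              # the next suffix in sorted order
        while i + k < n and j + k < n and s[i + k] == s[j + k]:
            k += 1
        lcp[rank[i]] = k
        if k > 0:
            k -= 1                       # suffix i + 1 keeps at least k - 1 matches
    return lcp


print(build_lcp_array("banana", [5, 3, 1, 0, 4, 2]))  # [1, 3, 0, 0, 2, 0]
```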
Counting Distinct Substrings: After building the suffix array and the LCP array, the next step is to count the number of distinct substrings of each length. This is done by iterating over the LCP array and updating a prefix-count (difference) array. For each index i, the suffix starting at suffixArray[i] adds one distinct substring of every length greater than LCP[i] and at most N - suffixArray[i], the length of that suffix. So the count is incremented at length LCP[i] + 1 and decremented at length N - suffixArray[i] + 1. A prefix sum over this array then gives the number of distinct substrings of each length.
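The counting step on its own, as a sketch that takes the two arrays as input. It assumes the convention used here, where LCP[i] compares suffixArray[i] with the next suffix; `count_by_length` is a hypothetical name.

```python
def count_by_length(n: int, suffix_array: list[int], lcp: list[int]) -> list[int]:
    # diff[L] holds the change in the count when moving from length L - 1 to L.
    diff = [0] * (n + 2)
    for i in range(n):
        shortest = lcp[i] + 1          # prefixes up to lcp[i] are shared with the next suffix
        longest = n - suffix_array[i]  # length of this suffix
        if shortest <= longest:
            diff[shortest] += 1
            diff[longest + 1] -= 1
    # Prefix sums turn the difference array into per-length counts.
    counts = []
    running = 0
    for length in range(1, n + 1):
        running += diff[length]
        counts.append(running)
    return counts


# "abab": suffixArray = [2, 0, 3, 1], LCP (with the next suffix) = [2, 0, 1, 0]
print(*count_by_length(4, [2, 0, 3, 1], [2, 0, 1, 0]))  # 2 2 2 1
```

Each suffix costs O(1) here, so this step is O(N) once both arrays exist.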
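Step-by-Step algorithm:
1. Build the suffix array of str by prefix doubling.
2. Build the LCP array from str and its suffix array.
3. For each index i, record that the suffix starting at suffixArray[i] contributes one new distinct substring of every length from LCP[i] + 1 to N - suffixArray[i]: add 1 at LCP[i] + 1 and subtract 1 at N - suffixArray[i] + 1 in a difference array.
4. Take prefix sums of the difference array. The value at position L is the number of distinct substrings of length L; print it for every L from 1 to N.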
Implementation of the above approach:
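A self-contained Python sketch of the approach above, under these assumptions: a prefix-doubling suffix array with a tuple-key sort (O(N log² N) in this sketch), Kasai's algorithm for the LCP array, and a difference array over lengths. All function names are illustrative.

```python
def build_suffix_array(s: str) -> list[int]:
    # Prefix doubling with a tuple sort key.
    n = len(s)
    if n == 0:
        return []
    sa = list(range(n))
    pos = [ord(c) for c in s]
    gap = 1
    while True:
        def key(i: int) -> tuple[int, int]:
            return (pos[i], pos[i + gap] if i + gap < n else -1)

        sa.sort(key=key)
        new_pos = [0] * n
        for j in range(1, n):
            new_pos[sa[j]] = new_pos[sa[j - 1]] + (key(sa[j - 1]) < key(sa[j]))
        pos = new_pos
        if pos[sa[-1]] == n - 1:
            return sa
        gap *= 2


def build_lcp_array(s: str, sa: list[int]) -> list[int]:
    # Kasai's algorithm; lcp[r] compares sa[r] with sa[r + 1].
    n = len(s)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * n
    k = 0
    for i in range(n):
        if rank[i] == n - 1:
            k = 0
            continue
        j = sa[rank[i] + 1]
        while i + k < n and j + k < n and s[i + k] == s[j + k]:
            k += 1
        lcp[rank[i]] = k
        if k > 0:
            k -= 1
    return lcp


def distinct_substring_counts(s: str) -> list[int]:
    n = len(s)
    sa = build_suffix_array(s)
    lcp = build_lcp_array(s, sa)
    diff = [0] * (n + 2)
    for i in range(n):
        # Suffix sa[i] contributes new substrings of lengths lcp[i] + 1 .. n - sa[i].
        lo, hi = lcp[i] + 1, n - sa[i]
        if lo <= hi:
            diff[lo] += 1
            diff[hi + 1] -= 1
    counts, running = [], 0
    for length in range(1, n + 1):
        running += diff[length]
        counts.append(running)
    return counts


if __name__ == "__main__":
    print(*distinct_substring_counts("abab"))  # 2 2 2 1
```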
Output (for str = "abab"):
2 2 2 1
Time Complexity: O(nlogn)
Auxiliary Space: O(n)