Construct a String from another String using Suffix Trie
Last Updated : 23 Nov, 2023
A suffix tree is a data structure based on trie compression that stores all the suffixes of a given string. Apart from detecting patterns in strings, it has a range of applications in fields like bioinformatics, string algorithms, and data compression.
Features of Suffix Trie:
A suffix trie is a tree-shaped data structure used to store and search for patterns in a string. It is the uncompressed form of a suffix tree: every edge carries a single character, whereas a suffix tree merges chains of single-child nodes into one edge.
Every suffix of the string is spelled out by a path that starts at the root, so the trie stores all suffixes as paths in one tree.
To build a suffix trie for a string, we start with an empty tree and insert each suffix of the string in turn.
The root node represents the empty string. If the string ends with a unique terminator character, each leaf represents exactly one suffix of the input string.
Every internal node represents a substring of the input. A node with two or more children represents a substring that is a common prefix of at least two suffixes.
The ability to quickly find substrings inside a text is one of the key benefits of employing a suffix trie.
To search for a pattern, we start at the root and follow the edges labeled with the pattern's characters, one character at a time.
If every character can be followed, the pattern occurs in the string, and every leaf below the node we reach corresponds to a suffix that begins with the pattern. If some character has no matching edge, the pattern does not occur.
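The build and search procedure described above can be sketched as follows. This is a minimal illustration, not the article's original listing: the names `TrieNode`, `build_suffix_trie`, and `contains` are assumptions, and children are kept in a dictionary rather than a fixed 26-slot array.

```python
class TrieNode:
    """One node of the suffix trie; each edge carries a single character."""

    def __init__(self):
        self.children = {}  # maps a character to the child TrieNode


def build_suffix_trie(text):
    """Insert every suffix of text, character by character."""
    root = TrieNode()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
    return root


def contains(root, pattern):
    """Return True if pattern can be followed from the root, i.e. it is a substring."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return False
    return True


trie = build_suffix_trie("banana")
print(contains(trie, "nan"))  # True
print(contains(trie, "nab"))  # False
```

Note that the search succeeds as soon as the whole pattern has been followed; it does not need to end at a leaf.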
Explanation: In this solution, we build the suffix trie for str1. We then split str2 into pieces, each of which occurs in str1, by looking up each piece in the suffix trie of str1. For every piece found, we record the starting and ending indices of a matching occurrence in str1.
In the example given, the first substring of "str2" is "g". We search for this substring in the suffix trie of "str1" and find it at index 3. Therefore, we record the starting and ending indices of this substring in "str1" as (3, 3). Similarly, we find that the substring "am" in "str2" can be constructed from the suffix trie of "str1" using indices (5, 6) in "str1". Finally, we find that the substring "ing" in "str2" can be constructed from the suffix trie of "str1" using indices (8, 10) in "str1".
Therefore, the output of the program is [(3, 3), (5, 6), (8, 10)], which represents the starting and ending indices of each substring of "str1" that can be used to construct the corresponding substring in "str2".
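A quick way to sanity-check such a result is to slice str1 at the recorded index pairs and concatenate the slices. The sketch below assumes str1 = "programming", a value that is consistent with the indices quoted above but is not stated in the text; `rebuild` is a hypothetical helper name.

```python
def rebuild(str1, pieces):
    """Concatenate the slices of str1 named by inclusive (start, end) index pairs."""
    return "".join(str1[start:end + 1] for start, end in pieces)


# Assumed input: chosen only because it matches the indices in the explanation.
str1 = "programming"
pieces = [(3, 3), (5, 6), (8, 10)]
print(rebuild(str1, pieces))  # gaming
```

The three pieces "g", "am", and "ing" join to "gaming", which matches the pieces of str2 named in the explanation.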
Explanation: A suffix trie is a data structure that stores all the suffixes of a given string in a tree-like structure. To construct str2 from str1 using a suffix trie, we first build a suffix trie for str1. Then, we search for str2 in the suffix trie by traversing down the tree, following the edges labeled with the characters of str2.
To construct "ana" from "banana", we start at the root of the suffix trie and follow the edges labeled "a", "n", and "a" in turn, until every character of str2 has been consumed. The characters matched span indices 1 to 3 of str1, so we record (1, 3), which corresponds to the substring "ana" in str1.
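The lookup for "ana" in "banana" can be sketched as below. To report indices, this sketch assumes each node remembers the start index of the first suffix that created it; that bookkeeping, and the names `Node`, `build_suffix_trie`, and `find`, are illustrative choices rather than something the text prescribes.

```python
class Node:
    def __init__(self, start):
        self.children = {}
        self.start = start  # start index of the first suffix that created this node


def build_suffix_trie(text):
    root = Node(-1)  # the root belongs to no particular suffix
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            if ch not in node.children:
                node.children[ch] = Node(start)
            node = node.children[ch]
    return root


def find(root, pattern):
    """Return inclusive (start, end) indices of the first occurrence, or None."""
    if not pattern:
        return None
    node = root
    for ch in pattern:
        if ch not in node.children:
            return None
        node = node.children[ch]
    # The path to this node spells text[node.start : node.start + len(pattern)].
    return (node.start, node.start + len(pattern) - 1)


print(find(build_suffix_trie("banana"), "ana"))  # (1, 3)
```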
Approach: This can be solved with the following idea:
Step 1: Create a Suffix Trie for the Original String
The first step is to construct a trie that represents all the suffixes of the original string. This data structure is called a suffix trie. The simplest construction inserts every suffix character by character, which takes O(n^2) time for a string of length n.
Step 2: Identify Suffixes Beginning with the Initial Substring
After constructing the suffix trie, the next step is to locate all the suffixes that start with the initial substring of interest. Starting at the root, we follow the edges that match the characters of the initial substring. The node we reach corresponds to that substring, and every suffix whose path passes through this node begins with it.
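Step 2 can be sketched as follows. For simplicity this version assumes every node stores the start indices of all suffixes that pass through it, which costs extra memory but avoids a separate traversal of the subtree; the field name `starts` and the function names are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}
        self.starts = []  # start indices of all suffixes passing through this node


def build_suffix_trie(text):
    root = Node()
    for start in range(len(text)):
        node = root
        node.starts.append(start)  # every suffix passes through the root
        for ch in text[start:]:
            if ch not in node.children:
                node.children[ch] = Node()
            node = node.children[ch]
            node.starts.append(start)
    return root


def suffixes_starting_with(text, prefix):
    """Return all suffixes of text that begin with prefix, in order of position."""
    node = build_suffix_trie(text)
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    return [text[s:] for s in node.starts]


print(suffixes_starting_with("banana", "an"))  # ['anana', 'ana']
```

The start indices collected at the reached node are also the positions of every occurrence of the prefix in the text.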
Step 3: Determine the Longest Common Prefix (LCP) of the Suffixes
Once we have identified all the suffixes that begin with the initial substring, we need to determine their longest common prefix (LCP). To do this, we find the lowest common ancestor (LCA) of the leaf nodes that correspond to those suffixes. The string spelled by the path from the root to the LCA is the longest common prefix of the suffixes.
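One way to sketch Step 3: after walking the initial substring, keep descending while the current node has exactly one child, because a single child means all suffixes below still agree on the next character. The node where this stops is the lowest common ancestor of their leaves. The sketch assumes a terminator character "$" that does not occur in the text, so that every suffix ends at a leaf; the function names are illustrative.

```python
TERMINATOR = "$"  # assumed not to occur in the input text


class Node:
    def __init__(self):
        self.children = {}


def build_suffix_trie(text):
    root = Node()
    text += TERMINATOR
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            if ch not in node.children:
                node.children[ch] = Node()
            node = node.children[ch]
    return root


def lcp_of_suffixes(text, prefix):
    """LCP of all suffixes of text that begin with prefix, or None if there are none."""
    node = build_suffix_trie(text)
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return None
    lcp = prefix
    # A single child means every suffix below shares the next character.
    while len(node.children) == 1:
        ch, child = next(iter(node.children.items()))
        if ch == TERMINATOR:
            break  # only one suffix remains and it ends here
        lcp += ch
        node = child
    return lcp


print(lcp_of_suffixes("banana", "an"))  # ana
print(lcp_of_suffixes("banana", "a"))   # a
```

For "an", the matching suffixes are "anana" and "ana", which share "ana". For "a", the suffix "a" itself limits the common prefix to "a".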
Step 4: Add the LCP to the Output String
After determining the LCP of the suffixes, we can add it to the output string.
Step 5: Repeat for Additional Substrings
To find the LCP for every additional substring, we repeat steps 2-4, beginning at the end of the previous substring. We identify all the suffixes that begin with the additional substring, determine their LCP, and add it to the output string.
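Put together, the steps can be approximated by the sketch below. It is not the article's original listing: it assumes a greedy strategy that, at each position of str2, follows the trie as far as possible and records the indices of that longest match in str1. The inputs "programming" and "gaming" are hypothetical, and `construct` is an illustrative name.

```python
class Node:
    def __init__(self, start):
        self.children = {}
        self.start = start  # start index of the first suffix that created this node


def build_suffix_trie(text):
    root = Node(-1)
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            if ch not in node.children:
                node.children[ch] = Node(start)
            node = node.children[ch]
    return root


def construct(str1, str2):
    """Split str2 into substrings of str1; return inclusive (start, end) pairs into str1.

    Returns None if some character of str2 does not occur in str1.
    """
    root = build_suffix_trie(str1)
    pieces = []
    pos = 0
    while pos < len(str2):
        node, length = root, 0
        # Follow the trie for as long as the next character of str2 matches.
        while pos + length < len(str2) and str2[pos + length] in node.children:
            node = node.children[str2[pos + length]]
            length += 1
        if length == 0:
            return None
        pieces.append((node.start, node.start + length - 1))
        pos += length  # continue right after the piece just matched
    return pieces


print(construct("programming", "gaming"))  # [(3, 3), (5, 6), (8, 10)]
```

Because each node keeps only the first suffix that created it, the indices reported are those of the earliest occurrence of each piece in str1.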
Below is the code for the above approach:
Output
[(2, 2), (0, 1), (3, 4), (2, 2)]
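Time Complexity: O(n^2 + m), taking n as the length of str1 and m as the length of str2. Building the suffix trie costs O(n^2), and matching str2 against it costs O(m).
Auxiliary Space: O(n^2 * 26) in the worst case, since an uncompressed suffix trie can contain O(n^2) nodes, each with up to 26 child links.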
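To get a feel for the construction cost, the sketch below counts the nodes of an uncompressed suffix trie. It uses nested dictionaries instead of node objects, and `count_nodes` is a hypothetical helper. For a string of n distinct characters, no two suffixes share a prefix, so the trie has n(n+1)/2 nodes plus the root.

```python
def count_nodes(text):
    """Count the nodes (including the root) of the suffix trie of text."""
    root = {}
    count = 1  # the root
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            if ch not in node:
                node[ch] = {}
                count += 1
            node = node[ch]
    return count


for n in (4, 8, 16):
    text = "".join(chr(ord("a") + i) for i in range(n))  # n distinct letters
    print(n, count_nodes(text))
# 4 11
# 8 37
# 16 137

print(count_nodes("aaaa"))  # 5, since repeated characters share paths
```

Doubling n roughly quadruples the node count for such inputs, while highly repetitive strings stay small.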
Applications of Suffix Trie:
In pattern matching, a suffix trie is used to find all occurrences of a pattern in a text: we follow the pattern from the root, and every suffix that passes through the node we reach marks one occurrence.
In bioinformatics, it is used to assemble genome sequences by matching and aligning short DNA reads to a reference genome.
In spell-checking software, it can help check whether a word is spelled correctly by looking up substrings of the input word.
In compilers and code optimization tools, it can be used to identify frequently repeated code patterns that are candidates for optimization.
In natural language processing, it is used to match and categorize words and phrases based on their morphological and syntactic properties.