Given a text string and a pattern string, find all occurrences of the pattern in the text. A few pattern-searching algorithms (KMP, Rabin-Karp, Naive Algorithm, Finite Automata) have already been discussed and can be used for this. Here we discuss a suffix tree based algorithm. In the first Suffix Tree Application (Substring Check), we saw how to check whether a given pattern is a substring of a text. It is advisable to go through Substring Check first.
In this article, we go a bit further on the same problem. If a pattern is a substring of a text, we will find all positions of the pattern in the text. As a prerequisite, we must know how to build a suffix tree in one way or another.
Here we will build the suffix tree using Ukkonen’s Algorithm, already discussed in the following articles:
Ukkonen’s Suffix Tree Construction – Part 1
Ukkonen’s Suffix Tree Construction – Part 2
Ukkonen’s Suffix Tree Construction – Part 3
Ukkonen’s Suffix Tree Construction – Part 4
Ukkonen’s Suffix Tree Construction – Part 5
Ukkonen’s Suffix Tree Construction – Part 6
Let's look at the following figure: (Figure: Suffix Tree Application)
With the above explanation, we should be able to see the following:
Can you see how to find all the occurrences of a pattern in a string?
Output:
Text: GEEKSFORGEEKS, Pattern to search: GEEKS
Found at position: 8
Found at position: 0
substring count: 2
Pattern <GEEKS> is a Substring

Text: GEEKSFORGEEKS, Pattern to search: GEEK1
Pattern <GEEK1> is NOT a Substring

Text: GEEKSFORGEEKS, Pattern to search: FOR
substring count: 1 and position: 5
Pattern <FOR> is a Substring

Text: AABAACAADAABAAABAA, Pattern to search: AABA
Found at position: 13
Found at position: 9
Found at position: 0
substring count: 3
Pattern <AABA> is a Substring

Text: AABAACAADAABAAABAA, Pattern to search: AA
Found at position: 16
Found at position: 12
Found at position: 13
Found at position: 9
Found at position: 0
Found at position: 3
Found at position: 6
substring count: 7
Pattern <AA> is a Substring

Text: AABAACAADAABAAABAA, Pattern to search: AAE
Pattern <AAE> is NOT a Substring

Text: AAAAAAAAA, Pattern to search: AAAA
Found at position: 5
Found at position: 4
Found at position: 3
Found at position: 2
Found at position: 1
Found at position: 0
substring count: 6
Pattern <AAAA> is a Substring

Text: AAAAAAAAA, Pattern to search: AA
Found at position: 7
Found at position: 6
Found at position: 5
Found at position: 4
Found at position: 3
Found at position: 2
Found at position: 1
Found at position: 0
substring count: 8
Pattern <AA> is a Substring

Text: AAAAAAAAA, Pattern to search: A
Found at position: 8
Found at position: 7
Found at position: 6
Found at position: 5
Found at position: 4
Found at position: 3
Found at position: 2
Found at position: 1
Found at position: 0
substring count: 9
Pattern <A> is a Substring

Text: AAAAAAAAA, Pattern to search: AB
Pattern <AB> is NOT a Substring
Ukkonen’s Suffix Tree Construction takes O(N) time and space to build the suffix tree for a string of length N. After that, the traversal for the substring check takes O(M) for a pattern of length M, and if there are Z occurrences of the pattern, it takes a further O(Z) to find the indices of all Z occurrences. Once the tree is built, the overall search complexity is linear: O(M + Z).
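The two search phases described above (walk down the tree along the pattern, then collect every leaf below the point where the match ends) can be illustrated with a minimal Python sketch. It is not the program that produced the output above, and it makes several assumptions: the tree is built by naive suffix insertion in O(N^2) time rather than by Ukkonen’s algorithm, the text does not contain the terminator `$`, and the names `Node`, `build_suffix_tree` and `find_all` are hypothetical, chosen only for this example. The search itself does O(M) character comparisons and then visits only the subtree holding the Z matching leaves.

```python
class Node:
    def __init__(self, start=0, end=0, suffix_index=-1):
        self.start = start                # edge label is s[start:end]
        self.end = end
        self.suffix_index = suffix_index  # >= 0 only for leaves
        self.children = {}                # first char of edge -> child Node


def build_suffix_tree(text):
    """Naive O(N^2) construction (NOT Ukkonen's): insert every suffix."""
    assert "$" not in text, "sketch assumes '$' is free to use as terminator"
    s = text + "$"
    root = Node()
    for i in range(len(s)):
        node, pos = root, i
        while True:
            child = node.children.get(s[pos])
            if child is None:
                # No edge starts with s[pos]: hang a new leaf here.
                node.children[s[pos]] = Node(pos, len(s), i)
                break
            # Walk along the edge while characters agree.
            k = child.start
            while k < child.end and s[k] == s[pos]:
                k += 1
                pos += 1
            if k == child.end:
                node = child              # edge fully matched, go deeper
                continue
            # Mismatch inside the edge: split it and add a new leaf.
            mid = Node(child.start, k)
            node.children[s[mid.start]] = mid
            child.start = k
            mid.children[s[k]] = child
            mid.children[s[pos]] = Node(pos, len(s), i)
            break
    return root, s


def find_all(root, s, pattern):
    """Return start positions of pattern in the text, in traversal order."""
    if not pattern or "$" in pattern:
        return []
    # Phase 1: match the pattern from the root, O(M) comparisons.
    node, matched = root, 0
    while matched < len(pattern):
        child = node.children.get(pattern[matched])
        if child is None:
            return []
        k = child.start
        while k < child.end and matched < len(pattern):
            if s[k] != pattern[matched]:
                return []
            k += 1
            matched += 1
        node = child
    # Phase 2: every leaf below the match point is one occurrence.
    positions, stack = [], [node]
    while stack:
        cur = stack.pop()
        if cur.suffix_index >= 0:
            positions.append(cur.suffix_index)
        stack.extend(cur.children.values())
    return positions
```

For example, after `root, s = build_suffix_tree("GEEKSFORGEEKS")`, the call `sorted(find_all(root, s, "GEEKS"))` gives `[0, 8]`, and `find_all(root, s, "GEEK1")` gives `[]`. The positions are returned in traversal order, so they are sorted here only to make the result easy to compare.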
A bit more detailed analysis
How many internal nodes will there be in a suffix tree of a string of length N?
Answer: at most N-1 (why?)
A string of length N has N suffixes, and each suffix ends at its own leaf, so a suffix tree of a string of length N has N leaves. As each internal node has at least 2 children, an N-leaf suffix tree has at most N-1 internal nodes. If a pattern occurs Z times in the string, it is a prefix of Z suffixes, so there are Z leaves below the point (an internal node or a position in the middle of an edge) where the pattern match ends in the tree. The subtree with Z leaves below that point has at most Z-1 internal nodes, so it can be traversed in O(Z) time. The overall search complexity is therefore linear: O(M + Z). For a given pattern, Z (the number of occurrences) can be at most N, so the worst-case complexity is O(M + N) when Z is close or equal to N (traversing a tree with O(N) nodes takes O(N) time).
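The bound on internal nodes used above follows from counting edges in two ways. The short derivation below is a sketch under one assumption: every internal node of the tree or subtree being counted, including its root, has at least two children. The symbols $\ell$ (number of leaves), $I$ (number of internal nodes) and $E$ (number of edges) are introduced here only for this derivation.

```latex
\begin{align*}
E &= \ell + I - 1
  && \text{(every node except the root has exactly one incoming edge)} \\
E &\ge 2I
  && \text{(every internal node has at least two outgoing edges)} \\
\ell + I - 1 &\ge 2I
  \;\Longrightarrow\; I \le \ell - 1 \\
\ell + I &\le 2\ell - 1
  && \text{(total number of nodes)}
\end{align*}
```

With $\ell = N$ this gives at most $N-1$ internal nodes for the whole tree. With $\ell = Z$ it gives at most $2Z-1$ nodes in the subtree below the match point, which is why collecting all Z occurrences takes O(Z) time.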
Followup questions: