Given a text string, find the Longest Repeated Substring (LRS) in the text. If there is more than one Longest Repeated Substring, return any one of them.
Longest Repeated Substring in GEEKSFORGEEKS is: GEEKS
Longest Repeated Substring in AAAAAAAAAA is: AAAAAAAAA
Longest Repeated Substring in ABCDEFG is: No repeated substring
Longest Repeated Substring in ABABABA is: ABABA
Longest Repeated Substring in ATCGATCGA is: ATCGA
Longest Repeated Substring in banana is: ana
Longest Repeated Substring in abcpqrabpqpq is: ab (pq is another LRS here)
This problem can be solved by several approaches with different time and space complexities. Here we discuss the Suffix Tree approach (the 3rd Suffix Tree Application). Other approaches will be discussed soon.
As a prerequisite, we must know how to build a suffix tree in one way or another.
Here we build the suffix tree using Ukkonen’s Algorithm, already discussed in the following articles:
Ukkonen’s Suffix Tree Construction – Part 1
Ukkonen’s Suffix Tree Construction – Part 2
Ukkonen’s Suffix Tree Construction – Part 3
Ukkonen’s Suffix Tree Construction – Part 4
Ukkonen’s Suffix Tree Construction – Part 5
Ukkonen’s Suffix Tree Construction – Part 6
Let's look at the following figure:
This is the suffix tree for the string "ABABABA$".
In this string, the following substrings are repeated:
A, B, AB, BA, ABA, BAB, ABAB, BABA, ABABA
The Longest Repeated Substring is ABABA.
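The list above can be double-checked by brute force. The following is a minimal sketch, not the suffix tree approach of this article: it enumerates every substring and keeps those seen at more than one position (overlapping occurrences count). The function name `repeated_substrings` is a hypothetical helper introduced here for illustration, and it is only practical for short strings.

```python
def repeated_substrings(text):
    """Return the set of substrings that occur at two or more positions."""
    seen = set()      # substrings met at least once
    repeated = set()  # substrings met at least twice
    n = len(text)
    for start in range(n):
        for end in range(start + 1, n + 1):
            piece = text[start:end]
            if piece in seen:
                repeated.add(piece)
            else:
                seen.add(piece)
    return repeated


found = repeated_substrings("ABABABA")
# Shortest first, ties in alphabetical order, to mirror the list in the text.
print(sorted(found, key=lambda s: (len(s), s)))
# The only repeated substring of maximum length.
print(max(found, key=len))
```

For "ABABABA" this reports the same nine substrings listed above, with ABABA as the longest.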
In a suffix tree, a node can't have more than one outgoing edge starting with the same character. So if a substring is repeated in the text, all of its occurrences share the same path from the root, and that path goes through one or more internal nodes further down the tree (at or below the point where the substring ends on that path).
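In the above figure, we can see that:
The path of each substring listed above reaches an internal node at or below the point where the substring ends, so all of these substrings are repeated.
Substrings ABABAB, ABABABA, BABAB and BABABA have no internal node down the tree (after the point where the substring ends on the path), so they are not repeated.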
Can you see how to find the longest repeated substring?
We can see in the figure that the longest repeated substring ends at the internal node that is farthest from the root, i.e. the deepest internal node, where depth is measured by path label length (number of characters), not by number of edges. Leaves do not count, because a path that ends at a leaf spells a suffix that occurs only once.
So finding the longest repeated substring boils down to finding the deepest internal node in the suffix tree and then reading the path label from the root to that node.
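The idea can be sketched in Python as follows. To stay short, this sketch does not use Ukkonen's Algorithm: it builds the suffix tree by inserting the suffixes one at a time, which takes quadratic time in the worst case, and then searches for the deepest internal node. The names `Node`, `Edge`, `build_suffix_tree` and `longest_repeated_substring` are assumptions made for this illustration, not part of the original implementation.

```python
class Node:
    def __init__(self):
        self.children = {}  # first character of edge label -> Edge


class Edge:
    def __init__(self, start, end, node):
        # The edge label is s[start:end]; `node` is the node below the edge.
        self.start, self.end, self.node = start, end, node


def build_suffix_tree(s):
    """Naive O(n^2) construction (assumption: s ends with a unique terminator)."""
    root, n = Node(), len(s)
    for i in range(n):                      # insert suffix s[i:]
        node, pos = root, i
        while pos < n:
            edge = node.children.get(s[pos])
            if edge is None:                # no matching edge: add a new leaf
                node.children[s[pos]] = Edge(pos, n, Node())
                break
            k = edge.start                  # walk down the edge label
            while k < edge.end and pos < n and s[k] == s[pos]:
                k += 1
                pos += 1
            if k == edge.end:               # whole edge matched: go deeper
                node = edge.node
                continue
            # Mismatch inside the edge: split it with a new internal node.
            mid = Node()
            mid.children[s[k]] = Edge(k, edge.end, edge.node)
            mid.children[s[pos]] = Edge(pos, n, Node())
            edge.end, edge.node = k, mid
            break
    return root


def longest_repeated_substring(text, terminator="$"):
    """Return the path label of the deepest internal node, or "" if none."""
    if terminator in text:
        raise ValueError("terminator must not occur in text")
    s = text + terminator
    root = build_suffix_tree(s)
    best_depth, best_end = 0, 0
    stack = [(root, 0, 0)]  # (node, string depth, end index of path label in s)
    while stack:
        node, depth, end = stack.pop()
        # Internal node = has children; the root has depth 0 and never wins.
        if node.children and depth > best_depth:
            best_depth, best_end = depth, end
        # Push in reverse order so children are visited in sorted order.
        for ch in sorted(node.children, reverse=True):
            e = node.children[ch]
            stack.append((e.node, depth + e.end - e.start, e.end))
    return s[best_end - best_depth:best_end]


for word in ["GEEKSFORGEEKS", "ABABABA", "banana", "ABCDEFG"]:
    print(word, "->", longest_repeated_substring(word) or "No repeated substring")
```

Because children are visited in sorted order and the best candidate is replaced only by a strictly deeper node, ties between equally long candidates are resolved in favour of the lexicographically smallest one. An empty result means the text has no repeated substring.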
Output:
Longest Repeated Substring in GEEKSFORGEEKS$ is: GEEKS
Longest Repeated Substring in AAAAAAAAAA$ is: AAAAAAAAA
Longest Repeated Substring in ABCDEFG$ is: No repeated substring
Longest Repeated Substring in ABABABA$ is: ABABA
Longest Repeated Substring in ATCGATCGA$ is: ATCGA
Longest Repeated Substring in banana$ is: ana
Longest Repeated Substring in abcpqrabpqpq$ is: ab
Longest Repeated Substring in pqrpqpqabab$ is: ab
In case of multiple LRS (as we see in the last two test cases), this implementation prints the LRS that comes first lexicographically.
Ukkonen’s Suffix Tree Construction takes O(N) time and space to build the suffix tree for a string of length N, and finding the deepest internal node after that takes O(N) time.
So the overall approach is linear in time and space.
Follow-up questions:
All these problems can be solved in linear time with a few changes to the above implementation.