Given a string S representing a DNA sequence, the task is to find all 10-letter-long substrings that occur more than once in S. The substrings may be returned in any order.
A DNA sequence is a string consisting only of the four characters A, C, G and T.
Input: S = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]
Explanation: Both the substrings "AAAAACCCCC" and "CCCCCAAAAA" occur more than once in the string S.
Input: S = "AAAAAAAAAAAAA"
Output: ["AAAAAAAAAA"]
Explanation: The substring "AAAAAAAAAA" occurs more than once in the string S (overlapping occurrences count).
Approach: To solve the problem, follow the below idea:
The problem can be solved using two sets, say seen and repeated. The seen set stores every 10-letter substring encountered so far. When we encounter a substring that is already present in seen, we add it to the repeated set. Because repeated is a set, a substring that occurs three or more times is still reported only once. After iterating over all the substrings, print all the strings in the repeated set.
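Step-by-step algorithm:
1. Initialize two empty sets, seen and repeated.
2. For every starting index i from 0 to N - 10, take the 10-letter substring of S that starts at i.
3. If the substring is already present in seen, add it to repeated; otherwise, add it to seen.
4. After the loop, print all the strings in repeated.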
Below is the implementation of the algorithm:
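A minimal Python sketch of the two-set approach described above. The function name find_repeated_dna and the length parameter are hypothetical names chosen for illustration; they do not come from the original listing.

```python
def find_repeated_dna(s, length=10):
    """Return every substring of the given length that occurs more than once in s.

    Illustrative sketch: the function name and the `length` parameter are
    assumptions, not part of the original article.
    """
    seen = set()      # every window encountered so far
    repeated = set()  # windows encountered at least twice

    # There are len(s) - length + 1 windows; the range is empty for short strings.
    for i in range(len(s) - length + 1):
        window = s[i:i + length]
        if window in seen:
            repeated.add(window)
        else:
            seen.add(window)

    # Sets are unordered, so the order of the result is not guaranteed.
    return list(repeated)


if __name__ == "__main__":
    print(find_repeated_dna("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"))
```

Since the result is built from a set, the two substrings may be printed in either order, which the problem statement allows.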
Output (for the first example; the order may vary):
['CCCCCAAAAA', 'AAAAACCCCC']
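Time Complexity: O(10 * N), where N is the length of the string. For N >= 10 there are N - 9 substrings of length 10, and extracting and hashing each one takes O(10) time on average.
Auxiliary Space: O(10 * N), since in the worst case the sets store up to N - 9 distinct substrings of 10 characters each.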