Find Repeated DNA Sequences

Last Updated : 10 Jun, 2024

Given a string S that represents a DNA sequence, the task is to find all 10-letter-long substrings that occur more than once in S. The substrings can be returned in any order.

A DNA sequence is a string consisting only of the four characters A, C, G and T.

Examples:

Input: S = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]
Explanation: Both substrings "AAAAACCCCC" and "CCCCCAAAAA" occur more than once in the string S.

Input: S = "AAAAAAAAAAAAA"
Output: ["AAAAAAAAAA"]
Explanation: The substring "AAAAAAAAAA" occurs more than once in the string S (at four overlapping positions).
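To see where these outputs come from, one can simply count every 10-letter window of the input. The snippet below is a small illustrative check, not the approach described later; the helper name window_counts and its parameter k are assumptions introduced here.

```python
from collections import Counter


def window_counts(s, k=10):
    """Count how often each k-letter window occurs in s (hypothetical helper)."""
    # range() is empty when len(s) < k, so short inputs yield an empty Counter.
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


counts = window_counts("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT")
# Keep only the windows seen more than once; sorted for a stable display order.
repeated_windows = sorted(w for w, c in counts.items() if c > 1)
print(repeated_windows)  # ['AAAAACCCCC', 'CCCCCAAAAA']

# In the second example every window is the same string of ten A's.
print(window_counts("AAAAAAAAAAAAA")["AAAAAAAAAA"])  # 4
```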

Approach: To solve the problem, follow the below idea:

The problem can be solved using two sets, say seen and repeated. The seen set stores every substring of length 10 encountered so far. When we encounter a substring that is already present in seen, we insert it into the repeated set; because repeated is a set, a substring that occurs three or more times is still reported only once. After iterating over all the substrings, print all the strings in the repeated set.

Step-by-step algorithm:

Maintain two sets, say seen and repeated, both initially empty.
Starting from the first substring, iterate over all the substrings of length 10.
For each substring str, check whether str is present in seen.
If str is present in seen, insert str into repeated.
Otherwise, insert str into seen.
After iterating over all the substrings, print all the strings in repeated.

Below is the implementation of the algorithm:
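A minimal Python sketch of the two-set approach described above. The function name find_repeated_dna_sequences and the length parameter are illustrative choices made here, not names taken from the article.

```python
def find_repeated_dna_sequences(s, length=10):
    """Return every substring of the given length that occurs more than once in s.

    The name and the `length` parameter are assumptions made for illustration.
    """
    seen = set()      # every window encountered so far
    repeated = set()  # windows encountered at least twice

    # There are len(s) - length + 1 windows; the range is empty for short inputs.
    for i in range(len(s) - length + 1):
        window = s[i:i + length]
        if window in seen:
            repeated.add(window)
        else:
            seen.add(window)

    return list(repeated)


if __name__ == "__main__":
    print(find_repeated_dna_sequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"))
```

Since the result is built from a set, the order of the printed strings may differ between runs; the problem statement accepts any order.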

Output

['CCCCCAAAAA', 'AAAAACCCCC']

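Time Complexity: O(10 * N), where N is the length of the string. There are N - 9 substrings of length 10, and each takes O(10) time to extract and hash.
Auxiliary Space: O(10 * N), since in the worst case the seen set holds N - 9 distinct substrings of 10 characters each.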


URL: https://www.geeksforgeeks.org/dsa/find-repeated-dna-sequences/
