Shortest Common Supersequence

Last Updated : 8 Nov, 2025

Given two strings s1 and s2, find the length of the shortest string that has both s1 and s2 as subsequences.
A subsequence of a string is a sequence that can be derived from the string by deleting zero or more characters without changing the relative order of the remaining characters.

Examples:

Input: s1 = "geek", s2 = "eke"
Output: 5
Explanation: The string "geeke" has both "geek" and "eke" as subsequences.

Input: s1 = "AGGTAB", s2 = "GXTXAYB"
Output: 9
Explanation: The string "AGXGTXAYB" has both "AGGTAB" and "GXTXAYB" as subsequences.
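
The claims in the explanations above can be checked mechanically. The following is a minimal sketch of a subsequence test; the helper name is_subsequence is an illustrative choice, not something defined in this article.

```python
def is_subsequence(small: str, big: str) -> bool:
    """Return True if `small` can be obtained from `big` by deleting characters."""
    it = iter(big)
    # `ch in it` consumes the iterator up to and including the first match,
    # so later characters of `small` can only match later positions of `big`.
    return all(ch in it for ch in small)


# Check both examples: the candidate string must contain both inputs.
print(is_subsequence("geek", "geeke") and is_subsequence("eke", "geeke"))
print(is_subsequence("AGGTAB", "AGXGTXAYB") and is_subsequence("GXTXAYB", "AGXGTXAYB"))
```

Note that this only confirms that a candidate is a common supersequence; it does not show that no shorter one exists.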

[Naive Approach] Using Recursion - O(2^(m+n)) Time and O(m+n) Auxiliary Space

Our goal is to build the shortest string that contains both s1 and s2 as subsequences. A naive idea is to simply concatenate the two strings. The result is always a valid common supersequence of length len(s1) + len(s2), but it is usually not the shortest one, because characters shared by both strings are written twice.
The key insight is: a character that occurs in both strings in a compatible order needs to appear only once in the supersequence.
However, when characters differ, we have a choice of which one to place first, and recursion is a natural way to explore such choice-based problems.

We compare the characters of both strings starting from the end.
If the last characters match: that character appears exactly once in the supersequence. We count it once and move one step back in both strings.
If the last characters don’t match, we have two options:
Include the last character of s1 and keep s2 as it is.
Include the last character of s2 and keep s1 as it is.
Either option adds one character, and we choose the one that gives the shorter result.
Base case: when one string becomes empty, there is nothing left to match, so the answer is the number of remaining characters in the other string.
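
The recursion described above can be sketched as follows. This is a minimal illustration rather than a reference implementation; the function name scs_recursive and the convention that m and n are the lengths of the prefixes still under consideration are assumptions made for this example.

```python
def scs_recursive(s1: str, s2: str, m: int, n: int) -> int:
    """Length of the shortest common supersequence of s1[:m] and s2[:n]."""
    # Base cases: one prefix is empty, so the other prefix is taken whole.
    if m == 0:
        return n
    if n == 0:
        return m
    # Last characters match: count the character once, shrink both prefixes.
    if s1[m - 1] == s2[n - 1]:
        return 1 + scs_recursive(s1, s2, m - 1, n - 1)
    # Last characters differ: place one of them and keep the better option.
    return 1 + min(
        scs_recursive(s1, s2, m - 1, n),
        scs_recursive(s1, s2, m, n - 1),
    )


print(scs_recursive("geek", "eke", 4, 3))  # 5
```

Because every mismatch branches into two calls, this version is only practical for short inputs.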

[Better Approach - 1] Using Top-Down DP (Memoization) - O(mn) Time and O(mn) Space

If we look closely at the previous recursive approach, the same subproblems (i, j) are solved many times along different recursion paths, which makes the program inefficient.
To handle this, we use Dynamic Programming with Memoization.

The idea is simple: in the recursive approach, the result depends on only two parameters, the lengths i and j of the prefixes of s1 and s2 still under consideration (initially m and n). So, we create a 2D array dp of size (m+1) × (n+1), where dp[i][j] stores the length of the shortest common supersequence for the first i characters of s1 and the first j characters of s2.
Before solving any subproblem (i, j), we first check whether it has been solved before. If so, we simply return the stored value. Otherwise, we compute it recursively, store the result in dp[i][j], and reuse it whenever needed. Since there are only (m+1) × (n+1) distinct subproblems and each is computed once, the running time drops to O(mn).
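
A minimal sketch of the memoized version is shown below. The function name scs_memo, the nested helper solve, and the use of -1 as the "not yet computed" marker are assumptions of this example.

```python
def scs_memo(s1: str, s2: str) -> int:
    """Top-down DP: length of the shortest common supersequence of s1 and s2."""
    m, n = len(s1), len(s2)
    # dp[i][j] == -1 means the subproblem (i, j) has not been solved yet.
    dp = [[-1] * (n + 1) for _ in range(m + 1)]

    def solve(i: int, j: int) -> int:
        # Base cases: one prefix is empty.
        if i == 0:
            return j
        if j == 0:
            return i
        # Reuse a stored answer if this subproblem was solved before.
        if dp[i][j] != -1:
            return dp[i][j]
        if s1[i - 1] == s2[j - 1]:
            dp[i][j] = 1 + solve(i - 1, j - 1)
        else:
            dp[i][j] = 1 + min(solve(i - 1, j), solve(i, j - 1))
        return dp[i][j]

    return solve(m, n)


print(scs_memo("AGGTAB", "GXTXAYB"))  # 9
```

The recursion depth can reach m + n, so for very long strings in Python the iterative variants are the safer choice.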

[Better Approach - 2] Using Bottom-Up DP (Tabulation) - O(mn) Time and O(mn) Space

In the recursive and memoized methods, we broke the problem into smaller subproblems using function calls - which also involved extra stack space. In this approach, we will iteratively compute and store results for all smaller subproblems in a 2D DP table and then use these results to build up the final answer.

We create a 2D DP array dp of size (m + 1) x (n + 1), where m and n are the lengths of s1 and s2. First, we define the base cases: dp[i][0] = i and dp[0][j] = j.
This means that if one string is empty, the shortest supersequence is simply the other string. Next, we fill the table for i from 1 to m and j from 1 to n:
If s1[i-1] == s2[j-1], we include this character only once: dp[i][j] = 1 + dp[i-1][j-1]
Otherwise, we have two choices, taking a character from s1 or from s2, and we choose the one that gives the smaller length: dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1])
After filling the entire table, dp[m][n] holds the length of the Shortest Common Supersequence (SCS) of s1 and s2.
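
The table-filling procedure above can be sketched as follows; the function name scs_tabulation is an illustrative choice for this example.

```python
def scs_tabulation(s1: str, s2: str) -> int:
    """Bottom-up DP: length of the shortest common supersequence of s1 and s2."""
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Base cases: against an empty string, the answer is the other prefix.
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                # Matching characters are counted once.
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                # Take one character from s1 or from s2, whichever is cheaper.
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1])

    return dp[m][n]


print(scs_tabulation("geek", "eke"))  # 5
```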

[Expected Approach - 1] Using Space Optimized DP - O(m*n) Time and O(n) Space

In the previous dynamic programming approach, we derived the relation between states as:
If s1[i-1] == s2[j-1]: dp[i][j] = 1 + dp[i-1][j-1], because both characters are the same and we take the character once.
Otherwise: dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1]), the minimum of both possibilities.
To calculate the current cell dp[i][j], we therefore only need values from the previous row (dp[i-1][j-1] and dp[i-1][j]) and from the current row (dp[i][j-1]).
This means we don’t need to store the entire 2D table of size (m + 1) x (n + 1).

We can optimize the space by keeping only two 1D arrays of size n + 1: prev for the previous row and curr for the current row. Initially, prev holds row 0, that is, prev[j] = j. For each i, we set curr[0] = i, fill curr[1..n] using the same recurrence with prev in place of dp[i-1] and curr in place of dp[i], and then assign prev = curr for the next iteration. After the last row, prev[n] is the answer.
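
A minimal sketch of the two-row version, assuming the same recurrence as the full table; the function name scs_two_rows is an illustrative choice.

```python
def scs_two_rows(s1: str, s2: str) -> int:
    """Space-optimized DP: only the previous and current rows are kept."""
    m, n = len(s1), len(s2)
    # Row 0 of the table: with s1 empty, the answer is j.
    prev = list(range(n + 1))

    for i in range(1, m + 1):
        # Column 0 of row i: with s2 empty, the answer is i.
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                curr[j] = 1 + prev[j - 1]
            else:
                curr[j] = 1 + min(prev[j], curr[j - 1])
        # The finished row becomes the previous row for the next iteration.
        prev = curr

    return prev[n]


print(scs_two_rows("AGGTAB", "GXTXAYB"))  # 9
```

A fresh curr list is allocated for each row here for clarity; reusing two preallocated lists and swapping them would avoid the repeated allocation.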

[Expected Approach - 2] Using LCS Solution - O(m*n) Time and O(n) Space

The problem asks for the Shortest Common Supersequence (SCS). If the two strings had no characters in common, we would need every character of both strings, so the length of the SCS would be len(s1) + len(s2). When the strings share a common subsequence, those characters need to appear only once in the supersequence, and the saving is largest when the shared part is as long as possible. That’s where LCS (Longest Common Subsequence) helps us.

We know the relation between SCS and LCS: Length(SCS) = len(s1) + len(s2) − len(LCS)
So instead of directly computing SCS, we just need to find LCS length.
In LCS DP, to compute dp[i][j], we only need:
dp[i-1][j-1] - from previous row
dp[i][j-1] - from current row
dp[i-1][j] - from previous row
Hence, we can use only two 1D arrays - prev and curr.
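
The relation above can be sketched in code as follows. The function names lcs_length and scs_via_lcs are assumptions of this example, and the LCS recurrence used is the standard one (extend the diagonal on a match, otherwise take the larger neighbour).

```python
def lcs_length(s1: str, s2: str) -> int:
    """Length of the longest common subsequence, using two rows of size n + 1."""
    n = len(s2)
    # Row 0: the LCS with an empty prefix of s1 is 0 everywhere.
    prev = [0] * (n + 1)

    for ch in s1:
        curr = [0] * (n + 1)
        for j in range(1, n + 1):
            if ch == s2[j - 1]:
                # Matching characters extend the LCS of both shorter prefixes.
                curr[j] = prev[j - 1] + 1
            else:
                # Otherwise drop one character from either string.
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr

    return prev[n]


def scs_via_lcs(s1: str, s2: str) -> int:
    """Length(SCS) = len(s1) + len(s2) - len(LCS)."""
    return len(s1) + len(s2) - lcs_length(s1, s2)


print(scs_via_lcs("geek", "eke"))  # 5
```

For "geek" and "eke" the LCS has length 2 (for example "ek"), which gives 4 + 3 - 2 = 5.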

URL: https://www.geeksforgeeks.org/dsa/shortest-common-supersequence/