Coding Interview: Two-Sum
A blog tutorial to understand the two-sum problem, its variations, and some approaches to solve it efficiently
In a technical interview, one of the most frequent problems is Two-Sum. In this blog tutorial, we are going to see how to approach the Two-Sum problem, its variations as well as its time and space complexity in Big(O) notation.
Let’s get started!
Problem statement
Let’s look at the basic definition of the problem:
Given an array of integers
numsand atarget, returntrueif there is any pair of numbers that sum up totarget, otherwise returnfalse.
For example:
Case 1
Input: nums = [3,1,5,2,9] target = 7
Output: True
Case 2
Input: nums = [3,1,5,2,9] target = 1
Output: False
It is important to mention that the result can be of type boolean (as in the example above) or it may be required to return the index of the pair of numbers that sum up to target .
Before starting with some approaches to address and solve the problem, it is highly recommended to clarify doubts and define the scope of the problem. Some questions you can ask your interviewer that will help you define the scope of the problem would be:
- Is the array sorted?
- Is there any constraint on memory usage?
- Is there any constraint at runtime?
For practical purposes, let’s assume that for now, the array nums will be unsorted and that there are no memory and runtime constraints.
So, how do we solve it? Going step by step, first, we will see how to solve it in the least optimal way and then how we can improve it, let’s go for it!
Approach 1: Brute force
Sometimes, to get closer to the optimal solution, it is suggested to start with a brute force approach, which obviously will not be optimal, but it will provide the basis for understanding the problem and giving way to optimization.
A first approach would be to compare all the possible pairs that could be formed from nums , if we find that pair that sums up to target , we finish the process. The function is shown in code snippet 1.
Although this approach finds the solution, it is not the most optimal solution due to the number of comparisons that have to be made to reach the solution, that is, for each of the n elements in nums , n-1 comparisons are made, which yields a quadratic O(N²) execution time. Memory usage remains constant O(1) because no extra memory is required to perform the comparisons.
One way to optimize this brute force approach would be to use extra memory, which would result in O(N) complexity both at runtime and in memory, let’s see how to do it.
Approach 2: Optimization using Set()
What is intended in this approach is to avoid the number of quadratic order comparisons by making use of extra memory, in this case, by making use of sets or set() in Python.
The use of set() allows us to insert or query in constant time O(1), the only aspect to consider is that set() will require memory in order of O(N) which must be considered if it is a constraint in the scope of the problem.
The idea of this approach is to put all the elements of nums into set() . Then iterate through each element of nums and check if target - num exists in set() , if so, return true . Code snippet 2 shows the function definition.
Inserting n elements to n requires O(N) x O(1) which results in O(N). Looping through each num in nums to verify if the complement target - num exists in set() also requires O(N), resulting in O(N) + O(N) which finally approximates O(N) at runtime and O(N) in memory.
It is important to mention that the approach using set() works only if the value to return is boolean , that is, when no extra information is required to be returned. If we need to return the indices of the numbers that sum up to target , we can use a hashmap instead. Let’s see how to do it.
Approach 3: Optimization using HashMap()
This approach addresses the variation where it is required to return the indices of the pair of numbers that sum up to target instead of just returning true or false .
For example:
Case 1
Input: nums = [3,1,5,2,9] target = 7
Output: [2, 3]
Case 2
Input: nums = [3,1,5,2,9] target = 1
Output: []
For this variation, the hashmap() helps us keep track of the indices. Next, we only need to check if the complement of target - nums[i] exists in hashmap() , if so, return the current and complement indices. The function is shown in code snippet 3.
Inserting and querying keys in a hashmap() requires constant O(1) time, so filling the hashmap() with n elements require constant time O(N), which also results in linear order memory usage O(N). Looping through each element of nums to check if target - nums[i] exists in hashmap() takes O(N) constant time, resulting in an overall O(N) for execution time and memory usage.
While making use of hashmap() helps us verify if target - nums[i] exists as key, we can still make its use a bit more efficient. Note that on line 8 we loop through nums completely to fill hashmap() and then loop back through nums on line 12 to verify the complement, which results in traversing 2 times nums . Let’s see how to make this approach more efficient by looping through nums only 1 time.
Approach 4: Optimization using HashMap() II
The idea of this approach is to loop through nums only once to verify if target - nums[i] exists in hashmap() as key. This will make the execution time more efficient, which although it is still linear, instead of iterating twice over nums , now we would be doing it only once. In code snippet 4 the definition of the function is shown.
Since we are looping through nums in this case, as well, the execution time is linear O(N), and since we use a hashmap() the memory usage is also linear O(N). The optimization occurs by avoiding iterating twice over nums and only doing it once.
Going back to the original problem where we only had to return true or false , how would we approach the problem if nums was already sorted? Let’s see how to do it.
Approach 5: Optimization for sorted array
Let’s say that instead of nums being not sorted, we find that nums is sorted in non-decreasing order.
For example:
Case 1
Input: nums = [1,2,3,5,9] target = 7
Output: True
Case 2
Input: nums = [1,2,3,5,9] target = 1
Output: False
In this case, the problem conditions are suitable so that it is no longer necessary to use extra memory to keep track of numbers already visited. In this case, by implementing the technique of the two pointers, we can determine if there is a pair that sums up to target . Code snippet 5 shows the implementation of the function.
The execution time for this approach is linear O(N) because we only loop through each nums element once in the worst case. Space usage is constant since no extra memory usage is required.
Conclusion
The Two-Sum problem can have different variations and conditions, it is always advised to clarify as much as possible to avoid misunderstandings and to choose the right path. In this blog, we have seen some variations and how to deal with them going from a brute force-based approach to finding a more optimized way.
Leave your comment if you have a specific variation or any other problem that we can break down in the blog.
References
Share This Article
Towards Data Science is a community publication. Submit your insights to reach our global audience and earn through the TDS Author Payment Program.
Write for TDS