Top Skills to Ace Every SQL Interview Question
Understanding and practicing joins, aggregate functions, and subqueries are the key to your success in a coding interview.
There is almost always a SQL portion to any interview in a data-oriented position. Whether you’re interviewing for an analyst, engineering, or data science role, SQL will always creep its way into the coding portion of the process.
Once you nail a few key concepts you can pretty much answer any question, with lots of practice that is. Start with the basics for each of these concepts and work your way up to more difficult problems.
Here are the 3 key concepts you must know inside and out to succeed in your SQL interview:
Joins
You won’t find a problem that DOESN’T involve a join. Know the difference between a regular join, inner join, and left join. When you can distinguish between these and use them in the right scenarios, you will be golden.
Inner join is used when you only want the values present in both tables. It will compare the values in whatever columns you are joining on so that only the rows with matching values in these columns are present in the resulting table.
EX: Let’s say you want to find scary movies that are currently playing at the theater. The ScaryMovies table contains all of the scary movies ever in the theater. The NowPlaying table contains all of the movies currently playing in the theater.
SELECT
ScaryMovies.name
FROM NowPlaying
INNER JOIN ScaryMovies
ON NowPlaying.movie_id = ScaryMovies.movie_id
Joining these tables using an inner join will select ONLY the movies that are currently in theater and ONLY the movies in the scary movies table (or the overlap between the two tables).
Left join is used when you want the values present in the first table you are joining as well as the matching values in the second table. I like to ask myself, what data do I actually want in the end table? If I’m looking for all of the data in the table plus a column from another table, then I usually use a left join. Your original data stays pretty similar to the first table, plus a few extra columns.
Keep in mind that when you use a left join, if there are no matches in the first table and second tables, the columns you’re pulling from the first table will simply be null.
EX: Let’s use the NowPlaying table again. This table contains only a movieId to distinguish what movie it is. All of the other information like name, type, and rating about each movie is in a MovieDetails table. We want to find the movie details about all of the movies currently in theater.
SELECT
NowPlaying.movie_id,
Movies.name,
Movies.type,
Movies.rating
FROM NowPlaying
LEFT JOIN Movies
ON NowPlaying.movie_id = Movies.movie_id
Here, we only want the movie details about the movies in the NowPlaying table and some further details that we can utilize from the Movies table. Using a left join will result in all movieIds from the NowPlaying table being in the resulting table as well as the name, type, and rating if it exists in the Movies table. If the movie_id from the left doesn’t exist in the right then the name, type, and rating columns will be null for that row.
Lastly is join, the most basic of them all. You will also see this referred to as full outer join, but it’s simplest to write just join in your code. This is used when you want all of the values in both of the tables. This is particularly useful when looking for consecutive values in an id or date column.
EX: Let’s say you want to find all of the movies that played 2 days in a row.
SELECT
DISTINCT a.name
FROM Movies a
JOIN Movies b
ON a.play_date + 1 = b.play_date
Here I am joining the same tables but joining them based off the play_date so that only the movies that played 2 days in a row will be populated in the table. Movies that weren’t played two days in a row won’t join.
Aggregate Functions
Just like joins, you will find that you need to use some type of aggregate function in nearly every interview problem. These are pretty self-explanatory but the key here is to know how to use these properly with the GROUP BY function.
MIN() and MAX() are exactly what they sound like. They will give you either the minimum or maximum value of the specified column within a group (or the entire table). If you don’t specify a GROUP BY function then these functions are guaranteed to only return one value.
SELECT
Name,
MAX(Revenue) AS Max_revenue
FROM MOVIES
GROUP BY Genre
This will return the movie with the highest revenue in each movie genre, showing its name and revenue amount.
COUNT() counts the number of rows present in whatever condition you specify within your function. Remember, this is different than SUM() which actually adds the values in the column that you specify.
SELECT
Name,
SUM(Revenue) AS Total_revenue,
COUNT(*) AS Times_played
FROM Movies
GROUP BY Name
Let’s look at the difference between SUM() and COUNT() using this example code. Here, the SUM(Revenue) function will add together all of the revenue earned by each movie (notice the GROUP BY Name function). In contrast, the COUNT(*) will simply count the number of rows for each movie name. You could also do COUNT(Revenue) and get the same thing but it makes more sense to count all of the rows instead of revenue.
AVG() will make your life a whole lot easier when calculating the average column of a certain group. It essentially does the same thing as SUM(Revenue)/COUNT(Revenue), since the average is just the sum divided by the count.
SELECT
date_played,
AVG(revenue)
FROM Movies
GROUP BY date_played
This will give you the average amount of revenue the theater made between all of its movies for each date that it was open and playing movies.
Subqueries
This one probably requires the most practice of them all. A lot of times companies will ask you to solve a problem that can easily be solved with multiple tables, with just one subquery.
A good way to practice this is by first solving a problem the easiest way possible, maybe with multiple tables. Once you get the correct answer, try to refactor your code so that you can solve it using one subquery rather than multiple tables.
Let’s see exactly what this looks like:
Original:
WITH MarchMovies AS (
SELECT
movie,
genre,
date_played
FROM Movies
WHERE date_played BETWEEN '03-01-2018' AND '03-31-2018'
),
MayMovies AS (
SELECT
movie,
genre,
date_played
FROM Movies
WHERE date_played BETWEEN '05-01-2018' AND '05-31-2018'
)
SELECT
MayMovies.movie
FROM MayMovies
INNER JOIN MarchMovies
ON MayMovies.movie = MarchMovies.movie
Here we find all of the movies that played in March and all of the movies that played in May and join them on movie title using an INNER join to get only the movies that were played both months.
Using one subquery:
SELECT
movie
FROM Movies
WHERE movie IN (SELECT movie FROM Movies WHERE date_played BETWEEN '03-01-2018' AND '03-31-2018')
AND date_played BETWEEN '05-01-2018' AND '05-31-2018'
Here we find all of the movie titles played in March using a WHERE and IN operator. If that movie title is present in the temporary table created for March movies AND played in May, it will be in the output table.
Conclusion
Once you understand these three basic concepts it just comes down to practice. Spend at least 15 minutes each day trying new problems. If you can’t figure one out, look at the solution and try it again the next day. LeetCode and HackerRank offer tons of problems that utilize all of these skills. You have no excuses not to ace your SQL interview!
Practice, practice, practice!
Share This Article
Towards Data Science is a community publication. Submit your insights to reach our global audience and earn through the TDS Author Payment Program.
Write for TDS