A Tutorial using window functions to simplify your SQL solutions
If you’ve been practicing your SQL religiously, like I suggested in Top Skills to Ace Every SQL Interview Question, then you’ve probably run into problems that use window functions. Or you’ve run into them in the discussions or solutions page on Leetcode.
Window functions can be complicated but they make many problems a lot easier to solve. Once you learn the most used window functions, you will be implementing them into your solutions like a breeze.
Today I’m going to talk about two functions that are helpful when looking to simplify your solutions— LEAD and LAG. While these functions are used in the same way, they give you opposite results. One will find the previous record and the other will find the next record. But which does which?
What they do
LEAD and LAG will display the previous value or following value compared to the current row you are looking at. Like other window functions you must specify the column in which you’re running the function over and, optionally, the column you are partitioning by and the column you are ordering by. However, it’s most likely you will want to use ORDER BY when using these functions.
For example, let’s say you have the finishing orders of runners in a marathon race. You want to find the people who finished before and after you. You can use these functions to find the names of those people by running this over runner name and ordering by finishing place.
How to use them
Both functions use an OVER() clause along with PARTITION BY and ORDER BY. The PARTITION BY part of the functions is optional but the ORDER BY is almost always necessary. The only case you wouldn’t want to use it is when you’re looking for rows before/after your target row in the table as is.
SELECT runner_name, place, LEAD(runner_name) OVER(ORDER BY place ASC) AS runner_after FROM race_results
Something important to note here is the NULL value in the last row of the table. Since there is no runner after Ellen, the LEAD function would return NULL.
ORDER BY specifies the column you wish to order the rows in the table by. In this case we are ordering by place, starting with first place and ending with last place. Whether you choose ASC or DESC order here is also important depending on the result you are looking for. Remember ASC means the numbers get larger as you scroll down through a table’s rows while DESC means the numbers would start at the highest and become lower.
PARTITION BY acts as grouping logic. It will only look at the rows that have the same values in the column specified. For example, if we partition the race results by gender, LEAD and LAG will only give us the runners before and after that are the same gender as the row in question.
SELECT runner_name, place, gender, LEAD(runner_name) OVER(PARTITION BY gender ORDER BY place ASC) AS same_gender_runner_after
Here we can see that the same_gender_runner_after column gives us the name of the next runner in the same gender category as the target row. It can also be noted that John and Ellen have NULL values in this column since they have no runners of their gender after them.
Difference between LEAD and LAG
LEAD will give you the row AFTER the row you are finding a value for. LAG will give you the row BEFORE the row you are finding a value for.
Think of these terms in relation to your target row. Your target row is in the lead of the row returned when using the LEAD function. Your target row is lagging behind the row returned when using the LAG function.
Keep in mind that it may not return the exact row before or after the target row if you are using a PARTITION BY statement. It will return the row before or after that has the same value in the column specified by this statement.
Let’s compare the outcome of both LEAD and LAG without partitioning first.
When we don’t use PARTITION BY we are getting the true rows that are before/after the target row, as long as they are being ordered by place in this case. Also, the last row is NULL when using the LEAD function but the first row is NULL when using the LAG function. Looking at the LAG column, Madison has a NULL value here because nobody finished before her.
Now let’s compare the two functions when we partition by gender.
Here we can see LEAD and LAG give us the rows after/before that are within the same gender, skipping over the "true" next rows if the gender does not match the target row’s. Again, we can see the last two rows of LEAD being NULL and the first two rows of LAG being NULL. Here it is the first/last two rows due to the partitioning by gender.
If you are studying for a SQL interview, be sure to check out my article Top Skills to Ace Every SQL Interview Question as well as How to Use SQL RANK and DENSE_RANK Functions for a closer look at SQL’s ranking functions.
Happy querying!
Share This Article
Towards Data Science is a community publication. Submit your insights to reach our global audience and earn through the TDS Author Payment Program.
Write for TDS