Saturday, January 8, 2022

How to use the Analytical Functions LEAD and LAG in SQL Server

SQL Analytical or Window Functions LEAD and LAG
The Analytical Functions LEAD() and LAG() used accesses data from a subsequent row (for Lead) and previous row (for Lag) in the same result set without using the self-join, based on the specified Offset number from Current Row.
The LAG/LEAD functions return values from a previous/following row in a Partition, with respect to the Current Row, as specified by the row offset in the function, without the use of a self-join. Offset parameter for Lag and Lead functions cannot be a negative and the default value is 1.

LEAD( ):
The LEAD function access the data from a subsequent row in the same result set without the use of a self-join. LEAD provides access to a row at a given physical offset that follows the current row. The last value in the result is NULL.

Syntax:
LEAD ( scalar_expression ,offset , default )
OVER ( partition_by_clause [order_by_clause] )

Parameters:
Offset: The number of rows forward from the current row from which we have to obtain a value. The default value of offset is 1.
Default: The value to return when scalar_expression at offset is NULL. This is an optional one. When we skip this parameter, then the function returns NULL as default value for a last LEAD value, which is not available at offset.

Examples:
Suppose we have a dataset with sample of orders as follows.
Select 
Order_Id, Order_Date, Prod_Id, Units_Sold
From tbl_Sample


Scenario 1:
Finding the Leading Order_Date values with Offset 1, 2 rows from the Current row, with Order by Prod_Id and No Partition in the Data.

Select 
Order_Id, Order_Date, Prod_Id, Units_Sold,
LEAD(Order_Date, 1,'01-01-9999') Over ( Order by Prod_Id) as Leading_FirstOffset_Date,
LEAD(Order_Date, 2,'01-01-9999') Over ( Order by Prod_Id) as Leading_SecondOffset_Date
From tbl_Sample

Result:


Notes:
The LEAD Order_Date for '2014-01-24' with offset (1) value is : '2014-01-13'  as per the Order by Prod_Id.
The LEAD Order_Date for '2014-01-24' with offset (2) value is : '2014-04-11'  as per the Order by Prod_Id.

Similarly, The LEAD Order_Date for '2014-02-13' with offset (1) value is : '2014-01-11'  as per the Order by Prod_Id.
The LEAD Order_Date for '2014-02-13' with offset (2) value is : NULL  as per the Order by Prod_Id. This NULL is coming from the Source data.

However if you observe the last row, the LEAD Order_Date for '2014-01-14' with offset (1) value is not available. So that the function returns the default value specified '9999-01-01'.
The LEAD Order_Date for '2014-01-14' with offset (2) value also not available. So that the function returns the default value specified '9999'-01-01.

Scenario 2:
Finding the Leading Order_Date values with Offset 1, 2 rows from Current row with Partition By Prod_Id and Order by Order_Date.

Select 
Order_Id, Order_Date, Prod_Id, Units_Sold,
LEAD(Order_Date, 1,'01-01-9999') Over ( Partition By Prod_Id Order by Order_Date)  as Leading_FirstOffset_Date,
LEAD(Order_Date, 2,'01-01-9999') Over ( Partition By Prod_Id Order by Order_Date) as Leading_SecondOffset_Date
From tbl_Sample

Result:


Notes:
The LEAD Order_Date for NULL with offset (1) value is : '2014-01-24' in the first Partition ( Product_Id = 1234 )
The LEAD Order_Date for NULL with offset (2) value is : '2014-02-13' in the first Partition ( Product_Id = 1234 )

Similarly, The LEAD Order_Date for 2014-04-11' with offset (1) value is not available in first Partition ( Product_Id = 1234 ) as the Partion is ended there. So that the the Default value is return as "9999-01-01'
The LEAD Order_Date for 2014-04-11' with offset (2) value is not available in first Partition ( Product_Id = 1234 ) as the Partion is ended there. So that the the Default value is return as "9999-01-01'

The same LEAD() Method will apply for each Partition of the Dataset.

LAG( ):
The LAG function is opposite to the LEAD function. The LAG function access the data from a previous row in the same result set without the use of a self-join. The first value in the result is NULL.

Syntax:
LAG ( scalar_expression ,offset , default )
OVER ( partition_by_clause [order_by_clause] )

Parameters:
Offset: The number of rows backward from the current row from which we have to obtain a value. The default value of offset is 1.
Default: The value to return when scalar_expression at offset is NULL. This is an optional one. When we skip this parameter, then the function returns NULL as default value for the first LAG value, which is not available at offset.

Examples:
Suppose we have a dataset with sample of orders as follows.
Select 
Order_Id, Order_Date, Prod_Id, Units_Sold
From tbl_Sample


Scenario 1:
Finding the Lagging Order_Date values with Offset 1, 2 rows from the Current row, with Order by Prod_Id and No Partition in the Data.

Select 
Order_Id, Order_Date, Prod_Id, Units_Sold,
LAG(Order_Date, 1,'01-01-9999') Over ( Order by Prod_Id) as Lagging_FirstOffset_Date,
LAG(Order_Date, 2,'01-01-9999') Over ( Order by Prod_Id) as Lagging_SecondOffset_Date
From tbl_Sample

Result:

Notes:
The LAG Order_Date for '2014-01-24' with offset (1) value of Previous Row from Current row is not available. Hence the function returns the default value specified '9999-01-01'.
The LAG Order_Date for '2014-01-24' with offset (2) value of Previous Row from Current row is not available. Hence the function returns the default value specified '9999-01-01'.

Similarly, The LAG Order_Date for '2014-05-26' with offset (1) value is : NULL as per the Order by Prod_Id. Hence it returns the Null. This NULL is coming from the Source data.
The LAG Order_Date for '2014-05-26' with offset (2) value is : 2014-04-11' as per the Order by Prod_Id.

Scenario 2:
Finding the Lagging Order_Date values with Offset 1, 2 rows from Current row with Partition By Prod_Id and Order by Order_Date.

Select 
Order_Id, Order_Date, Prod_Id, Units_Sold,
LAG(Order_Date, 1,'01-01-9999') Over ( Partition By Prod_Id Order by Order_Date)  as Lagging_FirstOffset_Date,
LAG(Order_Date, 2,'01-01-9999') Over ( Partition By Prod_Id Order by Order_Date) as Lagging_SecondOffset_Date
From tbl_Sample

Result:


Notes:
The LAG Order_Date for NULL with offset (1) value of Previous Row from Current row in the first Partition ( Product_Id = 1234 ) is not available. Hence the function returns the default value specified '9999-01-01'. 
The LAG Order_Date for NULL with offset (2) value of Previous Row from Current row in the first Partition ( Product_Id = 1234 ) is not available. Hence the function returns the default value specified '9999-01-01'. 

Similarly, The LAG Order_Date for 2014-01-05' with offset (1) value is not available in second Partition ( Product_Id = 1235 ) as it is a part of the first Partition. Hence the the Default value is return as "9999-01-01'
The LAG Order_Date for 2014-04-11' with offset (2) value is not available in second Partition ( Product_Id = 1235 ) as it is a part of the first Partition. Hence the the Default value is return as "9999-01-01'

The same LAG() Method will apply for each Partition of the Dataset.

--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------

No comments:

Post a Comment

Hi User, Thank You for visiting My Blog. Please post your genuine Feedback or comments only related to this Blog Posts. Please do not post any Spam comments or Advertising kind of comments which will be Ignored.