Sunday, January 9, 2022

How to use the Statistical Analytical Functions PERCENTILE_DISC, PERCENTILE_CONT in SQL Server

The Statistical Analytical Functions PERCENTILE_DISC, PERCENTILE_CONT in SQL Server
PERCENTILE_DISC ( ) :
The PERCENTILE_DISC is one of the Statistical Analytical Functions available in SQL Server.
PERCENTILE_DISC calculates the percentile based on a discrete distribution of the column values. The result is equal to a specific value of the column used in the Order by clause.

This function computes a specific percentile for sorted values in an entire rowset or within the rowset's distinct partitions in SQL Server. 
For a given percentile value P, PERCENTILE_DISC sorts the expression values in the ORDER BY clause. It then returns the value with the smallest Cumulative Distribution(CUME_DIST) value given (with respect to the same sort specification) that is greater than or equal to P. 
For example, PERCENTILE_DISC (0.5) will compute the 50th percentile (that is, the median) of an expression.

Syntax:
PERCENTILE_DISC ( numeric_literal ) WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC ]OVER ( [ <partition_by_clause> ] )

Arguments :
literal
The percentile to compute. The value must range between 0.0 and 1.0.

WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC)**
Specifies a list of values to sort and compute the percentile over. The default sort order is ascending. Only one order_by_expression is allowed.  The list of values can be of any of the data types that are valid for the sort operation. 
However, it is advisable to use the numerical column in Order by clause.

OVER ( <partition_by_clause>)**
Divides the FROM clause's result set into partitions. The percentile function is applied to these partitions.

Return Type : The return type or result is determined by the order_by_expression type.

Examples :
Suppose we have a dataset with sample of orders as follows.
Select
Order_Id, Order_Date, Prod_Id, Units_Sold
From tbl_Sample



Scenario 1:
Computing various percentiles (0, 0.25, 0.5, 0.75, 1from the above dataset with Order by Units_Sold and without Partition.

Select 
Order_Id, Order_Date, Prod_Id, Units_Sold,
PERCENTILE_DISC(0.0) WITHIN GROUP (Order By Units_Sold)
OVER () as Percentile_0,
PERCENTILE_DISC(0.25) WITHIN GROUP (Order By Units_Sold)
OVER ()  as Percentile_25,
PERCENTILE_DISC(0.50) WITHIN GROUP (Order By Units_Sold)
OVER ()  as Percentile_50,
PERCENTILE_DISC(0.75) WITHIN GROUP (Order By Units_Sold)
OVER ()  as Percentile_75,
PERCENTILE_DISC(1) WITHIN GROUP (Order By Units_Sold)
OVER ()  as Percentile_1
From tbl_Sample

#Result:
The above query will return the percentile values based on the enter column values from the column Units_Sold of the Dataset.


Scenario 2:
Computing various percentiles (0, 0.25, 0.5, 0.75, 1) from the above dataset with Order by Units_Sold and Partition by Prod_Id.

Select 
Order_Id, Order_Date, Prod_Id, Units_Sold,
PERCENTILE_DISC(0.0) WITHIN GROUP (Order By Units_Sold)
OVER ( Partition By Prod_Id )  as Percentile_0,
PERCENTILE_DISC(0.25) WITHIN GROUP (Order By Units_Sold)
OVER ( Partition By Prod_Id )   as Percentile_25,
PERCENTILE_DISC(0.50) WITHIN GROUP (Order By Units_Sold)
OVER Partition By Prod_Id )  as Percentile_50,
PERCENTILE_DISC(0.75) WITHIN GROUP (Order By Units_Sold)
OVERPartition By Prod_Id )  as Percentile_75,
PERCENTILE_DISC(1) WITHIN GROUP (Order By Units_Sold)
OVER ( Partition By Prod_Id )  as Percentile_1
From tbl_Sample

#Result:
The above query will return the percentile values based on the Partition Window of the column values from a column Units_Sold of the Dataset.



PERCENTILE_CONT ( ) :
The PERCENTILE_CONT is one of the Statistical Analytical Functions available in the SQL Server. PERCENTILE_CONT calculates the percentile based on a continuous distribution of a column values. 
The result is interpolated(introduces a new value) and might not be equal to any of the specific values in the column used in the Order by clause.

Syntax:
PERCENTILE_CONT ( numeric_literal ) WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC ] )  OVER ( [ <partition_by_clause> ] )

Arguments :
literal
The percentile to compute. The value must range between 0.0 and 1.0.

WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC)**
Specifies a list of values to sort and compute the percentile over. The default sort order is ascending. Only one order_by_expression is allowed. The expression must evaluate to an exact or approximate numeric type, with no other data types allowed. 
Exact numeric types are int, bigint, smallint, tinyint, numeric, bit, decimal, smallmoney, and money. Approximate numeric types are float and real.

OVER ( <partition_by_clause>)**
Divides the FROM clause's result set into partitions. The percentile function is applied to these partitions.

Return Type : The return type or result is Float(53).

Scenario 1:
Computing various percentiles (0, 0.25, 0.5, 0.75, 1from the above dataset with Order by Units_Sold and without Partition.

Select 
Order_Id, Order_Date, Prod_Id, Units_Sold,
PERCENTILE_CONT(0.0) WITHIN GROUP (Order By Units_Sold)
OVER () as Percentile_0,
PERCENTILE_CONT(0.25) WITHIN GROUP (Order By Units_Sold)
OVER ()  as Percentile_25,
PERCENTILE_CONT(0.50) WITHIN GROUP (Order By Units_Sold)
OVER ()  as Percentile_50,
PERCENTILE_CONT(0.75) WITHIN GROUP (Order By Units_Sold)
OVER ()  as Percentile_75,
PERCENTILE_CONT(1) WITHIN GROUP (Order By Units_Sold)
OVER ()  as Percentile_1
From tbl_Sample

#Result:
The above query will return the percentile values based on the enter column values from the column Units_Sold of the Dataset.
If we observe the values of Percentile_Cont(0.25) = 14.75 and Percentile_Cont(0.75) = 33, which not exist in Units_Sold column but they are return by Function. 


Scenario 2:
Computing various percentiles (0, 0.25, 0.5, 0.75, 1) from the above dataset with Order by Units_Sold and Partition by Prod_Id.
Select 
Order_Id, Order_Date, Prod_Id, Units_Sold,
PERCENTILE_CONT(0.0) WITHIN GROUP (Order By Units_Sold)
OVER ( Partition By Prod_Id )  as Percentile_0,
PERCENTILE_CONT(0.25) WITHIN GROUP (Order By Units_Sold)
OVER ( Partition By Prod_Id )   as Percentile_25,
PERCENTILE_CONT(0.50) WITHIN GROUP (Order By Units_Sold)
OVER Partition By Prod_Id )  as Percentile_50,
PERCENTILE_CONT(0.75) WITHIN GROUP (Order By Units_Sold)
OVER ( Partition By Prod_Id )  as Percentile_75,
PERCENTILE_CONT(1) WITHIN GROUP (Order By Units_Sold)
OVER ( Partition By Prod_Id )  as Percentile_1
From tbl_Sample

#Result:
The above query will return the percentile values based on the Partition Window of the column values from a column Units_Sold of the Dataset.


--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------

No comments:

Post a Comment

Hi User, Thank You for visiting My Blog. Please post your genuine Feedback or comments only related to this Blog Posts. Please do not post any Spam comments or Advertising kind of comments which will be Ignored.