Tuesday, January 29, 2019

How to do Customers Retention Analysis with SQL In SQL Server

How to Analyze and Identify the returning Loyal Customers with SQL in SQL Server
Scenario:
Suppose there is Retail Store, where the Business Owner wants to Identify the how many Customers are frequently buying from the Store each Month, till the Year End.
The Customers who are buying from the Store in the initial First 3 Months will be Flagged as "Base Customers".
Any Customers who started visiting the Store after Initial 3 Months ( between M4 to M12) will be flagged as "New Customers".
The Customers who buys from the Store frequently( Visits like M1, M2, M3, M4..M11, M12), which means they are continuously buying from the Store each Month will be Flagged as the "Retained Customers".
The Customers who buys from the Store less frequently( Visits like M1, M3, M4, M6..M10, M12 ), which means they are not continuously visiting the Store each Month will be Flagged as the "Lagged Customers".
The Customers who visits the Store continuously(or not) but stopped visiting the before M11 , which means they are not continuing till the Year End will be Flagged as "Lost Customers".

Please note that, the "Base Customers" can be a "Retained/Lagged/Lost" Customer at any point of time based on their Visits to the Store.
At the end of the Fiscal Year, the Business wants to Award the "Loyal Customers" who will be Retained with their Store as "Retained Customers", with Most frequent Visits.

We can fulfill the above Scenario of the Business Owner, by Implementing the Logics using SQL Server, as explained in below example

Example :
Use AnalyticsDB
GO
Suppose we have Table of data with Customers and their Order details for the Fiscal Year 2018. I have just shown a sample of 15 records out of 366 rows.
Select Distinct * From  [dbo].[Customer_Orders]

Defining the Tables required for Customer Retention Analysis
Declare @@Base_Customers As Table (Cust_Id  Int)
Declare @@Cust_Visit_Summary As Table (FY Int, Mth Int, Cust_Visit_Cnt Int, NetSales Money)
Declare @@Cust_Visit_Log As Table ( Cust_Id Int, Mth_Visit Int, NetSales Money)
Declare @@Cust_Visit_Lapse_Log As Table ( Cust_Id Int, Mth_Visit Int, NetSales Money, Leading_Mth_Visit Int,Mth_Visit_Lagging Int)

Updating the @@Base_Customers Table with Customer Ids who Buys from the Store in the Initial 3 Months
Insert Into @@Base_Customers (Cust_Id)
(   Select Distinct Cust_Id From [dbo].[Customer_Orders]
     Where Month([Order_Date]) In (1,2,3)
)

Select * From  @@Base_Customers
There were 35 Customers have been Identified as Base Customers from the Store, who Buys in the initial 3 Months. But I have shown only 15 Customers here.

The following query shows the All(Inclusive of Base) Customers Visit(Buy) Detail By Month.
I have shown only 15 records out of 266 rows:
Insert Into @@Cust_Visit_Log (Cust_Id, Mth_Visit, NetSales)
( Select Distinct Cust_Id,Month(Order_Date) Visit_Month, Sum(NetSales) NetSales
  From [dbo].[Customer_Orders]
  Group By Cust_Id,Month(Order_Date)
)

Select * From @@Cust_Visit_Log

Next the following query shows the All(Inclusive of Base) Customers Visit(Buy) Summary by the Month:
Insert Into @@Cust_Visit_Summary
Select  Year([Order_Date]) FY,
    Month([Order_Date]) Mth, Count(Distinct Cust_Id) Cust_Visit_Cnt,
    Sum([NetSales]) NetSales
From [dbo].[Customer_Orders]
Group By Year([Order_Date]) , Month([Order_Date])

Select * From @@Cust_Visit_Summary
Next the following query shows the All(Inclusive of Base) Customers Visit(Buy) by the Month , and the Leading and Lagging Months details :
Insert Into @@Cust_Visit_Lapse_Log
Select *, 
Lead(Mth_Visit,1) Over (Partition BY Cust_Id ORDER BY Cust_Id,Mth_Visit) Leading_Mth_Visit,
Lag(Mth_Visit,1) Over (Partition BY Cust_Id ORDER BY Cust_Id,Mth_Visit)
Mth_Visit_Lagging
From @@Cust_Visit_Log

Select * From @@Cust_Visit_Lapse_Log

Notes :
Leading_Mth_Visit :
In the first record, the Visit Month of the Cust_Id "1111" is "1", then the "Leading_Mth_Visit" will show the Next Visit Month of that Customer which is "3" that Leading by "1".
In the fourth record, the Leading_Mth_Visit is shown as "NULL" as the Mth_Visit "12" is not Leading any further Month values for the same customer.

Mth_Visit_Lagging :
In the first record, the Visit Month of the Cust_Id "1111" is "1", then the "Mth_Visit_Lagging" will shown as "NULL", since the Visit Month "1" is not Lagging behind any Month values for that Customer.
In the fourth record, the "Mth_Visit_Lagging" is shown as "4", since the Mth_Visit  "12" is the Lagging Month behind the "4" for the same Customer.


The above Section is very Important in our Customer Retention Analysis. The Lead and Lag are the two Analytical Functions provided by the SQL Server.

Next the following query shows the details of Gap/Lapse(in Months) in the Customer visit to the Store and the Categorization of Customer.I have shown here only the 35 records out of 266 rows:
IF OBJECT_ID('[tempdb].[dbo].[#CustRetentionAnalysis]', 'U') IS NOT NULL
   Begin
       DROP TABLE [dbo].[#CustRetentionAnalysis]
   End

Select * Into [dbo].[#CustRetentionAnalysis]
From (
Select *, Case When [Mth_Visit_Lagging] IS NULL Then ([Leading_Mth_Visit]-[Mth_Visit])
Else ([Mth_Visit]-[Mth_Visit_Lagging])
End As [Lapse_in_Visit],
(Case When  [Mth_Visit] In (1,2,3) Then 'Base_Cust'
When  [Mth_Visit] Not In (1,2,3) And (Mth_Visit-[Mth_Visit_Lagging]) IS NULL 
Then 'New_Cust'
When [Mth_Visit] <11 And [Leading_Mth_Visit] IS NULL Then 'Cust_Lost'
When ([Mth_Visit]-[Mth_Visit_Lagging] =1) And
([Leading_Mth_Visit]-[Mth_Visit] =1 OR [Mth_Visit]-[Mth_Visit_Lagging] =1)
Then 'Retained_Cust'
When ([Leading_Mth_Visit]-[Mth_Visit] >1 OR [Mth_Visit]-[Mth_Visit_Lagging] >1)
Then 'Lagged_Cust'
End ) As Cust_Type
From @@Cust_Visit_Lapse_Log
) S1 Order by 1, 2

Select * From [dbo].[#CustRetentionAnalysis]

Next the following query shows the details of Retained Customers, who frequently Visited the Store:
IF OBJECT_ID('[tempdb].[dbo].[#LoyalCustomers]', 'U') IS NOT NULL
   Begin
       DROP TABLE [dbo].[#LoyalCustomers]
   End

Select S1.* Into  [dbo].[#LoyalCustomers]
From
(
Select TOP(1000000000) [Cust_Id], [Cust_Type] , Sum([NetSales]) NetSales,
Count(Cust_Type) [Cust_Type_Freq]
From [dbo].[#CustRetentionAnalysis]
Where [Cust_Type]= 'Retained_Cust'
Group By [Cust_Id],[Cust_Type]
Having Count([Cust_Type]) >1
Order By NetSales Desc ,Cust_Type_Freq Desc
) S1

Select * From [dbo].[#LoyalCustomers]
Note : In the above query, I have used the Top () Clause only to enable the Order By in the Sub Query result.

Finally from the above Retained Customers(Loyal Customers), we select the few Customers to Award them based on their Retainability Score (i.e, Cust_Type_Freq) and the NetSales generated from them:
IF OBJECT_ID('[tempdb].[dbo].[#LoyalCustomers_Awarded]', 'U') IS NOT NULL
   Begin
       DROP TABLE [dbo].[#LoyalCustomers_Awarded]
   End

Select * Into [dbo].[#LoyalCustomers_Awarded]
From (
   Select Cust_Id, Cust_Type, NetSales,'Loyal Customers' As  [Award Type]
   From [dbo].[#LoyalCustomers]
   Where [Cust_Type_Freq] 
   IN ( Select  Distinct Top (3) [Cust_Type_Freq] From [dbo].[#LoyalCustomers]
  Order By Cust_Type_Freq Desc )
  ) S1

Select * From [dbo].[#LoyalCustomers_Awarded]
Note :
You can use this Analytical Method to apply differently in your Business Scenario. 
Please post your valuable Feedback on this Article.

--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------

Monday, January 21, 2019

How the DAX Function LOOKUPVALUE will work in Power BI

How to use the DAX LOOKUPVALUE Function in Power BI
The LOOKUP VALUE Function works like a VLOOKP Function in Excel. The VLOOKUP in Excel will return the First matching value when there were multiple Matches, but the Dax LOOKUPVALUE Function will throw an error "Multiple values was supplied where single value was expected".
It also works like a RELATED Function in DAX, but LOOKUPVALUE does not need any of the relationship with the other table.
It works with related tables as well to retrieve the values.
Syntax:
LOOKUPVALUE( <result_columnName>, <search_columnName>, <search_value>[, <search_columnName>, <search_value>]…)

Parameters ::
result_columnName :The name of an existing column that contains the value that we want to return. The column must be named using standard DAX syntax, fully qualified. It cannot be an expression.
search_columnName :

The name of an existing column in a related or other table, over which lookup is performed. The column must be named using standard DAX syntax, usually fully qualified. It cannot be an expression.
search_value :
A scalar expression that does not refer to any column in the same table being searched. It is the value/column which used to search over
search_columnName.

Scenario:
Suppose we have a the Tables in a Power BI Model as follows.

From the above Model, we observe the following the things :
-The tables "FactSales" and "DimCountry" have the One-to-One relationship.
-The tables "DimCountry" and "DimRegion" have the Many-to-One relationship.
-The table "DimProducts" is independent and it has no relationship with any of the tables.

The data in the table wise as follows..
DimCountry : 
DimRegion :
DimProducts :
FactSales :

1) Retrieving the "UnitPrice" from Non related table "DimProducts" using LOOKUPVALUE Function:
Now we will create a new Column "UnitPrice" in the table "FactSales", by using the below LOOKUPVALUE Formula retrieve the Unit_Price" from the table "DimProducts" .
UnitPrice = LOOKUPVALUE(DimProducts[Unit_Price], DimProducts[Prod_Id], FactSales[ProdId])

Result:
                            
Notes :
Though there was no relationship between the tables "FactSales" and "DimProducts" , the LOOKUPVALUE Function retrieves the "UnitPrice" values from "DimProducts" by lookup "FactSales[ProdId]" over the other table's column "DimProducts[Prod_Id]".

2) Retrieving the "Region_Name" from a Indirectly related table "DimRegion" using the LOOKUPVALUE Function:
We know that the table "FactSales" has Direct relationship with "DimCountry" , which has a Direct relationship "DimRegion"., so that we can say "FactSales" has a Indirect relationship with "DimRegion".

RegionName = LOOKUPVALUE(DimRegion[Region_Name],DimRegion[Region_Code], FactSales[RegionCode])

We can also write the above formula using the combination of RELATED Function as below, by performing the Lookup between the Related tables.
RegionName = LOOKUPVALUE(DimRegion[Region_Name], DimRegion[Region_Code], RELATED(DimCountry[RegionCode]))

Result:

3) Retrieving the "Country_Name" from a Directly related table "DimCountry" using the LOOKUPVALUE Function:

Suppose,when we use "RegionCode" to lookup over "DimRegion[Region_Code]", we will get the below error, since there were multiple values in the lookup table, so that the below lookup Function will fails to return the values.

CountryName = LOOKUPVALUE(DimCountry[Country_Name],DimRegion[Region_Code], FactSales[RegionCode])

ErrorMessge :
"A table of multiple values was supplied, where single value was expected"

To avoid that Error, we can extend the lookup to one more extra column as shown below:
CountryName = LOOKUPVALUE(DimCountry[Country_Name],DimRegion[Region_Code], FactSales[RegionCode],DimCountry[Country_Id],FactSales[CountryId])

Note:
We can lookup directly using "CountryId" column but to show the multiple combination, we have used the two columns lookup.

Result:

-------------------------------------------------------------------------------------------------------- 
Thanks, TAMATAM ; Business Intelligence & Analytics Professional 
--------------------------------------------------------------------------------------------------------

Sunday, January 20, 2019

How the DAX Function RELATEDTABLE will work in Power BI

How to use DAX RELATEDTABLE Function in Power BI
The RELATEDTABLE Function evaluates a table expression in a context modified by given filters. The RELATEDTETABLE function changes the context in which the data is filtered and evaluates the expression in the new context that you specify.
This function is a shortcut for CALCULATETABLE function with no logical expression.
 
The RELATEDTABLE Function also works like a RELATED Function.
Notes:
It works like a RELATED Function, but It returns a table with all the rows related with the current one, and also it works from One-Side as well as from Many-Side in One-to-Many relationship Models.
It works on the Models with a One-to-One Relationship and Many-to-One Relationship as well.
In case of One-to-One Relationships, It can be used in the new Column/Measure defined on any One-Side table to perform the Lookup & retrieve the values from other Related table and vice versa.
But in case of Many-to-One / One-to-Many Relationships, It can be used in a new Column /Measure defined on One-Side table to perform the Lookup & retrieve values from the Many-Side table and vice versa.

Basic Syntax:
RELATEDTABLE(Table)

Now will discuss about the Functional behavior of the RELATEDTABLE Function in One-to-One and One-to-Many / Many-to-One relationships.

1) One-to-One Relationship:
Suppose we have two Tables with One-to-One Relationship in a Power BI Model as  follows.
For better understanding of the Relationship, I just added suffix as "_One" , tells that the tables have the One-to-One relationship.


Since both tables have the One-to-One relationship based on the key "Prod_Id", we can see the Unique rows for the Prod_Id in both the tables.
This means the column "Prod_Id" in one table has only one instance of a particular value, and the other related table has only one instance of a particular value.



Since there was One-to-One Relationship between Tables, we can use RELATEDTABLE Function to Calculate a new Column/Measure defined on any one of the One-Side table to perform the Lookup & retrieve the values from other Related table.

Defining a new Column and a new Measure on the 'SalesByCust_One' Table:
cTotalSales =
    SUMX(RELATEDTABLE(Products_One),
        SalesByCust_One[Units_Sold]*Products_One[Unit_Price]
        )


mTotalSales =
    SUMX(RELATEDTABLE(Products_One),
        SUMX(SalesByCust_One,
                SalesByCust_One[Units_Sold]*Products_One[Unit_Price]
            )
        )


Defining a new Column and a new Measure on the 'Products_One' Table :
c_TotalSales =
    SUMX(RELATEDTABLE(SalesByCust_One),
        Products_One[Unit_Price]*SalesByCust_One[Units_Sold]
        )

m_TotalSales =
    SUMX(RELATEDTABLE(SalesByCust_One),
        SUMX(Products_One,
            Products_One[Unit_Price]*SalesByCust_One[Units_Sold]
            )
        )


We were able to define the similar new Column and a Measure successfully on any one of the tables.


Note:
In real time, we should define them in any one of the Table, otherwise
we may get a Circular reference error when try to define same calculated Columns with same reference on both the tables, in case of One-to-One Relationship.
Now we can see all these new Columns and Measures will returns the same values for rows, but the grand Total value will be higher in case of Measure, which we will ignore that for now in this article.


Note:

To avoid the Grand Total Issue for Measures, in the above context use the below Formula:
mTotalSales =
    SUMX(
RELATEDTABLE(Products_One),
        CALCULATE
(
SUM(SalesByCust_Many[Units_Sold])) *
        CALCULATE(SUM(Products_One[Unit_Price]))
        )

m_TotalSales =
    SUMX
(RELATEDTABLE(SalesByCust_Many),
        CALCULATE
(SUM(Products_One[Unit_Price])) *
        CALCULATE(SUM(SalesByCust_Many[Units_Sold]))
        )


2) One-to-Many Relationship:
Now lets make a Change in a Table "SalesByCust_One", by adding additional records with repeated Product_Ids, and will rename the Table name as "SalesByCust_Many".
Here, we will leave the "Products_One" table as it is without making any changes.
First go to Edit Queries and Select and Rename the Table to "SalesByCust_Many".


Next go to Query Settings > Applied Steps and then click on the Source gear icon.


Once click on the Source gear icon, it will open a window to update the Table.
Now add the more rows to the Table then click OK to apply and save changes to the Table.

When you Apply Changes and Closes the Query Editor window, Immediately we will see the following Error Message, as it is violating the One-to-One relationship already existed.


Now closing this Message Box, then go ahead and change relationship between the Tables from One-to-One to One-to-Many as follows..


Next say OK, then Apply changes.


Now In the One-to-Many / Many-to-One relational Model, we can see that RELATEDTABLE Function is working in the Calculations (new Column/Measures) made on the One-side as well as on the Many-side tables.


Finally we can Conclude that, in case of Many-to-One / One-to-Many Relationships, the RELATEDTABLE Function can be used in a new Column /Measure defined on One-Side table to perform the Lookup & retrieve values from the Many-Side table and vice versa.

A new Column and a new Measure on the 'SalesByCust_Many' Table:

cTotalSales =
    SUMX(RELATEDTABLE(Products_One),
        SalesByCust_Many[Units_Sold]*Products_One[Unit_Price]
            )

mTotalSales =
    SUMX(RELATEDTABLE(Products_One),
        SUMX(SalesByCust_Many,
                SalesByCust_Many[Units_Sold]*Products_One[Unit_Price]
                )
            )


A new Column and a new Measure on the 'Products_One' Table:
c_TotalSales =

    SUMX(RELATEDTABLE(SalesByCust_Many),
        Products_One[Unit_Price]*SalesByCust_Many[Units_Sold]
        )

m_TotalSales = 
    SUMX(RELATEDTABLE(SalesByCust_Many),
        SUMX(Products_One,
            Products_One[Unit_Price]*SalesByCust_Many[Units_Sold]
                )
            )


Result:

Note:
To avoid the Grand Total Issue for Measures, in the above context use the below Formula:
mTotalSales =
SUMX(
RELATEDTABLE(Products_One),
    CALCULATE(SUM(SalesByCust_Many[Units_Sold])) *
    CALCULATE(SUM(Products_One[Unit_Price]))
    )

m_TotalSales =
SUMX(RELATEDTABLE(SalesByCust_Many),
    CALCULATE(SUM(Products_One[Unit_Price])) *
    CALCULATE(SUM(SalesByCust_Many[Units_Sold]))
    )

Now we can see the same result for new Columns and new Measures as shown below :

Note:
The new Column "c_TotalSales" is defined on the One-Side table, which is returning the aggregated values ( eg: 4650=3600+1050 for the Prod_Id 1234 ) from the Many-Side table.
But the new Measures "m_TotalSales" or "mTotalSales" will returns the same values ( eg: 3600,1050 for the Prod_Id 1234 ) though they defined in either One-Side or Many-Side of the relationship tables.

-------------------------------------------------------------------------------------------------------- 
Thanks, TAMATAM ; Business Intelligence & Analytics Professional 
--------------------------------------------------------------------------------------------------------

Featured Post from this Blog

How to compare Current Snapshot Data with Previous Snapshot in Power BI

How to Dynamically compare two Snapshots Data in Power BI Scenario: Suppose we have a sample Sales data, which is stored with Monthly Snapsh...

Popular Posts from this Blog