Friday, April 7, 2023

How to setup Incremental refresh for a Dataset with Integer Date Key in Power BI

How to setup Incremental refresh for a Dataset with Non Date Key Column in Power BI
Scenario:
Suppose, we have a table/dataset called "tbl_SalesDetails", with Integer Date Key column as "OrderDateKey" as shown below:
In general, we need a Date column in a Dataset to setup the Incremental Refresh in Power BI. However, in this Scenario, we have a Date Key column (OrderDateKey) which is in the Number format.

We can achieve this Scenario, by using a custom Power Query Function to convert Parameter values from DateTime format to Number format as per Source column, as discussed below:

1) Create the RangeStart and RangeEnd Parameters :
The RangeStart and RangeEnd are the required Parameters in DateTime format to be created to setup Incremental refresh in Power BI.

The DateTime Paramaters will accept the values in the below Formats:
Eg: 2019-01-15 00:00:00 AM/PM   (or) 2019/01/15 00:00:00 AM/PM


2) Pass the RangeStart and RangeEnd Parameters as Filters to Dataset :
We need to pass these Parameters as Filters on a Date Key column using Between operator as shown below.
First we pass any available Random values from data as Filters, which will be later replaced with the RangeStart and RangeEnd Parameters.


= Table.SelectRows(vw_Name, each [OrderDateKey] >= 20140105 and [OrderDateKey] < 20161213)

Next we need to replace those Values with the RangeStart and RangeEnd Parameters :

= Table.SelectRows(vw_Name, each [OrderDateKey] >= RangeStart and [OrderDateKey] < RangeEnd)

Since the RangeStart and RangeEnd parameters are in the DateTime format, and Date Key column in Number Format, we will get the below Type error:


We will fix this Type error with help of a custom Function as discussed in the next step. 

3) Create and Use the Custom function for Type Conversion:
We will create a custom Power Query function with below syntax, which will be further used in the Query to convert the OrderDateKey values from DateTime format to Number.

fn_ConvDateTimeToNum :
= (Val as datetime ) => Date.Year(Val)*10000 + Date.Month(Val)*100 + Date.Day(Val)


When we invoke the function by passing a value as "01/15/2019 12:00:00", it returns Number format of the DateTime value as  20190115, as shown below:


4) Apply the Function over RangeStart and RangeEnd Parameters for Type Conversion:
Finally we can apply the Function over RangeStart and RangeEnd Parameters to convert them from DateTime format to Number format to filter the Data.

= Table.SelectRows(vw_Name, each [OrderDateKey] >= fn_ConvDateTimeToNum (RangeStart) and [OrderDateKey] < fn_ConvDateTimeToNum(RangeEnd))

After applying the Type Conversion, the data will be filtered successfully.


Important Note: 
Verify queries have an equal to (=) on either RangeStart or RangeEnd, but not both. If the equal to (=) exists on both parameters, a row could satisfy the conditions for two partitions, which could lead to duplicate data in the model. 
For example, = Table.SelectRows(#"Changed Type", each [OrderDate] >= RangeStart and [OrderDate] <= RangeEnd) could result in duplicate data if there's an OrderDate that equals both RangeStart and RangeEnd.

The complete Power Query of the Dataset is as follows:
let
    Source = Sql.Database(ServerName,DatabaseName),
    vw_Name = Source{[Schema="dbo",Item="tbl_SalesDetails"]}[Data],
    FilterRows = Table.SelectRows(vw_Name, 
each [OrderDateKey] >= fn_ConvDateTimeToNum (RangeStart) and [OrderDateKey] < fn_ConvDateTimeToNum (RangeEnd))
in
    FilterRows

Make sure the Query Folding is happening the for the Dataset, by viewing the Native Query:
select [_].[Order_Id],
    [_].[Region_Id],
    [_].[OrderDateKey],
    [_].[Supplier_Id],
    [_].[Cust_Id],
    [_].[Prod_Id],
    [_].[Units_Sold],
    [_].[Unit_Price],
    [_].[Gross_Sales],
    [_].[COGS],
    [_].[Disc],
    [_].[Net_Sales]
from [dbo].[tbl_SalesDetails] as [_]
where [_].[OrderDateKey] >= 20140115 and [_].[OrderDateKey] < 20161215

Next close and apply the changes to Power Query.

5) Setup the Incremental Refresh for the Dataset/Table in the Power BI Desktop
Right Click and Open the Incremental Refresh settings of the Dataset/Table as shown below.


Next choose the Time Period ( in Years or Quarters or Months or Days ) for Archive and the Time Period 
( in Years or Quarters or Months or Days ) for Incremental Refresh.

The Archive data range will be Refreshed and Imported, only once into the Power BI model, and the Incremental Data Range will be refreshed frequently as per the Schedule.

As per the Latest update, with Power BI Premium, we can enable an option to refresh latest data with Direct Query.


It means, it will try to fetch the latest data in Real Time(with Direct Query) instead of Importing the Data like Archive case.
Please refer the latest Microsoft documentation, to learn more on this feature.

Please Note:
Still the Storage Mode for overall Dataset is Import Mode only, as the Direct Query mode will not Support the Incremental refresh as of now:


You can have a Model with Mixed mode, like one Table with Import Mode having Incremental refresh setup, and other table with Direct Query Mode.
However, you need a Power BI Premium Workspace, to Publish such Models to the Service.

5) Publish the Report to Power BI Service (Workspace):
Once we setup the Incremental Refresh in Power BI, we need to Publish the Report to the Power BI Service.
Please note that, we can't download the Power BI Model file with Incremental refresh enabled.
Hence it is advisable to keep a Backup of your Power BI Report.
 
  
The RangeStart and RangeStart are dynamic Parameters, and values for those parameters are dynamically passed by Power BI Engine automatically.
Also note that, we can't see those Parameters in the Power BI Service like other Parameters:

-------------------------------------------------------------------
Additional Notes:
To use the Parameters in dataset filters in Power Query, we need to enable the below option from Power BI Desktop > Options > Global > Power Query Editor.


To see the Partitions created behind the Dataset (with Incremental Refresh):
If we are using Power BI Premium workspace, we can connect to Dataset via XMLA Endpoint, using SQL Server Management Studio.
We can get the XMLA End point connection string from Workspace Settings as shown below:

We can also get the XMLA EndPoint Connection string from the Dataset Settings > Server Settings:


Sample XMLA EndPoint Connection String :
powerbi://api.powerbi.com/v1.0/myorg/[WorkspaceName];Initial Catalog=[PowerBI_Dataset]

In SQL Server Management Studio, We can pass the XML EndPoint Connection string as a Server Name, and then choose Authentication type as "Azure Active Directory - Universal with MFA", and then enter the User name (eg: username@domain.com) to Connect.


Once we connected to the Data Model, we will be able to see all the tables. From the Table settings, you will be able to see the Partitions.

--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------

No comments:

Post a Comment

Hi User, Thank You for visiting My Blog. Please post your genuine Feedback or comments only related to this Blog Posts. Please do not post any Spam comments or Advertising kind of comments which will be Ignored.