Wednesday, April 27, 2022

How to Limit the Data Load at the Source Step for Excel Dataset before loading into Power BI

Limit the Data Load at the Source Step for Excel Dataset before loading into Power BI using Parameters in Power Query
Scenario:
Suppose we have an excel Dataset, with Millions of Rows, for which we want to apply the Limit while load at the Source step stage, using a Parameter, say Year.
Also, we want to Load only first N rows for the selected Year parameter, to see the sample data.
These parameters we can adjust in Power BI Service, to enable the Full Load.
This will be the most Powerful and useful technique when we deal with the large Excel datasets.

Now, lets see how we can implement and achieve this Scenario.
1) First create the required Parameters for Datasource, Dataset, Load Year, First N Rows:
xl_Datasource = E:\Tech_Lab\Power BI\DataSets\ds_Sample_Data.xlsx
xl_Dataset = tbl_OrderDetails
loadYear = 2014
xl_LoadFirstN_Rows = 5
Notes:
xl_Datasource : It is a required Parameter of the type Text, which holds the path of source file.
xl_Dataset : It is a required Parameter of the type Text, which holds the sheet/table of source dataset.
loadYearIt is a required Parameter of the type Decimal Number, which holds the Year value, which will used to limit the Data Load at the Source step of the Power Query.
xl_LoadFirstN_Rows : It is an optional Parameter of the type Decimal Number, which holds a number, which will used to limit the no. of rows to Load at the Source step after the Year filter of Power Query.

2) Next pass the loadYear Parameter in the next step to Source:
In this step, Dataset headers will be Promoted and then it will be Filtered by the [Order_Year] field based onthe parameter loadYear.
Limit_Load_byYear
Table.SelectRows(Table.PromoteHeaders(Source{[Item=xl_Dataset,Kind="Sheet"]}[Data],[PromoteAllScalars=true]),each [Order_Year] >= loadYear)


3) Next pass the parameter xl_LoadFirstN_Rows to limit the no. of Rows to load from dataset for the selected Year:
We can leave blank or pass zero for this optional parameter to load all rows. 
Filter_FirstN_Rows
= if (xl_LoadFirstN_Rows=0 or xl_LoadFirstN_Rows is null) then (Limit_Load_byYear) else Table.FirstN(Limit_Load_byYear,xl_LoadFirstN_Rows)

4) Finally, do the required transformations.
The complete power query used in this scenario is as follows..
let
Source = Excel.Workbook(File.Contents(xl_DataSource), null, true),
/* To Load TopN Rows */
Limit_Load_byYear = Table.SelectRows(Table.PromoteHeaders(Source{[Item=xl_Dataset,Kind="Sheet"]}[Data],[PromoteAllScalars=true]),each [Order_Year] >= loadYear),
Filter_FirstN_Rows = if (xl_LoadFirstN_Rows=0 or xl_LoadFirstN_Rows is null) then (Limit_Load_byYear) else Table.FirstN(Limit_Load_byYear,xl_LoadFirstN_Rows),
#"Changed Type" = Table.TransformColumnTypes(Filter_FirstN_Rows,{{"Order_Id", Int64.Type}, {"Order_Date", type date}, {"Order_Year", Int64.Type}, {"SalesCountry_Id", Int64.Type}, {"Country_Name", type text}, {"Cust_Id", Int64.Type}, {"Cust_Name", type text}, {"Prod_Id", Int64.Type}, {"Prod_Name", type text}, {"Gross_Sales", Int64.Type}})
in
#"Changed Type"

Result:

Notes:
In case, if we want to limit data load by a DateTime field, we can create and pass parameter  as follows..

= Table.SelectRows(Table.PromoteHeaders(Source{[Item=xl_Dataset,Kind="Sheet"]}[Data],[PromoteAllScalars=true]),each [Order_Date] >= loadDate)

--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------

No comments:

Post a Comment

Hi User, Thank You for visiting My Blog. Please post your genuine Feedback or comments only related to this Blog Posts. Please do not post any Spam comments or Advertising kind of comments which will be Ignored.