Thursday, April 28, 2022

Power Query to remove unwanted Columns dynamically from Excel Dataset before load to Power BI

How to load only required Columns dynamically from Excel Dataset into Power BI using Power Query
Scenario:
Suppose, we have an excel dataset with some required and the unwanted columns like Colum2, Column 4, Prod_Id_2, Country_Id2, Cust_Id - CopyCust_Id - Copy.1, Country_Id3..etc

These kind of unwanted Columns can be removed as per and load only the required Columns as explained below:

1) First Load the Dataset(not Data) and keep only Source Step:
Source = Excel.Workbook(File.Contents(xl_DataSource), null, true)



Note: I have used the Parameters for DataSource and Dataset :
xl_DataSource = E:\Tech_Lab\Power BI\DataSets\ds_Sample_Orders.xlsx
xl_Dataset = tbl_OrderDetails

2) Next get the list of Column Names from the Dataset:
= Table.ColumnNames(Table.PromoteHeaders(Source{[Item=xl_Dataset,Kind="Sheet"]}[Data], [PromoteAllScalars=true]))


3) Next convert the list of Column Names to a Table and then create a Flag column to identify the Columns to Remove from the Dataset:
ConvertToTable = Table.RenameColumns(Table.FromList(SelectColNames, Splitter.SplitByNothing(), null, null, ExtraValues.Ignore),{{"Column1", "ColumnNames"}})

FlagCols2Remove = Table.AddColumn(ConvertToTable, "RemoveCols_Flag", each if ([ColumnNames] is null or [ColumnNames] = "null" or Text.Length(Text.Select([ColumnNames],{"0".."9"}))>0) then 1 else if Text.Contains([ColumnNames],"- Copy") then 1 else 0)


Note :
The "RemoveCols_Flag" retruns 1 for the Columns to Remove and 0 to Keep.

4) Next generate a final list of ColumnNames to Remove where RemoveCols_Flag=1
ColsList2Remove= (Table.SelectColumns(Table.SelectRows(FlagCols2Remove, each ([RemoveCols_Flag] = 1)),"ColumnNames"))[ColumnNames]


5) Finally, pass the above list of Columns to Source Query, to remove them and Keep only the required columns to load
SelectReqCols= Table.RemoveColumns(Table.PromoteHeaders(Source{[Item=xl_Dataset,Kind="Sheet"]}[Data], [PromoteAllScalars=true]),ColsList2Remove)

Result :

The complete Power Query used in this Scenario is as follows:
let
    Source = Excel.Workbook(File.Contents(xl_DataSource), null, true),
    SelectColNames = Table.ColumnNames(Table.PromoteHeaders(Source{[Item=xl_Dataset,Kind="Sheet"]}[Data], [PromoteAllScalars=true])),
    ConvertToTable = Table.RenameColumns(Table.FromList(SelectColNames, Splitter.SplitByNothing(), null, null, ExtraValues.Ignore),{{"Column1", "ColumnNames"}}),
    FlagCols2Remove = Table.AddColumn(ConvertToTable, "RemoveCols_Flag", each if ([ColumnNames] is null or [ColumnNames] = "null" or Text.Length(Text.Select([ColumnNames],{"0".."9"}))>0) then 1 else if Text.Contains([ColumnNames],"- Copy") then 1 else 0),
    ColsList2Remove = (Table.SelectColumns(Table.SelectRows(FlagCols2Remove, each ([RemoveCols_Flag] = 1)),"ColumnNames"))[ColumnNames],
    SelectReqCols = Table.RemoveColumns(Table.PromoteHeaders(Source{[Item=xl_Dataset,Kind="Sheet"]}[Data], [PromoteAllScalars=true]),ColsList2Remove),
    #"Changed Type" = Table.TransformColumnTypes(SelectReqCols,{{"Order_Id", Int64.Type}, {"Order_Date", type datetime}, {"Prod_Id", Int64.Type}, {"Country_Id", Int64.Type}, {"Cust_Id", Int64.Type}, {"Gross_Sales", Int64.Type}})
in
    #"Changed Type"

--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------

No comments:

Post a Comment

Hi User, Thank You for visiting My Blog. Please post your genuine Feedback or comments only related to this Blog Posts. Please do not post any Spam comments or Advertising kind of comments which will be Ignored.