Saturday, April 23, 2022

How to combine multiple excel datasets with different Column names using Power Query

Power Query to combine multiple excel datasets with different Column names in Power BI
Scenario:
Suppose we have a set of excel input datasets, in a Folder, from which we need to load the excel files with name starts with "Orders_"

Each of these Orders files will have common/different Dimensions and Facts as follows :
Orders_GrossSales.xlsx

Orders_COGS.xlsx

Orders_UnitSold.xlsx

If we observe the above 3 input datasets, they have the same Dimensions like SalesOrder_Id , Order_Date, however the Facts (Gross_Sales, COGS, Units_Sold) are different.

In General, when we combine these files, in Power BI, it loads 3 files with same headers taken from the First file it reads, like below, which is not what we expected :

The expected Result :

The above expected result can be achieved using the following Power Query :

let
    Source = Folder.Files("E:\Tech_Lab\Power BI\DataSets\"),
    LoadFlag = Table.AddColumn(Source, "LoadFlag", each if Text.Start([Name],7)="Orders_" then "Y" else "N"),
    #"Filtered Rows" = Table.SelectRows(LoadFlag, each ([LoadFlag] = "Y")),
    RemoveOtherCols = Table.SelectColumns(#"Filtered Rows",{"Content", "Name", "Extension"}),
    KeepContent = Table.SelectColumns(RemoveOtherCols,{"Content"}),
    PreExpand = Table.SelectColumns(Table.AddColumn(KeepContent, "SourceTables", each func_TransformFile([Content]))
,"SourceTables"),
    SourceTables = PreExpand[SourceTables],
    SourceColumns = List.Union(List.Transform(PreExpand[SourceTables],each Table.ColumnNames(_))),
    ReadyToExpand = PreExpand,
    #"Expanded SourceTables" = Table.ExpandTableColumn(ReadyToExpand, "SourceTables", SourceColumns),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded SourceTables",{{"SalesOrder_Id", Int64.Type}, {"Order_Date", type datetime}, {"COGS", Int64.Type}, {"Gross_Sales", Int64.Type}, {"Units_Sold", Int64.Type}})
in
    #"Changed Type"

Helper Queries :


here, func_transformFile :

= (pram_File as binary) => let
        Source = Excel.Workbook(pram_File, null, true),
        Data_Sheet = Source{[Item="Data",Kind="Sheet"]}[Data],
        #"Promoted Headers" = Table.PromoteHeaders(Data_Sheet, [PromoteAllScalars=true])
    in
        #"Promoted Headers"

Notes:
The 'LoadFlag' is a custom column that helps to filter the Files whose name starts with "Order_".
= Table.AddColumn(Source, "LoadFlag", each if Text.Start([Name],7)="Orders_" then "Y" else "N")

The 'PreExpand'  step will Invoke the Function to extract Source Tables (Transform File) from Content.
PreExpand = Table.SelectColumns(Table.AddColumn(KeepContent, "SourceTables", each func_TransformFile([Content])),"SourceTables")


Please not that, from Source Table, we should Drill Down (by right click on it )instead of Expand All.
This is the most critical step in this process.


SourceTables= PreExpand[SourceTables] : Now the Source Tables will be converted to a List.


Next in the following step, we can get all the Column Names from each table of  above list. This is the most important step which will be re-used in expand Source Tables step again.
SourceColumns= List.Union(List.Transform(PreExpand[SourceTables],each Table.ColumnNames(_)))


ReadyToExpand = 
PreExpand
Expand SourceTables = Table.ExpandTableColumn(ReadyToExpand, "SourceTables", SourceColumns)

Final Output:
                            

--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------

No comments:

Post a Comment

Hi User, Thank You for visiting My Blog. Please post your genuine Feedback or comments only related to this Blog Posts. Please do not post any Spam comments or Advertising kind of comments which will be Ignored.