Thursday, February 12, 2026

How to Identify Duplicate Rows in a Table using Power Query

How to Identify Duplicate Values in a Column using Power Query
Scenario:
Suppose we have a dataset (ds_Sample), as shown below, with duplicate values in the Prod_ID column and some duplicate rows:
let
    Source = #table (
        {"Trans_ID", "Prod_ID", "Prod_Name"},
        {
            {12345, 123, "Laptop"},
            {12346, 124, "Radio"},
            {12347, 125, "Keyboard"},
            {12348, 126, "Television"},
            {12349, 127, "Printer"},
            {12350, 128, "Scanner"},
            {12351, 129, "Camera"},
            {12352, 130, "Tripod"},
            {12345, 123, "Laptop"},     // duplicate row
            {12348, 126, "Television"}, // duplicate row
            {12351, 129, "Camera"},     // duplicate row
            {12356, 127, "Printer"},
            {12357, 124, "Radio"},
            {12358, 126, "Television"}
        }
    ),
    ChangeType = Table.TransformColumnTypes(Source, {
        {"Trans_ID", Int64.Type},
        {"Prod_ID", Int64.Type},
        {"Prod_Name", type text}
    })
in
    ChangeType

Now we can identify and flag the duplicate values and rows using the methods below.

Method-1: Using the Text.Combine method with a Composite Key based on specific Columns:

let
    /* Source dataset */
    Source = ds_Sample,

    /* Columns to include in Composite Key */
    SelectedCols = {"Trans_ID","Prod_ID"},

    /* Composite Key from selected columns */
    Def_CompositeKey = Table.AddColumn(Source, "Composite_Key",
        each Text.Combine(
            List.Transform(
                Record.ToList(Record.SelectFields(_, SelectedCols)),
                each Text.From(_)
            ),
            "|"
        )
    ),

    /* Flag duplicate rows using Composite Key */
    Flag_DupRow = Table.AddColumn(Def_CompositeKey, "IsDuplicate_Row", 
        each if List.Count(List.Select(Def_CompositeKey[Composite_Key], (x) => x = [Composite_Key])) > 1 
        then "Yes" else "No"
        ),

    /* Flag duplicate products by Prod_ID */
    Flag_DupProd = Table.AddColumn(Flag_DupRow, "IsDuplicate_Prod", 
        each if List.Count(List.Select(Source[Prod_ID], (x) => x = [Prod_ID])) > 1 
        then "Yes" else "No"
        )
in
    Flag_DupProd

Result:
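For the sample data, each of the three repeated rows (for example {12345, 123, "Laptop"}) produces a Composite_Key such as "12345|123" that occurs twice, so both occurrences are flagged IsDuplicate_Row = "Yes". A row like {12357, 124, "Radio"} has a unique key (IsDuplicate_Row = "No") but still gets IsDuplicate_Prod = "Yes", because Prod_ID 124 also appears on transaction 12346.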
------------------------------------------------
Method-2: Using the Text.ToBinary method with a Binary-encoded Composite Key based on specific Columns:

let
    // Source dataset
    Source = ds_Sample,

    // Columns to include in Composite_Hash_Key
    SelectedCols = {"Trans_ID","Prod_ID"},

    // Generate Composite_Hash_Key from selected columns only
    Def_CompositeKey = Table.AddColumn(Source, "Composite_Hash_Key",
        each Binary.ToText(
            Text.ToBinary(
                Text.Combine(
                    List.Transform(
                        Record.ToList(Record.SelectFields(_, SelectedCols)),
                        each Text.From(_)
                    ),
                    "|"
                )
            ),
            BinaryEncoding.Base64
        )
    ),

    // Flag duplicate rows using Composite_Hash_Key
    Flag_DupRow = Table.AddColumn(Def_CompositeKey, "IsDuplicate_Row", 
        each if List.Count(List.Select(Def_CompositeKey[Composite_Hash_Key], (x) => x = [Composite_Hash_Key])) > 1 
        then "Yes" else "No"
        ),

    // Flag duplicate products by Prod_ID
    Flag_DupProd = Table.AddColumn(Flag_DupRow, "IsDuplicate_Prod", 
        each if List.Count(List.Select(Source[Prod_ID], (x) => x = [Prod_ID])) > 1 
        then "Yes" else "No"
        )
in
    Flag_DupProd

Result:
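Keep in mind that Base64 is a reversible text encoding, not a cryptographic hash: it simply turns the composite key into opaque, uniform text. For example, the key "12345|123" encodes to "MTIzNDV8MTIz", and the duplicate flags behave exactly as in Method-1.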
------------------------------------------------
Method-3: Using the Table.Group method to identify duplicate values based on Prod_ID:

let
    Source = ds_Sample,
    // Count how many times each Prod_ID appears
    GroupByProd = Table.Group(Source, {"Prod_ID"}, {{"Prod_Count", each Table.RowCount(_), Int64.Type}}),
    // Join back to original details table
    MergeToDetails = Table.NestedJoin(Source, {"Prod_ID"}, GroupByProd, {"Prod_ID"}, "ProdCount"),
    ExpandTable = Table.ExpandTableColumn(MergeToDetails, "ProdCount", {"Prod_Count"}),
    // Add a column to flag duplicate values
    FlagDup_Prod = Table.AddColumn(ExpandTable, "IsDuplicate_Prod", each if [Prod_Count] > 1 then "Yes" else "No")
in
    FlagDup_Prod

Result:
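As a variation on this method, the same flag can be produced in a single grouping pass, without the merge, by keeping each group's rows as an "all rows" aggregation. A minimal sketch against the same ds_Sample:

let
    Source = ds_Sample,
    // Group by Prod_ID, keeping each group's rows and the row count
    GroupByProd = Table.Group(Source, {"Prod_ID"}, {
        {"All_Rows", each _, type table},
        {"Prod_Count", each Table.RowCount(_), Int64.Type}
    }),
    // Flag Prod_IDs that occur more than once
    FlagDup_Prod = Table.AddColumn(GroupByProd, "IsDuplicate_Prod", each if [Prod_Count] > 1 then "Yes" else "No"),
    // Expand back to the original detail columns
    ExpandRows = Table.ExpandTableColumn(FlagDup_Prod, "All_Rows", {"Trans_ID", "Prod_Name"})
in
    ExpandRows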
------------------------------------------------
Method-4: Using the Table.Group method to return the number of duplicate rows and products:

let
    Source = ds_Sample,
    // 1. Count duplicate rows
    DistinctRows = Table.Distinct(Source),
    DuplicateRowCount = Table.RowCount(Source) - Table.RowCount(DistinctRows),

    // 2. Count duplicate products (Prod_ID)
    GroupByProd = Table.Group(Source, {"Prod_ID"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
    DuplicateProd = Table.RowCount(Table.SelectRows(GroupByProd, each [Count] > 1)),

    // 3. Return summary of results
    Summary = #table(
        {"Metric", "Value"},
        {
            {"Duplicate Rows", DuplicateRowCount},
            {"Duplicate Products", DuplicateProd}
        }
    )
in
    Summary

Result:
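For the sample dataset above, this query returns:

Metric               Value
Duplicate Rows       3
Duplicate Products   5

(three fully duplicated rows, and five Prod_ID values (123, 124, 126, 127, 129) that occur more than once).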

--------------------------------------------------------------------------------------------------------
Thanks
--------------------------------------------------------------------------------------------------------

Saturday, November 22, 2025

How to Use Dynamic Query Parameters to Switch Source Query Connections in Power BI

How to Switch Source Query Connection Strings in Power BI with Dynamic Query Parameters
Scenario:
Managing multiple data or query environments such as Dev, Test, and Prod often requires maintaining separate connection strings in Power BI. 
Rather than hard‑coding server, database, and schema names, you can centralize these details in an environment table and control them dynamically through a parameter (e.g. p_Query_Environ with values like Dev, Test, and Prod). 
With this setup, switching between environments is as simple as changing the parameter value, while Power Query automatically selects the correct server, database, and schema. This provides a clean, scalable approach to environment management that minimizes manual effort and ensures consistency across deployments.

We can achieve the above scenario as discussed below.

Step 1: Define an Environment Variables Table
To enable dynamic source data environment switching in Power BI, we need to create a dedicated Environment Variables Table, as shown below. This table acts as a central repository for all connection details across your environments (Dev, Test, and Prod).

Info_Query_Environment:

let
    /* Define Data Environment variables table with respective connection details */
    _Env_Table = 
        #table(
            {"Source_Env", "Server_Name", "DB_Name", "Schema_Name"},
            {
                {"Dev",  "Dev_Server01",  "Dev_DB",  "Dev_Schema"},
                {"Test", "Test_Server01", "Test_DB", "Test_Schema"},
                {"Prod", "Prod_Server01", "Prod_DB", "Prod_Schema"}
            }
        ),

    _DataType = Table.TransformColumnTypes(
        _Env_Table,
        {
            {"Source_Env", type text}, {"Server_Name", type text},
            {"DB_Name", type text}, {"Schema_Name", type text}
        }
    )
in
    _DataType


Step 2: Define a Query Parameter for Dynamic Environment Selection
We need to define a list parameter named p_Query_Environ that controls which environment (Dev, Test, or Prod) Power BI should use when running queries. 
This parameter acts as a switch, allowing you to easily toggle between environments without editing the query code.
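For reference, the M code that Power BI Desktop generates behind such a list parameter typically looks like the following (the values here match this example; the exact metadata fields can vary by version):

"Dev" meta [
    IsParameterQuery = true,
    List = {"Dev", "Test", "Prod"},
    DefaultValue = "Dev",
    Type = "Text",
    IsParameterQueryRequired = true
]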


Step 3: Pass Connection Strings Dynamically to Queries
Instead of hard‑coding server, database, and schema names, we will filter the Environment Variables Table using the parameter (p_Query_Environ) and feed those values directly into the query connection. 
This makes our queries flexible and environment‑aware, ensuring that Power BI automatically connects to the right source based on the environment we select.

Example: dim_Country

let
    /* 
    Filter the Info_Query_Environment table based on Parameter Value (Eg. Dev, Test, Prod)
    */
    SelectedEnv = 
        Table.SelectRows(Info_Query_Environment, each [Source_Env] = p_Query_Environ){0},

    /* Assign the values to Environment Variables */
    _ServerName   = SelectedEnv[Server_Name],
    _DatabaseName = SelectedEnv[DB_Name],
    _SchemaName   = SelectedEnv[Schema_Name],

    /* SQL Query with dynamic Environment Variables */
    vSrc_Qry = "Select Distinct [Country_ID], [Country_Name], [Currency_Code]
         From " & _SchemaName & ".dim_Country",

    /* Run the SQL Query */
    Source = Sql.Database(_ServerName, _DatabaseName, [Query = vSrc_Qry])
in
    Source
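Note: since the SQL text is passed through the Query option of Sql.Database, Power BI may ask you to approve the native database query the first time each variant runs, and the usual data source credentials and privacy-level settings still apply per server.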

Final Step (Optional): Save Report as Power BI Template
You can save the report as a Power BI Template (.PBIT), which captures the report structure, queries, and parameters, but leaves out the actual data. This makes it ideal for scenarios where different users need to run the same report against different environments (Dev, Test, Prod).


Provide a description for the Report Template while saving.


When a user opens the Power BI Template, it automatically prompts them to choose a value for the p_Query_Environ parameter. The options (e.g., Dev, Test, Prod) come directly from the parameter definition that we created earlier.
Based on the selection, Power Query filters the environment table and connects to the correct server, database, and schema for that environment. 
Once the environment is chosen, the user simply clicks Load to run the report, and Power BI retrieves data from the selected environment seamlessly.


Note:
You can refine this method to fit your own scenario. Adjust the setup as needed for your environment.

--------------------------------------------------------------------------------------------------------
Thanks
--------------------------------------------------------------------------------------------------------

Saturday, November 15, 2025

How to Display the Maximum Time Value per Metric in a Power BI Column Chart

How to Display the Max Time Value of each Metric on a Column Chart in Power BI
Scenario:
Suppose we have a dataset where the Metric_Value column contains DateTime values for each Metric_Name, as shown below.


From this dataset, we need to extract the time component from Metric_Value and convert it into a numeric format so that it can be displayed properly on a clustered column chart.

The refined Dataset with Time Values is as follows:
let
    SampleData = 
    #table(
        {"Metric_Id", "Metric_Name", "Metric_Value", "Metric_DataType"},
        {
            {1, "JobRunDateTime", #datetime(2023, 5, 10, 14, 30, 00), "DateTime"},
            {2, "JobRunDateTime", #datetime(2024, 1, 15, 21, 45, 30), "DateTime"},
            {3, "JobStartTime", #time(9, 45, 15), "Time"},
            {4, "JobStartTime", #time(14, 30, 10), "Time"}
        }
    ),

// Convert Time to duration:
_Duration = Table.AddColumn(
        SampleData,
        "TimeAsDuration",
        each #duration(
                0,
                Time.Hour([Metric_Value]),
                Time.Minute([Metric_Value]),
                Time.Second([Metric_Value])
            ),
        type duration
    ),

// Convert duration to total seconds (numeric):
_TotalSeconds = Table.AddColumn(
        _Duration,
        "TimeAsSeconds",
        each Duration.TotalSeconds(
            #duration(
                0,
                Time.Hour([Metric_Value]),
                Time.Minute([Metric_Value]),
                Time.Second([Metric_Value])
                )
            ),
        type number
    ),

// Convert Time into Hour.MinuteSeconds numeric format with padded Seconds:
_NumericTime = Table.AddColumn(
        _TotalSeconds,
        "TimeAsNumber",
        each Number.Round(
            Number.FromText(
            Text.From(Time.Hour([Metric_Value])) & "." &
            Text.PadStart(Text.From(Time.Minute([Metric_Value])),2,"0") &
            Text.PadStart(Text.From(Time.Second([Metric_Value])),2,"0")
            ), 
        4),
        type number
    )

in
    _NumericTime

The final formatted dataset includes a TimeAsNumber column, which represents the numeric conversion of time values. We will use this column to generate a clustered column chart that displays the maximum time value for each metric.

The TimeAsDuration (d.HH:mm:ss) and TimeAsSeconds columns were created for reference purposes only and are not used further in this example.
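As a quick check of the conversion: #time(9, 45, 15) becomes 9.4515 (hours before the decimal point, zero-padded minutes and seconds after it), while #time(14, 30, 10) becomes 14.301, since a trailing zero is not preserved in a numeric value; the measure below handles this by scaling the fraction back up to four digits.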


We will use the below measure to format the TimeAsNumber value back into a Time string (HH:mm:ss) for display in the chart's Data Labels.

Format TimeLabel =
VAR TimeValue = MAX(tbl_Metrics_TimeValues[TimeAsNumber])   -- E.g. 9.4515
VAR Hrs      = INT(TimeValue)                               -- Hours
VAR Mnts_Sec = ROUND((TimeValue - Hrs) * 10000, 0)          -- Minutes+Seconds
VAR Mnts     = INT(DIVIDE(Mnts_Sec, 100))                   -- Minutes
VAR Sec      = MOD(Mnts_Sec, 100)                           -- Seconds
RETURN
FORMAT(Hrs, "0") & ":" & FORMAT(Mnts, "00") & ":" & FORMAT(Sec, "00")
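Walking through the example value: TimeValue = 9.4515 gives Hrs = 9, Mnts_Sec = ROUND(0.4515 * 10000, 0) = 4515, Mnts = 45 and Sec = 15, so the measure returns the label "9:45:15".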

Generate the Clustered Column Chart:
We will use Metric_Name on the X-axis and the TimeAsNumber column (aggregated as Max) on the Y-axis to generate the Clustered Column chart with the Max value for each Metric.


Since the purpose is not to display the time as a raw number, hide the Y‑axis values, the Y‑axis title, and the tooltips.
Next, in the Data Labels > Values section of the chart formatting options, assign the measure Format TimeLabel (created earlier) so that the actual metric time values are displayed on the columns instead of the numbers.


We can use the below measure to set the column colors of the chart per Metric.
Metric ColorFormat =
SWITCH(
    SELECTEDVALUE(tbl_Metrics_TimeValues[Metric_Name]),
    "JobRunDateTime",    "#1E90FF",    -- DodgerBlue
    "JobStartTime",      "#4682B4",    -- SteelBlue
    "#808080"                          -- Gray (default color)
)
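To apply this measure, select the chart and open the Columns > Colors section of the formatting pane, choose the fx (conditional formatting) option, set Format style to Field value, and pick the Metric ColorFormat measure (the exact menu labels may differ slightly between Power BI Desktop versions).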

Result:
The final output is a clustered column chart with Metric_Name on the X‑axis and metric Max time values shown as data labels.


--------------------------------------------------------------------------------------------------------
Thanks
--------------------------------------------------------------------------------------------------------
