Saturday, November 18, 2023

How to identify the Duplicate Values in a Column in Power BI

How to identify the Duplicates in a Column in using M-Query in Power BI
Scenario:
Lets suppose, we have a sample table as below, where the ProdID is duplicated in the table.
Though the record by [TransNo] is unique, we need to identify the duplicate Product Names based on the [ProdID].


1) Identifying the Duplicates in a Column [ProdID] using M-Query:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("bY9LCoAwDETv0rWL/tWl/esVpPe/hk1KA0I2CTwew8z7CqWNdWKDP+4VoujbpB6pHTfERHRn3YN1T6SQHlNe1EnOdYrLdRopNEm5EDWsa5FCv1wq0bkN+pXaiHrWndugdW030YN1cZuBLe1+FvXyn9A/", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [TransNo = _t, ProdID = _t, ProdName = _t]),
    ChangeType = Table.TransformColumnTypes(Source,{{"TransNo", Int64.Type},{"ProdID", Int64.Type}, {"ProdName", type text}}),

ProdCount = Table.AddColumn(ChangeType, "CountProd", 
(ExTable) => Table.RowCount(Table.SelectRows(ChangeType, 
(InTable) =>  InTable[ProdID] = ExTable[ProdID]))
),

IndexProdCount = Table.AddColumn(ProdCount, "CountProdIndex", 
 (ExTable) => Table.RowCount(Table.SelectRows(ChangeType, 
 (InTable) =>  InTable[TransNo] <= ExTable[TransNo] and InTable[ProdID] = ExTable[ProdID]))
),

DupProd = Table.AddColumn(IndexProdCount, "DuplicateProd", each if [CountProdIndex]>1 then "Y" else null)
in
    DupProd

Result:

Notes:
CountProd : It returns the no. of times the ProdID repeated in that table. For example, the ProdID = 123 (ABC) is repeated 4 Times, which means 3 duplicates are there.

CountProdIndex : It returns the Cumulative no. of times the ProdID repeated in that table. For example, the ProdID = 123 (ABC) is repeated 4 Times, hence it returns the values as 1, 2,3 and 4.
---------------------------------------------------------------------------------------------------------
2) Identifying the Duplicates in a Column [ProdID] using DAX Query:

CountProd =
VAR vCountProd = COUNTROWS(
    FILTER(tbl_Sample, tbl_Sample[ProdID] = EARLIER(tbl_Sample[ProdID]))
)

RETURN
vCountProd
----------------------------------------------------------------------
CountProdIndex =
VAR vCountProd = COUNTROWS(
    FILTER(tbl_Sample,
    tbl_Sample[TransNo] <= EARLIER(tbl_Sample[TransNo]) &&
    tbl_Sample[ProdID] = EARLIER(tbl_Sample[ProdID]))
)

RETURN
vCountProd
----------------------------------------------------------------------
DuplicateProd =
VAR vCountProd = COUNTROWS(
    FILTER(tbl_Sample,
    tbl_Sample[TransNo] <= EARLIER(tbl_Sample[TransNo]) &&
    tbl_Sample[ProdID] = EARLIER(tbl_Sample[ProdID]))
)
VAR vDuplicateProd = IF (vCountProd>1, "Y", "N")
RETURN
vDuplicateProd Result:

Notes:
The CountProd can be written as per below as well:
CountProd =
VAR vProdID = tbl_Sample[ProdID]
VAR vCountProd = COUNTROWS(
    FILTER(tbl_Sample, tbl_Sample[ProdID] = vProdID)
)
RETURN
vCountProd


Thanks, TAMATAM

No comments:

Post a Comment

Hi User, Thank You for visiting My Blog. Please post your genuine Feedback or comments only related to this Blog Posts. Please do not post any Spam comments or Advertising kind of comments which will be Ignored.