Scenario:
In modern data platforms, it is often necessary to track whether critical business entities have been successfully processed in the latest pipeline execution.
In modern data platforms, it is often necessary to track whether critical business entities have been successfully processed in the latest pipeline execution.
Suppose we want to monitor the Load Status and Readiness Status of a defined list of entities such as:
dim_Customer, dim_Product, fact_Orders, fact_Inventory, dim_Supplier, fact_Payments, and fact_Order_Returns.
We can achieve this scenario by setting up two simple tables as per below.
dim_Customer, dim_Product, fact_Orders, fact_Inventory, dim_Supplier, fact_Payments, and fact_Order_Returns.
We can achieve this scenario by setting up two simple tables as per below.
ref_Entities_to_Track:
It is a reference table which lists all the entities we expect to load in every pipeline run.
It is a reference table which lists all the entities we expect to load in every pipeline run.
fact_Pipeline_Execution_Log:
It is a fact table, contains the details of each run, including the run date, pipeline name, entity key, entity name, pipeline run status, and entity refresh status.
Notes: Assume that these tables do not have a direct relationship in the Data Model.
By comparing the reference table with the pipeline execution log for the latest run date (either today or the most recent available date), we can quickly identify the status of each entity.
This shows whether an entity was successfully updated, failed to load, or was missing.
Using the below measures, we can also derive a simple flag (1/0) for each entity to highlight its presence in the latest run.
Entity Availability Flag =
VAR _LatestRunDate =
CALCULATE(MAX(fact_Pipeline_Execution_Log[Pipeline_Run_Date]))
VAR _EntityKey = MAX(ref_Entities_to_Track[Entity_Key])
RETURN
IF(
COUNTROWS(
FILTER(
fact_Pipeline_Execution_Log,
fact_Pipeline_Execution_Log[Pipeline_Run_Date] = _LatestRunDate &&
fact_Pipeline_Execution_Log[Entity_Key] = _EntityKey
)
) > 0,
1,
0
)
1: Indicates the Entity exists in the latest run.
0: Indicates the Entity is missing in the latest run.
For entities included in the latest run, we calculate the following measures to track their status:
- Entity_Refresh_Status: to show whether the entity had new updates, no updates, or a load failure.
- Pipeline_Run_Status: to show whether the pipeline succeeded, failed, or was missing.
Entity Refresh Status =
VAR _LatestRunDate =
CALCULATE(MAX(fact_Pipeline_Execution_Log[Pipeline_Run_Date]))
VAR _EntityKey = MAX(ref_Entities_to_Track[Entity_Key])
RETURN
IF(
[Entity Availability Flag] = 0,
"Missing in Data Load",
MAXX(
FILTER(
fact_Pipeline_Execution_Log,
fact_Pipeline_Execution_Log[Pipeline_Run_Date] = _LatestRunDate &&
fact_Pipeline_Execution_Log[Entity_Key] = _EntityKey
),
fact_Pipeline_Execution_Log[Entity_Refresh_Status]
)
)
Pipeline Run Status =
VAR _LatestRunDate =
CALCULATE(MAX(fact_Pipeline_Execution_Log[Pipeline_Run_Date]))
VAR _EntityKey = MAX(ref_Entities_to_Track[Entity_Key])
RETURN
IF(
[Entity Availability Flag] = 0,
"Not Avaiable",
MAXX(
FILTER(
fact_Pipeline_Execution_Log,
fact_Pipeline_Execution_Log[Pipeline_Run_Date] = _LatestRunDate &&
fact_Pipeline_Execution_Log[Entity_Key] = _EntityKey
),
fact_Pipeline_Execution_Log[Pipeline_Run_Status]
)
)
Notes:
We can identify the latest pipeline run date and assign a flag value of 1 or 0 using the below measures.
By setting the [Latest RunDate Flag] measure to 1 for the most recent run date, Slicers in the Power BI can automatically default to that execution.This ensures stakeholders always see the latest pipeline health information without manual effort.
Latest RunDate =
CALCULATE(
MAX(fact_Pipeline_Execution_Log[Pipeline_Run_Date]),
ALL(fact_Pipeline_Execution_Log)
)
Latest RunDate Flag =
VAR _MaxRunDate =
CALCULATE (
MAX ( fact_Pipeline_Execution_Log[Pipeline_Run_Date] ),
ALL ( fact_Pipeline_Execution_Log )
)
VAR _Today = TODAY()
VAR _LatestRunDate = MIN ( _MaxRunDate, _Today )
RETURN
IF (
SELECTEDVALUE ( fact_Pipeline_Execution_Log[Pipeline_Run_Date] ) = _LatestRunDate,
1,
0
)
Business Value Creation / Insights:
From the above scenario, technical teams can derive practical insights into pipeline execution and data readiness. Dashboards in Power BI can highlight:
- Entities that were successfully refreshed with new data.
- Entities that had no changes.
- Entities that failed to load.
- Entities that were missing from the latest run.
This helps data engineers and support teams quickly detect issues, validate pipeline health, and take corrective actions. By automating this monitoring, they reduce manual checks, improve reliability, and ensure that downstream business users always receive consistent and trustworthy data.
--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------
No comments:
Post a Comment
Hi User, Thank You for visiting My Blog. If you wish, please share your genuine Feedback or comments only related to this Blog Posts. It is my humble request that, please do not post any Spam comments or Advertising kind of comments, which will be Ignored.