Sunday, November 2, 2025

How to Analyze Data Pipeline Run Statistics Using DAX and Power Query in Power BI
In modern data systems, pipelines are essential for automating the flow of information across platforms and regions. They often run multiple times a day with mixed results: some runs succeed while others fail. To monitor performance effectively, and especially to catch recent data load failures, it is important to isolate the latest run of each pipeline. Without this focus, historical logs quickly become overwhelming and difficult to interpret.

This article explores how to solve that challenge using Power BI. By combining Power Query and DAX, you can flag the latest run, calculate run durations, and extract key metrics like run status and data load success. This approach helps build a clear, dynamic view of pipeline health for better monitoring and decision-making.

Scenario:
Suppose we have a dataset that captures the execution history of data pipelines across various regions. Each record in the dataset represents a single pipeline run and includes columns such as:
  • Pipeline_ID: Unique identifier for each pipeline
  • Pipeline_Name: Descriptive name of the pipeline
  • Pipeline_Region: Deployment region (e.g., US-East, EU-Central)
  • Pipeline_Run_Status: Outcome of the run (Success or Fail)
  • Data_Load_Status: Whether the data load succeeded (Success or Fail)
  • Pipeline_Run_StartDate: Timestamp when the pipeline run started
  • Pipeline_Run_EndDate: Timestamp when the pipeline run completed

We will now analyze this pipeline run log by identifying the latest run for each pipeline, calculating how long each run took, and evaluating whether its data load succeeded.

Pipeline_Run_Hours: We can create this column in Power Query to calculate the time (in hours) taken by each pipeline run.

let
    _start = [Pipeline_Run_StartDate],
    _end = [Pipeline_Run_EndDate],
    _duration = _end - _start,
    // Break the duration into whole days, hours, and minutes (seconds are ignored)
    _days = Duration.Days(_duration),
    _hours = Duration.Hours(_duration),
    _minutes = Duration.Minutes(_duration),
    // Convert everything to minutes, then to fractional hours
    _totalMinutes = (_days * 1440) + (_hours * 60) + _minutes,
    _totalHours = _totalMinutes / 60
in
    _totalHours
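As a quick sanity check outside Power BI, the same duration arithmetic can be mirrored in Python. This is a minimal sketch; the function name and sample timestamps are assumptions for illustration only.

```python
from datetime import datetime

def pipeline_run_hours(start: datetime, end: datetime) -> float:
    """Mirror the Power Query logic: whole days/hours/minutes, seconds ignored."""
    duration = end - start
    days = duration.days
    hours, remainder = divmod(duration.seconds, 3600)
    minutes = remainder // 60
    total_minutes = days * 1440 + hours * 60 + minutes
    return total_minutes / 60

# Example: a run that took 1 day, 2 hours, 30 minutes
start = datetime(2025, 11, 1, 6, 0)
end = datetime(2025, 11, 2, 8, 30)
print(pipeline_run_hours(start, end))  # 26.5
```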
--------------------------------------------------------------------------------------------------------

The complete Power Query script for the above sample data (the timestamps shift dynamically relative to the current date and time) is as follows:

let
    _today = DateTime.LocalNow(),

    // Create pipeline data with added Pipeline_Region
    _pipelineData = #table(
        {
            "Pipeline_ID",    
            "Pipeline_Name",
            "Pipeline_Region",
            "Pipeline_Run_Status",
            "Data_Load_Status",
            "Pipeline_Run_StartDate",
            "Pipeline_Run_EndDate"
        },
        {
            // Alpha pipeline run stats with latest run as Success
            {123, "Alpha", "US-East", "Success", "Success", _today - #duration(29, 6, 45, 0), _today - #duration(28, 8, 30, 0)},
            {123, "Alpha", "US-East", "Success", "Fail",    _today - #duration(25, 7, 15, 0), _today - #duration(24, 8, 45, 0)},
            {123, "Alpha", "US-East", "Fail",    "Fail",    _today - #duration(20, 8, 0, 0),  _today - #duration(19, 10, 0, 0)},
            {123, "Alpha", "US-East", "Success", "Success", _today - #duration(1, 9, 0, 0),   _today - #duration(0, 8, 15, 0)},

            // Beta pipeline run stats with latest run as Fail
            {234, "Beta",  "US-West", "Success", "Success", _today - #duration(28, 6, 30, 0), _today - #duration(27, 9, 15, 0)},
            {234, "Beta",  "US-West", "Success", "Fail",    _today - #duration(22, 7, 45, 0), _today - #duration(21, 9, 30, 0)},
            {234, "Beta",  "US-West", "Success", "Fail",    _today - #duration(1, 8, 30, 0),  _today - #duration(0, 9, 0, 0)},

            // Gamma pipeline run stats with latest run as Fail
            {345, "Gamma", "EU-Central", "Success", "Success", _today - #duration(27, 9, 15, 0), _today - #duration(26, 10, 0, 0)},
            {345, "Gamma", "EU-Central", "Success", "Fail",    _today - #duration(21, 8, 0, 0),  _today - #duration(20, 10, 30, 0)},
            {345, "Gamma", "EU-Central", "Success", "Fail",    _today - #duration(1, 7, 0, 0),   _today - #duration(0, 9, 45, 0)},

            // Delta pipeline run stats with latest run as Success
            {456, "Delta", "Asia-Pacific", "Success", "Success", _today - #duration(30, 6, 30, 0), _today - #duration(29, 7, 45, 0)},
            {456, "Delta", "Asia-Pacific", "Success", "Fail",    _today - #duration(18, 7, 30, 0), _today - #duration(17, 8, 0, 0)},
            {456, "Delta", "Asia-Pacific", "Success", "Success", _today - #duration(2, 8, 15, 0),  _today - #duration(1, 7, 30, 0)},

            // Omega pipeline run stats with latest run as Success
            {567, "Omega", "India", "Success", "Success", _today - #duration(26, 9, 30, 0), _today - #duration(25, 10, 15, 0)},
            {567, "Omega", "India", "Fail",    "Fail",    _today - #duration(15, 8, 30, 0), _today - #duration(14, 10, 30, 0)},
            {567, "Omega", "India", "Success", "Success", _today - #duration(5, 7, 45, 0),  _today - #duration(4, 10, 0, 0)}
        }
    ),

    // Add Is_Pipeline_Latest_Run
    _addLatestFlag = Table.AddColumn(
        _pipelineData,
        "Is_Pipeline_Latest_Run",
        each 
            let
                _currentPipeline = [Pipeline_ID],
                _currentEndDate = [Pipeline_Run_EndDate],
                _maxEndDate = List.Max(
                    Table.SelectRows(_pipelineData, each [Pipeline_ID] = _currentPipeline)[Pipeline_Run_EndDate]
                )
            in
                _currentEndDate = _maxEndDate,
        type logical
    ),

    // Change types
    _changeType = Table.TransformColumnTypes(
        _addLatestFlag,
        {
            {"Pipeline_ID", Int64.Type},
            {"Pipeline_Name", type text},
            {"Pipeline_Region", type text},
            {"Pipeline_Run_Status", type text},
            {"Data_Load_Status", type text},
            {"Pipeline_Run_StartDate", type datetime},
            {"Pipeline_Run_EndDate", type datetime},
            {"Is_Pipeline_Latest_Run", type logical}
        }
    ),

    // Add Pipeline_Run_Hours
    _addRunHours = Table.AddColumn(
        _changeType,
        "Pipeline_Run_Hours",
        each 
            let
                _start = [Pipeline_Run_StartDate],
                _end = [Pipeline_Run_EndDate],
                _duration = _end - _start,
                _days = Duration.Days(_duration),
                _hours = Duration.Hours(_duration),
                _minutes = Duration.Minutes(_duration),
                _totalMinutes = (_days * 1440) + (_hours * 60) + _minutes,
                _totalHours = _totalMinutes / 60
            in
                _totalHours,
        type number
    ),

    // Add Pipeline Run Day of Week
    _addDayOfWeek = Table.AddColumn(
        _addRunHours,
        "Pipeline_Run_DayOfWeek",
        each Date.DayOfWeekName([Pipeline_Run_StartDate]),
        type text
    ),

    // Add Pipeline Duration Category
    _addDurationCategory = Table.AddColumn(
        _addDayOfWeek,
        "Pipeline_Run_Duration_Category",
        each 
            let h = [Pipeline_Run_Hours]
            in 
                if h < 2 then "Short" 
                else if h < 6 then "Medium" 
                else "Long",
        type text
    )
in
    _addDurationCategory
--------------------------------------------------------------------------------------------------------
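The Is_Pipeline_Latest_Run flag above is essentially a group-wise maximum. As an illustrative cross-check outside Power BI, the same two-pass logic can be sketched in plain Python (the sample rows below are made up and are not the real log):

```python
from datetime import datetime

runs = [
    {"Pipeline_ID": 123, "Pipeline_Run_EndDate": datetime(2025, 10, 1, 8, 30)},
    {"Pipeline_ID": 123, "Pipeline_Run_EndDate": datetime(2025, 11, 1, 8, 15)},
    {"Pipeline_ID": 234, "Pipeline_Run_EndDate": datetime(2025, 10, 10, 9, 30)},
    {"Pipeline_ID": 234, "Pipeline_Run_EndDate": datetime(2025, 11, 1, 9, 0)},
]

# First pass: latest end date per pipeline (the List.Max over the filtered rows)
latest_end = {}
for r in runs:
    pid = r["Pipeline_ID"]
    if pid not in latest_end or r["Pipeline_Run_EndDate"] > latest_end[pid]:
        latest_end[pid] = r["Pipeline_Run_EndDate"]

# Second pass: flag each row whose end date equals its pipeline's maximum
for r in runs:
    r["Is_Pipeline_Latest_Run"] = r["Pipeline_Run_EndDate"] == latest_end[r["Pipeline_ID"]]

print([r["Is_Pipeline_Latest_Run"] for r in runs])  # [False, True, False, True]
```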

We define the following calculated column in Power BI, which returns TRUE for the latest run date of each pipeline (whether the run was a Success or a Fail) and FALSE otherwise.

Is_Latest_Pipeline_Run =
VAR CurrentPipeline = tbl_Pipeline_Run_Stats[Pipeline_Id]
VAR CurrentEndDate = tbl_Pipeline_Run_Stats[Pipeline_Run_EndDate]
VAR MaxEndDate =
    CALCULATE(
        MAX(tbl_Pipeline_Run_Stats[Pipeline_Run_EndDate]),
        FILTER(tbl_Pipeline_Run_Stats, tbl_Pipeline_Run_Stats[Pipeline_Id] = CurrentPipeline)
    )
RETURN
    CurrentEndDate = MaxEndDate
--------------------------------------------------------------------------------------------------------
Next, we can define the following set of Measures needed for the Analysis:

Pipeline Run Hours = SUM(tbl_Pipeline_Run_Stats[Pipeline_Run_Hours])

Pipeline Latest Run Hours =
VAR _LatestRunEndDate =
    CALCULATE(
        MAX(tbl_Pipeline_Run_Stats[Pipeline_Run_EndDate]),
        ALLEXCEPT(tbl_Pipeline_Run_Stats, tbl_Pipeline_Run_Stats[Pipeline_Id])
    )
RETURN
    CALCULATE(
        SUM(tbl_Pipeline_Run_Stats[Pipeline_Run_Hours]),
        tbl_Pipeline_Run_Stats[Pipeline_Run_EndDate] = _LatestRunEndDate
    )

Pipeline Latest Run Date =
VAR _CurRunEndDate = MAX(tbl_Pipeline_Run_Stats[Pipeline_Run_EndDate])
VAR _LatestRunEndDate =
    CALCULATE(
        MAX(tbl_Pipeline_Run_Stats[Pipeline_Run_EndDate]),
        ALLEXCEPT(tbl_Pipeline_Run_Stats, tbl_Pipeline_Run_Stats[Pipeline_Id])
    )
RETURN
    IF(_CurRunEndDate = _LatestRunEndDate, _LatestRunEndDate)

Latest Pipeline Run Status =
VAR _CurrentPipeline = MAX(tbl_Pipeline_Run_Stats[Pipeline_Id])
VAR _LatestRunEndDate =
    CALCULATE(
        MAX(tbl_Pipeline_Run_Stats[Pipeline_Run_EndDate]),
        ALLEXCEPT(tbl_Pipeline_Run_Stats, tbl_Pipeline_Run_Stats[Pipeline_Id])
    )
RETURN
    CALCULATE(
        MAX(tbl_Pipeline_Run_Stats[Pipeline_Run_Status]),
        tbl_Pipeline_Run_Stats[Pipeline_Id] = _CurrentPipeline &&
        tbl_Pipeline_Run_Stats[Pipeline_Run_EndDate] = _LatestRunEndDate
    )

Latest Data Load Status =
VAR _CurrentPipeline = MAX(tbl_Pipeline_Run_Stats[Pipeline_Id])
VAR _LatestRunEndDate =
    CALCULATE(
        MAX(tbl_Pipeline_Run_Stats[Pipeline_Run_EndDate]),
        ALLEXCEPT(tbl_Pipeline_Run_Stats, tbl_Pipeline_Run_Stats[Pipeline_Id])
    )
RETURN
    CALCULATE(
        MAX(tbl_Pipeline_Run_Stats[Data_Load_Status]),
        tbl_Pipeline_Run_Stats[Pipeline_Id] = _CurrentPipeline &&
        tbl_Pipeline_Run_Stats[Pipeline_Run_EndDate] = _LatestRunEndDate
    )

The following measure uses the column Pipeline_Run_EndDate to identify the latest run date of each pipeline. It returns 1 for the latest run date and 0 for all others.
Its logic and objective are similar to those of the calculated column Is_Latest_Pipeline_Run created earlier.
However, unlike Is_Latest_Pipeline_Run, this measure will not carry forward through drill-through filters.

Pipeline Latest Run Flag =
VAR _CurRunEndDate = MAX(tbl_Pipeline_Run_Stats[Pipeline_Run_EndDate])
VAR _LatestRunEndDate =
    CALCULATE(
        MAX(tbl_Pipeline_Run_Stats[Pipeline_Run_EndDate]),
        ALLEXCEPT(tbl_Pipeline_Run_Stats, tbl_Pipeline_Run_Stats[Pipeline_Id])
    )
RETURN
    IF(_CurRunEndDate = _LatestRunEndDate, 1, 0)
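The latest-status measures above all share one idea: per pipeline, locate the row with the maximum Pipeline_Run_EndDate and read its columns. A minimal Python sketch of that idea, using made-up sample rows:

```python
from datetime import datetime

runs = [
    {"Pipeline_ID": 123, "Pipeline_Run_Status": "Fail",    "Data_Load_Status": "Fail",
     "Pipeline_Run_EndDate": datetime(2025, 10, 1)},
    {"Pipeline_ID": 123, "Pipeline_Run_Status": "Success", "Data_Load_Status": "Success",
     "Pipeline_Run_EndDate": datetime(2025, 11, 1)},
    {"Pipeline_ID": 234, "Pipeline_Run_Status": "Success", "Data_Load_Status": "Fail",
     "Pipeline_Run_EndDate": datetime(2025, 11, 1)},
]

# For each pipeline, keep only the run with the maximum end date,
# then read its status columns (what the DAX measures return per Pipeline_ID)
latest_runs = {}
for r in runs:
    pid = r["Pipeline_ID"]
    if pid not in latest_runs or r["Pipeline_Run_EndDate"] > latest_runs[pid]["Pipeline_Run_EndDate"]:
        latest_runs[pid] = r

summary = {pid: (r["Pipeline_Run_Status"], r["Data_Load_Status"])
           for pid, r in latest_runs.items()}
print(summary)  # {123: ('Success', 'Success'), 234: ('Success', 'Fail')}
```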

Results:
Now we can generate the Pipeline Latest Run Summary table using the above measures, as shown below.
You can apply conditional formatting to the Latest Data Load Status column if needed.


The detailed table view of the pipeline run and data load status is shown below:


Please Note:
To establish a relationship between the Pipeline Run Statistics table and the Calendar table based on date, first convert either Pipeline_Run_StartDate or Pipeline_Run_EndDate to the Date data type.
Power BI does not support relationships between columns of different data types; specifically, a Date column cannot be related to a DateTime column.
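To see why the data types must match, here is a small Python analogy (illustrative names only): a full timestamp never equals a date-level calendar key until the time portion is dropped.

```python
from datetime import datetime, date

# A DateTime run timestamp cannot join a Date-typed calendar key directly;
# truncating it to a date first mirrors the conversion required in Power BI
run_end = datetime(2025, 11, 1, 8, 15)
calendar_dates = {date(2025, 11, 1), date(2025, 11, 2)}

print(run_end in calendar_dates)      # False: DateTime never equals a Date
run_end_key = run_end.date()          # drop the time portion
print(run_end_key in calendar_dates)  # True: date-level keys now match
```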
--------------------------------------------------------------------------------------------------------
Drill through from Summary Page to Details Page:
Let's say you have the summary table on the Power BI page "Pipeline_Run_Summary" and the detailed table view on another page, "Pipeline_Run_Details".

Now, if you want to drill through from the Summary page to the Details page, make sure at least one common column (e.g., Pipeline_Id, Pipeline_Region) is used in both visuals.

Next, add the required common fields to the drill-through target page (Pipeline_Run_Details), and add additional columns if needed as shown below:


Next, go back to the Summary page and try drilling through from a record in the visual:


The result of the drill-through on the Details page is shown below. All the filters from the Summary page are applied to the Details page, except the Latest Run Date filter, which is not passed:



Why drill-through does not filter the latest dates in the Detail view:
This issue arises because the measures (e.g., Pipeline Latest Run Date) used in the Summary view calculate their values using internal DAX logic that determines the latest run date for each pipeline.
  • Implicit filters inside measures (like MAX, CALCULATE, or ALLEXCEPT) are local to the measure and don’t become part of the drillthrough filter context.
  • Drillthrough only carries filters from explicit fields used in the visual or slicer like actual columns or selected values.
  • Measures don’t expose their internal logic to the drillthrough engine.
So, when you drill through from a visual that displays the latest run date, Power BI doesn’t automatically pass that date as a filter. It’s simply the output of a measure, not a selected or filtered field. To enable drill through filtering, you need to use actual columns or calculated columns that are part of the model and can participate in the filter context.

Enable Drill Through to pass Latest Date Filter from Summary to Detail view:
To enable filtering by the latest Pipeline Run Date during drill-through, we can apply a Visual Level or Page Level filter using the Is_Latest_Pipeline_Run column in the Summary View. This ensures that only the most recent run for each pipeline is passed to the details page during drill-through.

Output:
Now, the drill-through results on the Details page will correctly reflect the latest pipeline run date filter, as applied using the Is_Latest_Pipeline_Run column.



--------------------------------------------------------------------------------------------------------
Thanks
--------------------------------------------------------------------------------------------------------
