Wednesday, March 5, 2025

How to generate Random Sample Dimension and Fact Tables using Power Query

How to generate Random Sample Dimension and Fact Tables using M-Query
Scenario:
Suppose if we want to generate the Dimension and Tables with Random Sample data using Power Query.

1) dim_Date - a sample Calendar dimension with list of Dates for 2021.
2) dim_Customer - a customer dimension with sample list of Customers.
3) dim_Products - a products dimension with sample list of Products
4) fact_Sales_Orders - a fact table with random sample of Order details data.

We can create those sample tables using the following Power Query logics.

1) dim_Date: The sample Calendar dimension with list of Dates for 2021.

let
    /* Generate a list of dates for the entire year 2021 */
    StartDate = #date(2021, 1, 1),
    EndDate = #date(2021, 12, 31),
    DateList = List.Dates(StartDate, Duration.Days(EndDate - StartDate) + 1, #duration(1, 0, 0, 0)),

    /* Convert the list of dates into a table */
    DateTable = Table.FromList(DateList, Splitter.SplitByNothing(), {"Date"}),

    /* Add additional columns for required attributes */
    ExtendedTable = Table.AddColumn(DateTable, "Year", each Date.Year([Date]), Int64.Type),
    AddMonth = Table.AddColumn(ExtendedTable, "Month", each Text.Proper(Text.Start(Date.MonthName([Date]), 3)), type text),
    AddDay = Table.AddColumn(AddMonth, "Day", each Date.Day([Date]), Int64.Type),
    AddMonthYear = Table.AddColumn(AddDay, "Month_Year", each [Month] & "-" & Text.From(Date.Year([Date])), type text),
    AddQuarter = Table.AddColumn(AddMonthYear, "Quarter", each "Q" & Text.From(Date.QuarterOfYear([Date])), type text),
    AddQtrYear = Table.AddColumn(AddQuarter, "Qtr_Year", each [Quarter] & "-" & Text.From(Date.Year([Date])), type text),

    /* Change data types as necessary */
    FinalTable = Table.TransformColumnTypes(AddQtrYear, { {"Date", type date}, {"Year", Int64.Type}, {"Month", type text}, {"Day", Int64.Type}, {"Month_Year", type text}, {"Quarter", type text}, {"Qtr_Year", type text} })
in
    FinalTable

Result:


2) dim_Customer: The customer dimension with sample list of Customers.

let
    CustomerNames = {"Sun Hot", "Moon Cool", "Water Flow", "Sky Star", "Earth Strong", "Fire Blaze", "Ocean Deep", "Cloud Soft", "Wind Swift", "Mountain High"},
    Regions = {"North", "South", "East", "West"},
    Source = Table.FromRecords(
        List.Transform({1..10}, each [
            Customer_ID = Text.Combine({"C", Number.ToText(_, "00")}),
            /* Picks a name from the predefined list */
            Customer_Name = CustomerNames{_ - 1}, 
            /* Cycles through regions */
            Customer_Region = Regions{Number.Mod(_, List.Count(Regions))} 
        ])
    ),
    DataTypeChange = Table.TransformColumnTypes(Source, { {"Customer_ID", type text}, {"Customer_Name", type text}, {"Customer_Region", type text} })
in
DataTypeChange

Result:

3) dim_Products: 
The
products dimension with sample list of Products

let
    Products = {
        [Product_ID="P01", Prod_Name="LED TV", Prod_Category="Electronics", Prod_Price=500, Prod_Cost=300],
        [Product_ID="P02", Prod_Name="Smartphone", Prod_Category="Electronics", Prod_Price=800, Prod_Cost=500],
        [Product_ID="P03", Prod_Name="Laptop", Prod_Category="Electronics", Prod_Price=1200, Prod_Cost=900],
        [Product_ID="P04", Prod_Name="Refrigerator", Prod_Category="Home Appliances", Prod_Price=1500, Prod_Cost=1000],
        [Product_ID="P05", Prod_Name="Washing Machine", Prod_Category="Home Appliances", Prod_Price=1800, Prod_Cost=1400],
        [Product_ID="P06", Prod_Name="Microwave Oven", Prod_Category="Home Appliances", Prod_Price=600, Prod_Cost=400],
        [Product_ID="P07", Prod_Name="Sofa", Prod_Category="Furniture", Prod_Price=2000, Prod_Cost=1500],
        [Product_ID="P08", Prod_Name="Wooden Table", Prod_Category="Furniture", Prod_Price=900, Prod_Cost=600],
        [Product_ID="P09", Prod_Name="Wooden Chair", Prod_Category="Furniture", Prod_Price=2500, Prod_Cost=1900]
    },
    Source = Table.FromList(Products, Record.FieldValues, {"Product_ID", "Product_Name", "Category", "Price", "Cost"})
in
    Source

Result:

4) fact_Sales_Orders: The fact table with random sample of Order details data.

let
    SalesRecords = List.Transform({1..30}, each 
    let
        /* Generate a Range of Random Dates */
        DateInfo
            let
                StartDate = #date(2021, 1, 1),
                EndDate = #date(2021, 3, 31),
                RandomDaysOffset = Number.RoundDown(
                    Number.RandomBetween(0, Duration.Days(EndDate - StartDate))),
                RandomDate = Date.AddDays(StartDate, RandomDaysOffset),
                Date_ID = Number.FromText(Date.ToText(RandomDate, "yyyyMMdd")),
                Month = Date.Month(RandomDate)
            in
                [Order_Date = RandomDate, Order_Month = Month],
        
        /* Random Product selection */
        RandomProduct = Number.RandomBetween(1, 9)
    in
        [
        Order_ID = _,
        Product_ID = 
        /* 
        Exclude specific products for certain months:
        - For January (Order_Month = 1), exclude products P01, P02, and P03.
        - For February (Order_Month = 2), exclude products P04, P05, and P06.
        - For March (Order_Month = 3), exclude products P07, P08, and P09.
        If the RandomProduct falls within the exclusion list for the respective month,
        then the Product_ID is set to null (excluded from the data). 
        Otherwise, it assigns the product.
        */
        if                 
            (DateInfo[Order_Month] = 1 and List.Contains({1, 2, 3}, RandomProduct)) or
            (DateInfo[Order_Month] = 2 and List.Contains({4, 5, 6}, RandomProduct)) or
            (DateInfo[Order_Month] = 3 and List.Contains({7, 8, 9}, RandomProduct))
        then null
        else Text.Combine({"P", Number.ToText(RandomProduct, "00")}),
        Customer_ID = Text.Combine({"C", Number.ToText(Number.RandomBetween(1, 10), "00")}),
        
        Order_Date = DateInfo[Order_Date],
        Quantity_Sold = Number.RandomBetween(1, 5),
        Revenue = Number.RandomBetween(500, 3000),
        Profit = Number.RandomBetween(50, 500)
        ]
    ),
    /* Remove records where Product_ID is null */
    FilteredRecords = Table.SelectRows(Table.FromRecords(SalesRecords), each [Product_ID] <> null),
    /* Adjust column data types */
    DataTypeChange = Table.TransformColumnTypes(FilteredRecords,{{"Order_ID", Int64.Type}, {"Product_ID", type text}, {"Customer_ID", type text}, {"Order_Date", type date}, {"Quantity_Sold", Int64.Type}, {"Revenue", type number}, {"Profit", type number}})
in
    DataTypeChange

Result:
Please note that, the random values will keep change on each refresh of the above Queries.


--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------

No comments:

Post a Comment

Hi User, Thank You for visiting My Blog. If you wish, please share your genuine Feedback or comments only related to this Blog Posts. It is my humble request that, please do not post any Spam comments or Advertising kind of comments, which will be Ignored.

Featured Post from this Blog

How to compare Current Snapshot Data with Previous Snapshot in Power BI

How to Dynamically compare two Snapshots Data in Power BI Scenario: Suppose we have a sample Sales data, which is stored with Monthly Snapsh...

Popular Posts from this Blog