Scenario:
While this design works well for transactional systems, it becomes difficult when you need to analyze or report across all of them in a consistent way.
For example, management may want to see how a drug performed in trials, how efficiently it was produced, and how much revenue it generated, in a consolidated view.
The challenge is to create a Unified_Data_Repository that standardizes data from these diverse sources. Instead of wide tables with many columns, the goal is to reshape the data into a normalized tall format.
Suppose you are working in a pharmaceutical company where information is distributed across several systems. Each system captures different attributes:
- Clinical trials record patient groups, their clinical responses, and efficiency rates.
- Production records track manufacturing details such as batch yield ratios and batch quantities.
- Sales transactions capture commercial data like regions, customer types, and sales amounts.
For example, management may want to see how a drug performed in trials, how efficiently it was produced, and how much revenue it generated, in a consolidated view.
The challenge is to create a Unified_Data_Repository that standardizes data from these diverse sources. Instead of wide tables with many columns, the goal is to reshape the data into a normalized tall format.
In this structure, each column value is transformed into a row, making it easier to query, compare, and report across the various heterogeneous tables.
By using the consolidated dataset and pivoting attributes like Drug_Code, Efficiency_Rate, and Sales_Amount, we can understand how clinical effectiveness relates to the performance of the sales. From this analysis, we can derive the following key insights.
The consolidated Output table follows a simple schema as per below:
- ID: represents the original row identifier from the source table.
- Criteria_Name: the name of the column being transformed.
- Criteria_Value: the actual value from that column, stored as text.
- Src_Object_Name: the source table name, ensuring traceability.
- _LastModified_On: a timestamp marking when the record was inserted.
We can achieve this Scenario using Dynamic SQL with Cursor as explained below:
-----------------------------------------------------------------
Step1: Create Sample Input Source Tables:
-----------------------------------------------------------------
-- Sample of Clinical trial data
CREATE TABLE [dbo].[Clinical_Trials]
(
Trial_ID INT,
Drug_Code VARCHAR(50),
Patient_Group VARCHAR(50),
Clinical_Response VARCHAR(100),
Efficiency_Rate DECIMAL(10,2)
);
INSERT INTO [dbo].[Clinical_Trials] VALUES
(1001, 'Drug_A01', 'Adults', 'Positive', 87.5),
(1002, 'Drug_B02', 'Children', 'Neutral', 65.0),
(1003, 'Drug_C03', 'Elderly', 'Negative', 42.3);
SELECT * FROM [dbo].[Clinical_Trials]
-- Sample of Production records
CREATE TABLE [dbo].[Production_Records]
(
Batch_ID INT,
Drug_Code VARCHAR(50),
Plant_ID VARCHAR(50),
Batch_Yield_Ratio DECIMAL(10,2),
Batch_Quantity DECIMAL(10,2)
);
INSERT INTO [dbo].[Production_Records] VALUES
(2001, 'Drug_A01', 'Plant_001', 95.2, 1200.5),
(2002, 'Drug_B02', 'Plant_002', 89.7, 980.0),
(2003, 'Drug_C03', 'Plant_003', 76.4, 750.2);
SELECT * FROM [dbo].[Production_Records]
CREATE TABLE [dbo].[Sales_Transactions]
(
Sale_ID INT,
Region VARCHAR(50),
Drug_Code VARCHAR(50),
Customer_Type VARCHAR(50),
Sales_Amount DECIMAL(10,2)
);
INSERT INTO [dbo].[Sales_Transactions] VALUES
(3001, 'North America', 'Drug_A01', 'Hospital', 15000.0),
(3002, 'Europe', 'Drug_B02', 'Pharmacy', 12000.5),
(3003, 'Asia', 'Drug_C03', 'Distributor', 9800.0);
SELECT * FROM [dbo].[Sales_Transactions]
Step 2: Create the Unified Output Table to store final Output Data:
-----------------------------------------------------------------
CREATE TABLE [dbo].[Unified_Data_Repository]
(
ID INT,
Criteria_Name VARCHAR(200),
Criteria_Value VARCHAR(200),
Src_Object_Name VARCHAR(200),
_LastModified_On DATETIME
);
-----------------------------------------------------------------
Step 3: Create a Temp Table with Source Table Names and Key Columns:
-----------------------------------------------------------------
IF OBJECT_ID('tempdb..##Src_Table_Mapping') IS NOT NULL
DROP TABLE ##Src_Table_Mapping;
CREATE TABLE ##Src_Table_Mapping (
Source_Table VARCHAR(128),
Key_Column VARCHAR(128)
);
INSERT INTO ##Src_Table_Mapping VALUES
('Clinical_Trials', 'Trial_ID'),
('Production_Records', 'Batch_ID'),
('Sales_Transactions', 'Sale_ID');
SELECT * FROM ##Src_Table_Mapping;
-----------------------------------------------------------------
Step 4: Dynamic SQL Query with Cursor to consolidate Data from all Source tables
-----------------------------------------------------------------
-- Declare variables to hold table name and SQL statement
DECLARE @TableName NVARCHAR(200);
DECLARE @sql NVARCHAR(MAX);
DECLARE @KeyColumn NVARCHAR(200);
-- Define a Cursor that will loop through all Source table names
DECLARE _CursorX Cursor FOR
SELECT Source_Table, Key_Column FROM ##Src_Table_Mapping;
-- Open the Cursor (initializes the result set)
OPEN _CursorX;
-- Fetch the first record (first table name from mapping)
FETCH NEXT FROM _CursorX INTO @TableName, @KeyColumn;
-- Loop until all records are processed
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = '';
-- Build dynamic SQL for the _CursorXrent table
-- This Pivots columns into rows and inserts them into Unified_Data_Repository
SELECT @sql =
'INSERT INTO [dbo].[Unified_Data_Repository]
(ID, Criteria_Name, Criteria_Value, Src_Object_Name, _LastModified_On) ' +
(
SELECT STRING_AGG(
'SELECT ' + @KeyColumn + ' AS ID, ''' + COLUMN_NAME + ''' AS Criteria_Name, ' +
'CAST(' + COLUMN_NAME + ' AS NVARCHAR(200)) AS Criteria_Value, ' +
'''' + @TableName + ''' AS Src_Object_Name, GETDATE() AS _LastModified_On ' +
'FROM [dbo].[' + @TableName + ']',
' UNION ALL '
)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @TableName
AND COLUMN_NAME <> @KeyColumn -- Exclude Key column from Pivot
);
-- Execute the dynamic SQL for this table
EXEC(@sql);
-- Fetch the next record (next table name)
FETCH NEXT FROM _CursorX INTO @TableName, @KeyColumn;
END
-- Close the Cursor (release resources)
CLOSE _CursorX;
-- Deallocate the Cursor (remove definition)
DEALLOCATE _CursorX;
-----------------------------------------------------------------
Step 5: View the final consolidated Output
-----------------------------------------------------------------
SELECT *
FROM [dbo].[Unified_Data_Repository]
ORDER BY Src_Object_Name, ID;
Data Analysis and Insights:
-----------------------------------------------------------------
We can use the consolidated unified dataset to analyze and study the relationship between clinical outcomes and commercial performance.
By pivoting attributes such as Drug_Code, Efficiency_Rate, and Sales_Amount, the dataset can be transformed into an analytical view that highlights how scientific effectiveness connects with market adoption.
The example below demonstrates how a focused query can turn the normalized repository into actionable insights:
WITH Drug_Data AS (
SELECT Src_Object_Name, Criteria_Name, Criteria_Value, ID
FROM Unified_Data_Repository
WHERE Criteria_Name IN ('Drug_Code','Efficiency_Rate','Sales_Amount')
)
SELECT t1.Criteria_Value AS Drug_Code,
MAX(CASE WHEN t2.Criteria_Name = 'Efficiency_Rate' THEN t2.Criteria_Value END) AS Efficiency_Rate,
MAX(CASE WHEN t2.Criteria_Name = 'Sales_Amount' THEN t2.Criteria_Value END) AS Sales_Amount
FROM Drug_Data t1
INNER JOIN Drug_Data t2 ON t1.ID = t2.ID
WHERE t1.Criteria_Name = 'Drug_Code'
GROUP BY t1.Criteria_Value;
1) Clinical vs. Commercial Performance:
- Drug_A01 shows the highest efficiency rate (87.5%) and the strongest sales (15,000). This indicates strong clinical success translating into strong market adoption.
- Drug_C03 has the lowest efficiency (42.3%) and the weakest sales (9,800), suggesting limited clinical effectiveness is reflected in lower demand.
2) Correlation Between Metrics:
- Drug_A01 shows a positive correlation: higher efficiency rates tend to align with higher sales amounts.
- Drug_B02, with moderate efficiency (65%) but relatively strong sales (12,000.5), shows that commercial performance can sometimes outpace clinical effectiveness, possibly due to market factors like pricing, distribution, or brand strength.
3) Strategic Decision Support:
- Drug_B02 may need further clinical improvement to match its commercial success, ensuring long‑term sustainability and patient satisfaction.
- Drug_C03 could require both clinical enhancements and stronger commercial strategies to improve its overall performance.
Note:
This article shows how Dynamic SQL with Cursors can be used to consolidate data. The examples are for learning purposes, and you can refine and adapt the logic to fit your own requirements.
--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------
No comments:
Post a Comment
Hi User, Thank You for visiting My Blog. If you wish, please share your genuine Feedback or comments only related to this Blog Posts. It is my humble request that, please do not post any Spam comments or Advertising kind of comments, which will be Ignored.