Friday, October 27, 2017

Incremental Load in SSIS to update data in SQL Server Destination Table from Excel Source

How to Update data in SQL Server Destination Table from Excel Source using SSIS Incremental Load
Incremental Load is the process of comparing the Target data table against the Source data, based on an Id column, Date stamp or Time stamp, and updating or loading the data to the Target table as per the below conditions:
  • If there are any new records in the Source data, then we have to insert those records into the Target data table.
     
    For example, on a daily/weekly basis we have to insert the region-wise sales data.
  • If there are any updated values in the Source data, then we have to update those records in the Target data table.
     
    For example, updating the NetSales data which has been adjusted.
Scenario :
Suppose we have a Destination/Target table which we need to update with Source data:
The Source data is as follows :
Now, if you observe the data in the Source against the Destination Table :
Incremental Load activities to perform here :
  1. The first 6 records already exist in the Destination, but the NetSales for records 2 and 4 (Sales_Order_Id : 238372, 937123) is modified in the Source, so these records only need to be Updated in the Destination Table.
  2. Records 7, 8, 9 and 10 in the Source do not exist in the Destination, so these need to be Inserted into the Destination Table.
  3. The records which do not have any change should not be updated in the Destination, but you can route them to a Flat File if required. (A sketch of the assumed Destination table structure is shown below.)
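For reference, a minimal sketch of the Destination table assumed in this scenario (the column names are taken from the Update statement later in this post; the data types are assumed):

CREATE TABLE [dbo].[Tbl_Sales_Target]
(
     [Sales_Order_Id]   INT            NOT NULL
    ,[Sales_Region]     VARCHAR(50)    NULL
    ,[Sales_Period]     VARCHAR(20)    NULL
    ,[Cust_Segment]     VARCHAR(50)    NULL
    ,[Prd_Svc_Flag]     VARCHAR(10)    NULL
    ,[NetSales]         DECIMAL(18,2)  NULL
);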
Data Flow Task:
Data Flow Task Design:
Now we have to Design the above Data Flow task to perform the Incremental Load as follows :
Phase-1 : Create the Connections to the SQL Destination, Excel Source Data and a Flat File :

Here, db_TAMATAM : The SQL Connection to the Destination table "Tbl_Sales_Target"
SalesData_Source : The Excel Connection to the Source data file "SalesData_Source.xlsx"
NoUpdateData : The Flat File Connection to the file where we store the records that are not loaded to the SQL Destination, as they already exist there and have no updates.

Phase-2 : Connecting to the Excel Source:
First, add the Data Flow Task to the Control Flow.

Next, add the Excel Source inside the Data Flow and convert the Source data as per the Destination Table.

Phase-3 : Add the Lookup Transformation and set the properties:

Next, add the Lookup Transformation to the Data Flow and set the properties.
General Tab :
Choose the options in the General tab of the Lookup Transformation Editor as follows.
Make sure to choose "Ignore failure" for handling rows with no matching entries, so that non-matching rows are passed downstream instead of failing the component.

Connection Tab :
Write a SQL query to select the desired columns from the Destination table, which we further use for the Lookup by mapping them with the Source columns.
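A minimal sketch of such a lookup query, assuming only the key and the measure are needed downstream (see the Columns tab below):

SELECT [Sales_Order_Id]
      ,[NetSales]
FROM   [dbo].[Tbl_Sales_Target]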


Columns Tab :
Here we are mapping the lookup column "Sales_Order_Id" (from the Destination) with "C_Order_Id" (from the Source).

The other column "NetSales" is not used for the lookup, but I selected it to use further in the Conditional Split in the next step. The lookup output columns are aliased with the prefix LKP_ (LKP_Sales_Order_Id, LKP_NetSales), and these are referenced in the Conditional Split expressions below.


Phase-4 : Add the Conditional Split Transformation and set the properties:

Conditions:
Insert_New : ISNULL(LKP_Sales_Order_Id)
This condition checks whether LKP_Sales_Order_Id is NULL or not. If it is true, then those new records will be "Inserted" into the Destination table.
Note :
The Lookup Transformation from the above step returns NULL for the lookup columns when a Source "C_Order_Id" has no matching "Sales_Order_Id" in the Destination.

Update_Existing : C_Order_Id == LKP_Sales_Order_Id && C_NetSales != LKP_NetSales
This condition checks whether the Sales Order exists in both the tables, and also checks whether there is any change/adjustment in the measure NetSales.

If it is true, then those modified records will be "Updated" in the Destination Table.

Default_OP : This is the default output for the rows which do not meet either of the above two conditions.

Phase-5 : Connect the Conditional Split Data to Destinations:
From the above Conditional Split Transformation, the data will be transferred to the Destinations, based on the conditions, as follows :

OLE DB Destination :  The records which are new in the Source will be Inserted into the Destination Table "Tbl_Sales_Target".

OLE DB Command :  The records which have a modified "NetSales" in the Source will be Updated in the Destination Table "Tbl_Sales_Target", using the Update statement defined in the OLE DB Command Transformation with parameter mapping.

OLE DB Command Transformation :
We have to set the OLE DB Command Transformation properties as follows :

Connection Managers Tab :
Select the DB Connection
Component Properties Tab :
Write an Update statement to update the Destination table columns with the Source columns, based on the parameter (?) mapping done in the next step.

UPDATE [dbo].[Tbl_Sales_Target]
SET    [Sales_Order_Id] = ?
      ,[Sales_Region]   = ?
      ,[Sales_Period]   = ?
      ,[Cust_Segment]   = ?
      ,[Prd_Svc_Flag]   = ?
      ,[NetSales]       = ?
WHERE  [Sales_Order_Id] = ?

Column Mappings Tab :
Map the input Source columns to the parameters created in the above step, in the same order as the ? markers appear in the Update statement (see the sketch below).
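A sketch of the mapping, assuming the parameter order above (SSIS names the OLE DB Command parameters Param_0 … Param_6; the Source column names other than C_Order_Id and C_NetSales are hypothetical):

C_Order_Id      -> Param_0   ([Sales_Order_Id] in SET)
C_Sales_Region  -> Param_1   ([Sales_Region])
C_Sales_Period  -> Param_2   ([Sales_Period])
C_Cust_Segment  -> Param_3   ([Cust_Segment])
C_Prd_Svc_Flag  -> Param_4   ([Prd_Svc_Flag])
C_NetSales      -> Param_5   ([NetSales])
C_Order_Id      -> Param_6   ([Sales_Order_Id] in WHERE)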
Now the OLE DB Transformation is ready to use.
Flat File Destination :  The records which are Neither New Nor Modified will be moved to this Flat File.

Final Phase :  Execute the Package
---------------------------------------------------------------------------------------------------------------
Final Output :
Before Update :

After Update :

--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------

Sunday, October 22, 2017

SSIS ForEach Loop to load Data from specific Sheets of an Excel Workbook to different SQL Server Tables

SSIS For Each Loop Container to Loop through specific Sheets of an Excel and load data to different SQL Server Destinations
Scenario :
Suppose we have an Excel Workbook (.xlsx, 2016 version) with multiple sheets having the Sales data by Product and Service, as follows :

Source:
T:\My_Tech_Lab\SSIS_Lab\SSIS_Proj1\Source

From this Workbook, we need to load the data only from the tabs "Product" and "Service"; the other tabs need to be skipped.

Destination:
The data from the above Source tabs needs to be loaded to the below SQL Server Destination tables :
We need to make sure that the data from the tab "Product$" goes to the table "Tbl_SalesByProd" and the data from the tab "Service$" goes to the table "Tbl_SalesBySvc".



--------------------------------------------------------------------------------------------------------------
SSIS Package Design :

Now we need to design the package as above, based on the scenario, as follows :

Phase- 1 : Creation of the Variables Needed :
Create a variable to store the Sheet Names of the Workbook and assign one existing Sheet Name as its value. This value will change at run time, as sketched below.
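A sketch of the variable definition (the variable name is taken from the later steps; the initial value is assumed to be any existing sheet, e.g. "Product$"):

Name : v_SheetName     Scope : Package     Data Type : String     Value : Product$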

Phase- 2 : Creation of the Connections to Database, Excel Source, and an ADO.Net OLE DB Connection for Excel:

Here, db_TAMATAM - The connection to the SQL Server DB.
SalesData.xlsx - The Excel connection to the Source data.

OleDb_SalesData.xlsx :
This is an ADO.NET OLE DB connection to the Excel Source, used to read the tabs/sheets of the Workbook.
This connection is very important for looping through each sheet/tab of the Workbook with the ForEach Loop, using the "Foreach ADO.NET Schema Rowset Enumerator".

This ADO.NET connection can be created as follows:
Select New ADO.NET Connection from the Connection Managers block.
Next, select the ".Net Providers for OleDb\Microsoft Office 12.0 Access Database Engine OLE DB Provider".


Next Select the Excel Source File 

Next set the Extended Properties of the OLEDB Connection String as "Excel 12.0 XML;HDR=YES"

HDR=YES means the file has Headers.

Note :
If you are connecting to Microsoft Office Excel data, add the appropriate Extended Properties to the OLE DB Connection String based on the Excel file type, as follows :
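Typical Extended Properties values by Excel file type, for reference (these are common values for the Jet/ACE OLE DB providers; verify against the provider installed on your machine):

Excel 97-2003 (.xls)               : Excel 8.0;HDR=YES
Excel 2007 and later (.xlsx)       : Excel 12.0 Xml;HDR=YES
Excel 2007 macro-enabled (.xlsm)   : Excel 12.0 Macro;HDR=YES
Excel 2007 binary (.xlsb)          : Excel 12.0;HDR=YES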


Phase-3 : Add the ForEach Loop Container to Control Flow and Set the Properties:
Enumerator : Foreach ADO.NET Schema Rowset Enumerator
Connection : The ADO.NET Connection (OleDb_SalesData.xlsx)
Schema : Tables


Next, do the Variable Mapping to "v_SheetName" to store the Sheet Name dynamically.
Variable : User::v_SheetName
Index : 2
Make sure to select the Index as "2" so the ForEach Loop reads the tab/sheet names; index 2 corresponds to the TABLE_NAME column of the Tables schema rowset.


Phase- 4 : Add the Expression Task to validate the Sheet names which are passed by the ForEach Loop:
Expression : 
@[User::v_SheetName]=="Product$"||@[User::v_SheetName]=="Service$"

Add the same Expression to the Precedence Constraint as well.
Expression : 
@[User::v_SheetName]=="Product$"||@[User::v_SheetName]=="Service$"

Evaluation Operation : Expression and Constraint

Phase- 5 : Add the Data Flow Task and define the flow:
Excel Source :
Add the Excel Source and set the properties as follows:
Data Access Mode : Table Name or View Name Variable
Variable Name : User::v_SheetName

This variable value is passed by the ForEach Loop to the Data Flow > Excel Source.


Next, add the Data Conversion transformation, if needed, to convert the Source data.

Conditional Split :
Next Add the Conditional Split to route the Source Data to the different destinations. 

Output Name :                Condition :
Product_Data                 @[User::v_SheetName] == "Product$"
Service_Data                 @[User::v_SheetName] == "Service$"

The data from Sheet "Product$" will go to the table "Tbl_SalesByProd" and the data from Sheet "Service$" will go to the table "Tbl_SalesBySvc".

Final Data Flow :

-------------------------------------------------------------------------------------------------------- 
Thanks, TAMATAM ; Business Intelligence & Analytics Professional 
--------------------------------------------------------------------------------------------------------

Saturday, October 21, 2017

SSIS For Loop and ForEach Loop to load multiple Flat Files data from different Folders to SQL Server

SSIS For and ForEach Loop Containers to Loop through multiple Text Files from different Folders and load data to SQL Server Destinations
Scenario :
Suppose we have two folders having multiple input files (.txt type), one folder having the Product data and the other folder having the Service data, as follows :


The different sets of input files in each folder are as follows :

From these input files, we need to load only the files with names like "Product_*.txt" and "Service_*.txt" into the respective SQL Server Destinations.
The Destination Tables :

--------------------------------------------------------------------------------------------------------------
SSIS Package Design :
Now we need to design the package as above, based on the above scenario, as follows :
Phase-1 : Creation of the Variables Needed :
We need to create the below variables with direct value declaration as follows...
IncrementValue - This is an increment variable used in the For Loop to track which folder is being looped. The initial value of this variable is "1".
TotalCount - This variable is used in the For Loop and represents the total number of folders to loop through. The value of this variable is "2", as we need to loop through 2 folders in our scenario.
SrcFolderOne - The path of the folder which we loop through first.
SrcFileOne - The file name pattern "Product_*.txt" for the files to be processed by the ForEach Loop from SrcFolderOne.

SrcFolderTwo - The path of the folder which we loop through second.
SrcFileTwo - The file name pattern "Service_*.txt" for the files to be processed by the ForEach Loop from SrcFolderTwo.
SrcFileName - This is a run-time variable which we use in the ForEach Loop to store the selected input file name.
We also need to create two more run-time variables (collection variables) with expressions, to capture the folder and file names selected by the For Loop and pass them into the ForEach Loop :
SelectFolderPath = @[User::IncrementValue]==1? @[User::SrcFolderOne]: @[User::SrcFolderTwo]
SelectFileName = @[User::IncrementValue]==1? @[User::SrcFileOne] : @[User::SrcFileTwo]





Phase-2 : Create the OLE DB Destination and Flat File Source Connections:
db_Tamatam - The connection to the Database.
Src_FlatFiles - The connection to the Flat file source .

As we need to loop through the Source Flat Files dynamically, we need to make the Connection String dynamic by using an expression as follows :
ConnectionString= @[User::SelectFolderPath]+ @[User::SrcFileName]

Here, the variable @[User::SelectFolderPath] is the selected folder path, whose value is passed by the For Loop.
The variable @[User::SrcFileName] is the selected file name, whose value is passed by the ForEach Loop. (This assumes the folder path values end with a trailing backslash; otherwise, include the separator in the expression.)
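A worked example of how the expression resolves (the folder path is hypothetical; the file name is taken from the note in Phase-5 below):

@[User::SelectFolderPath] : T:\SourceFiles\ProductData\
@[User::SrcFileName]      : Product_ABC123.txt
ConnectionString          : T:\SourceFiles\ProductData\Product_ABC123.txt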

Phase-3 : Add the Execute SQL Task to truncate your destination tables before loading the data :
TRUNCATE Table [Tbl_SalesByProd]
TRUNCATE Table [Tbl_SalesBySvc]


Phase-4 : Add the For Loop Container to the Control Flow and setup Properties :
InitExpression : @IncrementValue=1
EvalExpression : @IncrementValue<=@TotalCount
AssignExpression : @IncrementValue=@IncrementValue+1


Phase-5 : Add the For Each Loop Container inside the For Loop Container in Control Flow and setup Properties :


In  "Collection" tab update the Collection Variables using the Expressions :
Directory : @[User::SelectFolderPath]
FileSpec : @[User::SelectFileName]
The values for these variables will be set by the For Loop iteration, and the details will now be reflected in the ForEach Loop as follows :

In "Variable Mappings" tab update variable name to store the File selected by the ForEach Loop;
Variable :: SrcFileName

Note:
The ForEach Loop takes @[User::SelectFileName] (e.g. "Product_*.txt") from the For Loop, then looks for files matching that pattern in @[User::SelectFolderPath] and selects them.

Suppose it first selects the file "Product_ABC123" from source folder one; then that full file name will be stored in the variable SrcFileName.

Phase-6: Add the Data Flow task inside For Each Loop Container and define the flow :
Data Flow :
The Data Flow Task loads the Product and Service data from the Source Flat Files to the SQL Server Destination tables based on the Conditional Split :
Conditional Split :
If @[User::IncrementValue] == 1, the data is routed to the Product table; otherwise it goes to the default output, which loads the Service table. A sketch of the split is shown below.
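A sketch of the Conditional Split setup, in the same layout as the previous post (the output names are assumptions):

Output Name :                Condition :
Product_Data                 @[User::IncrementValue] == 1
Default Output Name :        Service_Data  (routed to "Tbl_SalesBySvc")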





--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------
