Thursday, December 7, 2017

How to Load Data from New Files and Reject the already Processed Files in SSIS

SSIS Package to Load Data from New Files and Reject the already Loaded Files
Scenario :
Suppose, We have a Source Folder where client will update the Input Files on a Timely basis, which we need to load to SQL Server table " [dbo].[Tbl_Sales]" and then we need to Archive/Move them to a Processed Files Folder.

Some times there are the chances that the Client may Copy the Duplicate/Same files to the folder along with New Files.
In this Case we need to process only the New Files and we have to Reject already Processed/Loaded Files and then Archive/Move them to Rejected Files Folder.
Once a File is Successfully Processed, that File details will be Saved to a Table Called "[dbo].[File_Load_Status]" . Whenever we are Processing again a another File, First we will Check that File Name against the File Name already processed Stored in this Table. If both are Same then we Reject this File else It will be Processed.


Now Design the Package to fulfill this Scenario.

----------------------------------------------------------
Phase-1 : Create the Required Variables at Package Level
----------------------------------------------------------
InputFolder : It is a Static variable to which we will assign the Folder path where we have the Input Files to Process.
T:\My_Tech_Lab\SSIS_Lab\SSIS_Proj1_Sol1\Source\

FileName : It is a Dynamic variable in which we store the Input File Name during runtime by ForEach Loop Container, which loops through Input Files from Input Folder.

Initially assign some values as "SalesRegion.txt" which change in Runtime.

ProcessedFolder : It is a Static variable to which we will assign the Folder Path where the Processed Files will be Archived.

RejectedFolder :  It is a Static variable to which we will assign the Folder Path where the Rejected Files will be Archived.


FileFullPath : It is a Dynamic Variable which will be calculated during the Runtime using the below expression. This is the Fully Qualified Path of the Input File. This Variable will be used to make set the Dynamic Connection String for the Source File.
@[User::InputFolder]+ @[User::FileName]

LoadStatusFlag : This is an Int Variable where we Store the Boolean Value 1 or 0. This Variable will tells whether the Input File is already Processed or not.
1- Means Processed ; 0- Means not Processed.


----------------------------------------------------------
Phase-2 : Add the For Each Loop Container to Control Flow and Set the Properties
----------------------------------------------------------
A) Add the For Each Loop Container to Data Flow


B)Next Go To For Each Loop Editor > Collection Tab; Map the Collection Variable
Enumerator = For Each File Enumerator
Expressions > Directory = @[User::InputFolder] ; The Collection Variable is the Input Folder from which Files need to read by ForEach Loop Container.
Retrieve File Name = Name and Extension
File=*.txt  ; We are Loading data from only the Text Files


C)Next Go To For Each Loop Editor > Variable Mappings Tab ; Do the Variable Mappings.
During the Runtime, the File Name will be stored in the Variable = User::FileName

Next Say OK. ForEach Loop Container is Ready.
----------------------------------------------------------
Phase-3 : Add the Execute SQL Task Inside the For Each Loop Container and Set the Properties to Validate the File Process Status
----------------------------------------------------------
A) Add the Execute SQL Task Inside the Control Flow


B) Go to Execute SQL Task>General Tab, Set the Properties
Set the Connection to your Load_File_Status Table which will have the details of Processed files. If it is First time process, No records will available in that Table.
SQL Statement
Declare @FileName Varchar(255)
Set @FileName=?
IF Exists ( Select 1 From [dbo].[File_Load_Status]
Where File_Name=@FileName )
Begin
Select 1 As FileExistsFlag
End
ELSE
Begin
Select 0 As FileExistsFlag
End

This statement will return "1" if the Current File Name is exists in the [File_Load_Status] table, otherwise it will return "0".

In Case of "1" : We reject the Input File to Process.
In Case of "0" : We will Process/Load the Data from that Input File.
? = It is the Parameter into which we will Pass the Variable "FileName" from ForEach Loop.
Result Set = Single Row ;  Since we are Passing single row value (0 or 1) from above SQL Statement.


C) Go to Execute SQL Task>Parameter Mapping ; Do the Variable Mapping
Here we are Mapping the Variable "FileName" to the Parameter 0 (?)


D) Go to Execute SQL Task>Result Set; Do the Variable Mapping
Here we Store the above SQL Result ("1" or "0") to an Int Variable = User::LoadStatusFlag

Next Say OK , The Execute SQL Task File Process Status Validation is Ready.
----------------------------------------------------------
Phase-4 : Add the Data Flow Task , File System Task Inside the For Each Loop Container and Set the Expressions to decide which one to Execute
----------------------------------------------------------

Add the Data Flow Task and File System Tasks and Connect them Execute SQL Task.

Next define the Expressions in Precedence Constraints of Each task as explained below.



Constraint Expression for Data Flow Task(DFT_LoadSalesData) :


@[User::LoadStatusFlag]==0
When LoadStatusFlag=0 We will Process the File and Load the Data to Destination.
Constraint Expression for File System Task(Move_Already_Processed Files_to_Rejected Folder) :

@[User::LoadStatusFlag]==1
When LoadStatusFlag=1 , We will Reject that File and Move it to Rejected Files Folder.

Now, First I will Explain the File System Task after that I will explain the Data Flow.

1) File System Task(Move_Already_Processed Files_to_Rejected Folder) :
When the @[User::LoadStatusFlag]==1 then we should Reject that File and Move/Archived to a Rejected Files Folder as follows..
DestinationVariable=User::RejectedFolder
Operation = Move File
SourceVariable=User::FileFullPath


Next Say OK. The File System Task is ready to Move already Processed Files to a Rejected Files Folder
2) Data Flow Task(DFT_LoadSalesData) :
If the Input File is new then We need to Process the File, Load the Data to Destination and then Move the File to a Processed Files Folder.
Now Design the Data Flow to do these activities..
A) Create the Connections to Input Source and Destination Table

B) Connect Source and Destination in Data Flow and do Column Mappings and Data Conversions If required.
Here I am not going to explain the basic things again.


Connect the Flat File Source to any one of the Source File from Input Folder which will change in runtime.
Eg : "T:\My_Tech_Lab\SSIS_Lab\SSIS_Proj1_Sol1\Source\
SalesByRegion_AS_12062017.txt"
Connect the OLE DB Destination to the Target Table " [MyAnalyticsDB].[dbo].[File_Load_Status]"

C) Make the Dynamic Source Connection String using Expressions
Go to SalesByRegion_Source > Properties > Expressions
Connection String = @[User::FileFullPath]


Next Say OK. Now the Data Flow is ready to Load the Data.
----------------------------------------------------------
Phase-5 : Add the File System Task Inside the For Each Loop Container and connect to Data Flow Task and Set the Properties to Move the Processed Files
----------------------------------------------------------
A) Add the File System Task and Connect to Data Flow




B) Go to File System Task Editor > General Tab ; Set the Variable Mappings
DestinationVariable=User::ProcessedFolder
Operation = Move File
SourceVariable=User::FileFullPath


Next Say OK. The File System Task ready for Moving/Archiving the Process Files.
----------------------------------------------------------
Final Phase : Add the Execute SQL Task and Connect to the above File System Task to Load the Processed File Details to File Load Status Table
----------------------------------------------------------
A) Add the Execute SQL Task and Connect to the above File System Task


B) Go to the Execute SQL Task Editor > General Tab ; Set the Properties
Set the Connection to the Table "[dbo].[File_Load_Status]" 
SQL Statement : 
Insert Into [dbo].[File_Load_Status] (File_Name) Values (?)
The Value for the Parameter(?) is Passing through FileName Variable "User::FileName"

C) Go to the Execute SQL Task Editor > Parameter Mappings Tab ; Set the Variable Mappings.

Here we map the FileName Variable "User::FileName"  for the Parameter(?) .

Next Say OK. The Execute SQL Task is ready.
Now the Entire Package is Ready

----------------------------------------------------------
Now we need to Execute and Test the Package and Results.
Test 1 :
I have kept Two Files in the Input Source Files Folder and Run the Package

Output 1 :
Now the Output in ProcessedFolder , RejectedFolder and [dbo].[File_Load_Status] as follows..
Processed Folder :
Two files are Processed/Loaded to Destination and then Moved/Archived to the Processed Files Folder as follows...


[File_Load_Status] Table :
Two Processed File details are loaded to this Table.
Select* From [File_Load_Status]

RejectedFolder :
In this Case no Files have been Moved to this Folder.


----------------------------------------------------------
Test 2 :
Now again I have kept Two already Processed Files and Two New Files in the Input Source Files Folder and Run the Package.

Output 2 :
Now the Output in ProcessedFolder , RejectedFolder and [dbo].[File_Load_Status] as follows..
Processed Folder :
Now Two more files are Processed/Loaded to Destination and then Moved/Archived to the Processed Files Folder as follows...


[File_Load_Status] Table :Two more Processed Files details are loaded to this Table.
Select* From [File_Load_Status]


RejectedFolder :
In this Case Two Files should be Moved to this Folder.



--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------

No comments:

Post a Comment

Hi User, Thank You for visiting My Blog. Please post your genuine Feedback or comments only related to this Blog Posts. Please do not post any Spam comments or Advertising kind of comments which will be Ignored.

Featured Post from this Blog

How to compare Current Snapshot Data with Previous Snapshot in Power BI

How to Dynamically compare two Snapshots Data in Power BI Scenario: Suppose, we have a sample Sales data, which is stored with Monthly Snaps...

Popular Posts from this Blog