The Architecture of SQL Server Integration Services and ETL Process
SSIS is one of the most powerful features which were introduced in SQL Server 2005.
SSIS is an ETL (extract transform and load features) tool.
SSIS Definition :
SSIS can be defined in various ways which are listed below.
- ETL(extract transform and load features) tool.
- Control flow engine.
- Application platform.
- High-performance data transformation pipeline.
- Data import/export wizard.
SSIS ( SQL Server Integration Services ) is one of
technologies which is in high demand currently in IT industry.
All organizations (be it small, medium or large) have to
maintain their data in different databases like spreadsheet, flat files, RDMS
systems etc.
Sometimes we need to access data from different sources
and need to modify the data according to business needs. Need for accessing the
data from multiple data sources and need for performing broad range of data
migration tasks and transforming them according to the business needs has been
very much in demand now.
In 2000, Microsoft released DTS (Data Transformation
Services) to transform data from one source to different destination.
After 5 years in 2005, Microsoft launched another version
of DTS with more powerful features. This is named as SSIS. SSIS is
more powerful than DTS and its user friendly. SSIS has wide range of new
features added to it. SSIS is part of SQL SERVER Business Intelligence Studio.
Following Services are part of SQL SERVER Business
Intelligence Studio.:
SSIS (SQL Server Integration Services )
SSRS (SQL SERVER Reporting Services)
SSAS (SQL SERVER Analysis Services)
As already mentioned SSIS is the successor of DTS. But,
DTS and SSIS have very few things in common. DTS code is completed rewritten to
design and build SSIS.
Lots of new features are added in SSIS. Some of the new
features in SSIS are listed below.
Latest features in SSIS :
- For Each loop containers
- Package configurations
- Property expressions
- Better Active X controllers
- Script tasks
- Control flow and Data flow are separated to provide better design and to make project/code more readable and understandable
- Event Handlers
- Precedence constraints are improvised to handle conditional checks/Conditional Expressions.
- Pivot and Unpivot.
Building Blocks of SSIS :
Control Flow: Workflow or Process flow in SSIS is known as Control flow. Control flow
consists if one of more tasks that will be executed when SSIS package runs.
Data Flow: Data
Flow in SSIS defines/indicates how data should flow from one data source to
other destination. It holds information about data source, data destination and
data transformation.
Even Handlers: Event
handlers are the tasks that are executed when some event occurs during SSIS
package execution. Ex: If some error occurs during SSIS package execution, then
event handler can be programmed to run to ignore that error and continue to the
next step.
Package Explorer: Package
Explorer contains complete information about the Variables, Precedence
constrains, Event handlers, Connection Managers, Log providers, Executable.
Precedence Constraints: Precedence
Constraints links various tasks in SSIS. In simple words it is the arrow marks
that connect 2 difference tasks in SSIS. Based on the direction of Precedent
Constraints tasks will be executed in order. In other words Precedence
constraints are needed for ordering / organising the control flow in SSIS.
Connection Managers: Connection
Managers contain information that is needed to connect to various data sources
and to data destinations.
Toolbox: Toolbox
contains is collection of Control Flow items, Maintenance plan Tasks, Data Flow
Sources, Data Flow Destinations and Data Flow Transformations. In simple words
Toolbox in SSIS contains different tasks/ containers. Task is nothing but a
work unit to perform certain job/work/action.
Variables: Variables
parameters store information that can be used by Containers/tasks in SSIS
during the SSIS package execution.
Advantages of SSIS :
- SSIS has a very user friendly Graphical User Interface (GUI) through which difficult tasks can also be done with very much ease.
- We can perform different types of tasks (like loading data, extracting data, renaming file, sending mails, sending files through FTP , Data mining and lot more) in a single SSIS package without any manual intervention.
- SSIS package can be scheduled to run at a given time as per the business needs.
- SSIS can be used to connect to different data sources (like flat file, MS Access, Excel, SQLS ERVER, SYBASE, MYSQL, ORACLE etc.,). In simple words using SSIS we can connect to almost all the external data sources.
- Deploying SSIS package is very easy.
SSIS is an ETL tool (Extract, Transform and Load) which
is very much needed for the Data warehousing applications. Also SSIS is used to
perform the operations like loading the data based on the need, performing
different transformations on the data like doing calculation’s (Sum, Average,
etc.) and to define a workflow of how the process should flow and perform some
tasks on the day to day activity.
Prior to SSIS, Data Transformation Services (DTS) in SQL
Server 2000 performs the tasks with fewer features. With the introduction of
SSIS in SQL Server 2005 many new features can be used. To develop your SSIS
package you need to get installed with the SQL Server Business Intelligence
Development Studio which will be available as client tool when installing SQL
Server Management Studio (SSMS).
SSMS
Vs. BIDS :
SSMS provides different options to develop your SSIS
package starting with Import and Export wizard with which you can copy the data
from one server to the other or from one data source to the other. With these
wizards we can create a structure on how the data flow should happen and make a
package and deploy it based on our needs to execute in any environment.
Business Intelligence Development Studio (BIDS) is a tool
which can be used to develop the SSIS packages. BIDS is available with SQL Server
as an interface which provides the developers to work on the work flow of the
process that can be made step by step. Once the BIDS is installed with the SQL
Server installation we can locate it and start our process as shown in the
steps below.
ETL Process :
ETL stands for Extract
Transform and Load.
These are simple day-to-day words we use in our daily lives. The figure below
depicts ETL in real world scenario.
E -
Extract data: From various homogeneous or
non-homogeneous source systems. Data could be stored in any of the following
forms though not limited to them: Flat file, Database, XML, Web queries etc.
When we can have sources of such variety, the job of extraction is to fetch the
data from these sources and make it available for the next step.
T -
Transform Data: As already discussed that the data are
coming from various sources and we cannot assume that the data is structured in
the same way across all the sources. Therefore, we need to transform the data
to a common format so that the other transformations can be done on them. Once
we have the data we need to perform various activities like:
- Data cleansing
- Mandatory check
- Data type check
- Check for foreign key constraints
- Check for business rules and apply business rules
- Creation of surrogate keys
- Sorting the data
- Aggregating the data
- Transposing the data
- Trim the data to remove blanks.
L - Load Data: Once the transformations are done and the data
takes the form as per the requirement, we have to load the data to the
destination systems. The destinations can also be as varied as the sources.
Once the data reaches the destination, it is consumed by other systems, which
either stores it as historical data, generate reports out of it, build modes to
take business decisions etc.
--------------------------------------------------------------------------------------------------------
No comments:
Post a Comment
Hi User, Thank You for visiting My Blog. Please post your genuine Feedback or comments only related to this Blog Posts. Please do not post any Spam comments or Advertising kind of comments which will be Ignored.