Wednesday, November 13, 2013

SQL Server Data Types Vs SSIS Data Types

SQL Server Data Types and SSIS Data Types



Thanks, TAMATAM

SSIS Excel Connection Manager Error : SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER

SSIS Excel Connection Manager Error : SSIS Error Code DTS_E_OLEDBERROR
Error Message :
I am trying to write my first BI package in VS 2005 to do a simple excel to SQL Server 2005 load. I have a simple dataflow containing an excel source and sql server destination. The Excel source is connected to an Excel 2003 worksheet. My machine is vista -64bit.
While debugging I get the following errors which do not help me much...can anyone tell me what I am doing wrong?

1). After "Progress Validating"
[Excel Source [1]] Error: SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER
The AcquireConnection method call to the connection manager "Excel Connection Manager" failed with error code 0xC0202009. There may be error messages posted before this with more information on why the AcquireConnection method call failed.

2). After "Validation is Complete"
[Connection manager "Excel Connection Manager"] Error: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040154. 

An OLE DB record is available. 
Source: "Microsoft OLE DB Service Components" Hresult: 0x80040154 Description: "Class not registered".

SSIS package "Package1.dtsx" starting.
Information: 0x4004300A at Data Flow Task, DTS.Pipeline: Validation phase is beginning.

Error: 0xC0202009 at Package1, Connection manager "Excel Connection Manager": SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040154.

An OLE DB record is available. 
Source: "Microsoft OLE DB Service Components" Hresult: 0x80040154 Description: "Class not registered".

Error: 0xC020801C at Data Flow Task, Excel Source [1]: SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER. 

The AcquireConnection method call to the connection manager "Excel Connection Manager" failed with error code 0xC0202009. There may be error messages posted before this with more information on why the AcquireConnection method call failed.

Error: 0xC0047017 at Data Flow Task, DTS.Pipeline: component "Excel Source" (1) failed validation and returned error code 0xC020801C.

Error: 0xC004700C at Data Flow Task, DTS.Pipeline: One or more component failed validation.

Error: 0xC0024107 at Data Flow Task: There were errors during task validation.
SSIS package "Package1.dtsx" finished: Failure.

Solution :
This issue is occuring because you are having a 64 bit OS and Office Excel (including most of the office products) does not have 64bit provider and hence the error above on Excel connection.
You can solve this issue by setting Run64BitRunTime property as False.


Tuesday, November 12, 2013

SQL Server Integration Services Architecture and ETL Process

The Architecture of SQL Server Integration Services and ETL Process
SSIS is one of the most powerful features which were introduced in SQL Server 2005.
SSIS is an ETL (extract transform and load features) tool.
SSIS Definition :
SSIS can be defined in various ways which are listed below.
  • ETL(extract transform and load features) tool.
  • Control flow engine.
  • Application platform.
  • High-performance data transformation pipeline.
  • Data import/export wizard.
SSIS ( SQL Server Integration Services ) is one of technologies which is in high demand currently in IT industry. 
All organizations (be it small, medium or large) have to maintain their data in different databases like spreadsheet, flat files, RDMS systems etc.
Sometimes we need to access data from different sources and need to modify the data according to business needs. Need for accessing the data from multiple data sources and need for performing broad range of data migration tasks and transforming them according to the business needs has been very much in demand now.
In 2000, Microsoft released DTS (Data Transformation Services) to transform data from one source to different destination.

After 5 years in 2005, Microsoft launched another version of DTS with more powerful features. This is named as SSIS.  SSIS is more powerful than DTS and its user friendly. SSIS has wide range of new features added to it. SSIS is part of SQL SERVER Business Intelligence Studio.
Following Services are part of SQL SERVER Business Intelligence Studio.:
SSIS (SQL Server Integration Services )
SSRS (SQL SERVER Reporting Services)
SSAS (SQL SERVER Analysis Services)

As already mentioned SSIS is the successor of DTS.  But, DTS and SSIS have very few things in common. DTS code is completed rewritten to design and build SSIS.
Lots of new features are added in SSIS. Some of the new features in SSIS are listed below.

Latest features in SSIS :
  • For Each loop containers
  • Package configurations
  • Property expressions
  • Better Active X controllers
  • Script tasks
  • Control flow and Data flow are separated to provide better design and to make project/code more readable and understandable
  • Event Handlers
  • Precedence constraints are improvised to handle conditional checks/Conditional Expressions.
  • Pivot and Unpivot.
SSIS Architecture :


Building Blocks of SSIS :
Control Flow: Workflow or Process flow in SSIS is known as Control flow.  Control flow consists if one of more tasks that will be executed when SSIS package runs.
Data Flow: Data Flow in SSIS defines/indicates how data should flow from one data source to other destination. It holds information about data source, data destination and data transformation.
Even Handlers: Event handlers are the tasks that are executed when some event occurs during SSIS package execution. Ex: If some error occurs during SSIS package execution, then event handler can be programmed to run to ignore that error and continue to the next step.
Package Explorer:  Package Explorer contains complete information about the Variables, Precedence constrains, Event handlers, Connection Managers, Log providers, Executable.
Precedence Constraints: Precedence Constraints links various tasks in SSIS. In simple words it is the arrow marks that connect 2 difference tasks in SSIS. Based on the direction of Precedent Constraints tasks will be executed in order. In other words Precedence constraints are needed for ordering / organising the control flow in SSIS.
Connection Managers: Connection Managers contain information that is needed to connect to various data sources and to data destinations.
Toolbox: Toolbox contains is collection of Control Flow items, Maintenance plan Tasks, Data Flow Sources, Data Flow Destinations and Data Flow Transformations. In simple words Toolbox in SSIS contains different tasks/ containers. Task is nothing but a work unit to perform certain job/work/action.
Variables: Variables parameters store information that can be used by Containers/tasks in SSIS during the SSIS package execution.

Advantages of SSIS :
  • SSIS has a very user friendly Graphical User Interface (GUI) through which difficult tasks can also be done with very much ease.
  • We can perform different  types of tasks (like loading data, extracting data, renaming file, sending mails, sending files through FTP , Data mining and lot more) in a single SSIS package without any manual intervention.
  • SSIS package can be scheduled to run at a given time as per the business needs.
  • SSIS can be used to connect to different data sources (like flat file, MS Access, Excel, SQLS ERVER, SYBASE, MYSQL, ORACLE etc.,). In simple words using SSIS we can connect to almost all the external data sources.
  • Deploying SSIS package is very easy.
SSIS Vs. DTS :
SSIS is an ETL tool (Extract, Transform and Load) which is very much needed for the Data warehousing applications. Also SSIS is used to perform the operations like loading the data based on the need, performing different transformations on the data like doing calculation’s (Sum, Average, etc.) and to define a workflow of how the process should flow and perform some tasks on the day to day activity.
Prior to SSIS, Data Transformation Services (DTS) in SQL Server 2000 performs the tasks with fewer features. With the introduction of SSIS in SQL Server 2005 many new features can be used. To develop your SSIS package you need to get installed with the SQL Server Business Intelligence Development Studio which will be available as client tool when installing SQL Server Management Studio (SSMS).

SSMS Vs. BIDS :
SSMS provides different options to develop your SSIS package starting with Import and Export wizard with which you can copy the data from one server to the other or from one data source to the other. With these wizards we can create a structure on how the data flow should happen and make a package and deploy it based on our needs to execute in any environment.
Business Intelligence Development Studio (BIDS) is a tool which can be used to develop the SSIS packages. BIDS is available with SQL Server as an interface which provides the developers to work on the work flow of the process that can be made step by step. Once the BIDS is installed with the SQL Server installation we can locate it and start our process as shown in the steps below.

ETL Process : 
ETL stands for Extract Transform and Load. These are simple day-to-day words we use in our daily lives. The figure below depicts ETL in real world scenario.


E - Extract data: From various homogeneous or non-homogeneous source systems. Data could be stored in any of the following forms though not limited to them: Flat file, Database, XML, Web queries etc. When we can have sources of such variety, the job of extraction is to fetch the data from these sources and make it available for the next step.

T - Transform Data: As already discussed that the data are coming from various sources and we cannot assume that the data is structured in the same way across all the sources. Therefore, we need to transform the data to a common format so that the other transformations can be done on them. Once we have the data we need to perform various activities like:
  • Data cleansing
  • Mandatory check
  • Data type check
  • Check for foreign key constraints
  • Check for business rules and apply business rules
  • Creation of surrogate keys
  • Sorting the data
  • Aggregating the data
  • Transposing the data
  • Trim the data to remove blanks.
The list can go on as the business requirements get complex day by day and hence the transformations get complex. While transformations are on, we need to log the anomalies in data for reporting and corrective action to be taken.

L - Load Data: Once the transformations are done and the data takes the form as per the requirement, we have to load the data to the destination systems. The destinations can also be as varied as the sources. Once the data reaches the destination, it is consumed by other systems, which either stores it as historical data, generate reports out of it, build modes to take business decisions etc.
--------------------------------------------------------------------------------------------------------

Saturday, November 9, 2013

How to Extract Text Characters from Alphanumeric Strings in a Range or Cell Value

Excel VBA Macro to Extract Text from Alphanumeric Strings in a Range or Cell
Sub ExtractNumbers()
Dim i As Integer
Dim j As Integer

On Error Resume Next

For i = 2 To 10                'Rows Having Alpha Numeric Strings

If Cells(i, 1) = "" Then Exit For

For j = 1 To Len(Cells(i, 1))

If WorksheetFunction.IsText(Mid(Cells(i, 1), j, 1) + 0) = True Then
Cells(i, 2) = Cells(i, 2) & Mid(Cells(i, 1), j, 1)
End If
Next j

Next i
End Sub

Output :


String
Extracted Text
A1E5I9O15U
AEIOU


--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------

Friday, November 8, 2013

How to Extract Numbers from Alphanumeric Strings from a Range or Cell Value

Excel VBA Macro to Extract Only Numbers From Text String from a Range or Cell Value
Sub ExtractNumbers()
Dim i As Integer
Dim j As Integer

On Error Resume Next

For i = 2 To 10                       'Rows Having Alpha Numeric Strings
If Cells(i, 1) = "" Then Exit For

For j = 1 To Len(Cells(i, 1))
If WorksheetFunction.IsNumber(Mid(Cells(i, 1), j, 1) + 0) = False 
Then GoTo Jnext:
Cells(i, 2) = Cells(i, 2) & Mid(Cells(i, 1), j, 1)
Jnext:
Next j

Next i
End Sub

Output :


String
Extracted Numbers
A1E5I9O15U
15915

Thanks, Tamatam

How to use GROUP BY Clause in SQL Server

SQL Server - GROUP BY Clause
GROUP BY clause is used to divide a table into logical groups and calculate aggregate statistics for each group.
Important properties of GROUP BY clause:
GROUP BY clause appears after the WHERE clause and before the ORDER BY clause. 
  • We can group columns or derived columns. 
  • Columns from the input table can appear in an aggregate query's SELECT clause only when they are also included in the GROUP BY clause. 
  • Group BY expression must match the SELECT expression exactly. 
  • If you specify multiple grouping columns in the GROUP BY clause to nest groups, data is summarized at the final specified group. 
  • If WHERE clause is used in the query containing a GROUP BY clause, Rows are eliminate 
  • First which does not satisfy where condition and then grouping occurs. 
  • You cannot use column alias in the GROUP BY clause but table aliases are allowed.
Syntax  :
SELECT Columns
FROM Table
[WHERE Search_condition]
GROUP BY Grouping_columns
[HAVING Search_condition]
[ORDER BY Sort_columns]

Here…
Columns and grouping columns are one or more comma separated column names.
Table is a name of table that contains columns and Grouping_columns.
Search_condition is a valid sql expression.
Sort_columns are one or more column name of specified table.

Consider the following Table  EMP :


Example-I :
A Query to find the count of Employees of various Designations (JOB) in each Department (DEPT_ID) from the above table EMP.

SELECT DEPT_ID,JOB, COUNT(*) AS COUNT_OF_EMPS 
FROM EMP 
GROUP BY DEPT_ID  ,JOB 
ORDER BY DEPT_ID

OR we can write as follows

SELECT DEPT_ID,JOB, COUNT(EMP_ID) AS COUNT_OF_EMPS 
FROM EMP 
GROUP BY DEPT_ID  ,JOB 
ORDER BY DEPT_ID

Output :

Example-II :
A Query to find the count of Employees of Designation (JOB)='Analyst" in each Department (DEPT_ID) from the above table EMP.

SELECT  JOB,DEPT_ID,COUNT(EMP_ID) AS COUNT_of_Analysts 
FROM EMP 
WHERE JOB='Analyst' 
GROUP BY JOB,DEPT_ID 
ORDER BY DEPT_ID

Output :

Notes :
Where clause can be use only on Existing columns of a table.
Where clause cannot be use on Alias columns of a table.
Where clause should use only after From clause and before Group By clause of a table.

Thanks,
TAMATAM

Tuesday, November 5, 2013

What is the difference between Len and DataLength Functions in SQL

SQL Server Len() Function Vs. DataLength() Function
The LEN system function returns the number of characters of the specified string expression.
On the other hand, the DATALENGTH function returns the number of bytes used to represent any expression. Here’s a summary of the similarities and differences between the LEN() and DATALENGTH() system functions:

Similarities:
The LEN and DATALENGTH of NULL is NULL.
The return data type is BIGINT if the input expressions is of the VARCHAR(MAX), NVARCHAR(MAX) or VARBINARY(MAX) data types; otherwise the return data type is INT.


Differences:
                               LEN
                     DATALENGTH
Returns the number of characters of the specified string expression.
Returns the number of bytes used to represent any expression.
Excludes trailing blanks in determining the number of characters.
Includes trailing blanks in determining the number of bytes used.
Input parameter is of character or binary data type.
Input parameter is of any data type.
LEN is a string function.
DATALENGTH is a data type function.

LEN() returns the number of characters in the string or variable while DATALENGTH() returns the number of bytes used to represent a string or expression. We have 4 different cases in which we can study these two functions.

Case 1:  With  "Varchar" Data types: 
When a Varchar type is passed to both the DATALENGTH() and LEN() functions, they return the same value.

Example:
                DECLARE @value varchar(20)
                        SET @value = ‘Reddyz’
                SELECT  DATALENGTH(@value)
                SELECT  LEN(@value)


These statements both return 6 as the Varchar type uses 6 single bytes characters to store the 6-character value. So in this case, both the functions are returning the same.

Case 2:   With "Nvarchar" Data types:  
If an nvarchar type is used, DATALENGTH() takes twice as many bytes to manage a value of the same length whereas LEN() returns the number of characters.

Example:
                DECLARE @value nvarchar(20)
                       SET @value = ‘Reddyz’
                SELECT  DATALENGTH(@value)
                SELECT LEN(@value)

The DATALENGTH() returns 12 because 2 bytes are used to store each character using a Unicode character set. The LEN() returns 6 as this returns the number of characters, not the number of bytes.


Case 3:   With Integer Data types: 
If integer Datatype is used, DATALENGTH() returns 4 and LEN() returns the number of digits.

Example:
                DECLARE  @value1 int, @value2 int
                        SET @value1=2
                        SET @value2 =20000
                SELECT  DATALENGTH(@value1)
                SELECT  LEN(@value1)
                SELECT  DATALENGTH(@value2)
                SELECT  LEN(@value2)

The DATALENGTH() returns 4 in both the cases because int always takes 4 bytes, whatever the value would be. But the LEN() treats the integer value as if it were converted to a character type and returns the number of digits. So results are 1 and 5 respectively.

Case 4:  With Extra Spaces:
If string or variable value is NULL then both functions returns null.
If the string is having trailing blanks then LEN() function returns the length only up to the last non null character and ignores the spaces.
On the other hand, DATALENGTH() behaves differently returning all along with spaces.


Example:
                  DECLARE @value1 VARCHAR(25)
                  SELECT @value1 = 'Reddyz     '
                  SELECT @value1 AS COL_TEXT, LEN(@value1) AS   LENGTH,                                          DATALENGTH (@value1) AS   DATA_LENGTH

The result would be 6 and 11. As DATALENTH() takes into account the spaces so that it returns 11 and LEN() returns only the non-null characters so it returns 6.

#--------------------------------------------------------------Thanks--------------------------------------------------------------#

Featured Post from this Blog

How to compare Current Snapshot Data with Previous Snapshot in Power BI

How to Dynamically compare two Snapshots Data in Power BI Scenario: Suppose, we have a sample Sales data, which is stored with Monthly Snaps...

Popular Posts from this Blog