Power BI Analytics Kingdom

Thursday, December 26, 2013

What is the difference between Candidate Key, Alternate Key, and Composite Key

What is Candidate Key, Alternate Key, and Composite Key
Candidate Key :
A Candidate Key is one that can uniquely identify each row of a table. Generally a candidate key becomes the Primary Key of the table.

Alternate Key:
If the table has more than one Candidate Key, one of them will become the Primary Key, and the rest are called Alternate Keys.

Composite Key :
A key formed by combining at least two or more columns is called Composite Key.

--------------------------------------------------------------------------------------------------------

Wednesday, November 13, 2013

SQL Server Data Types Vs SSIS Data Types

SQL Server Data Types and SSIS Data Types

Thanks, TAMATAM

SSIS Excel Connection Manager Error : SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER

SSIS Excel Connection Manager Error : SSIS Error Code DTS_E_OLEDBERROR

Error Message :

I am trying to write my first BI package in VS 2005 to do a simple excel to SQL Server 2005 load. I have a simple dataflow containing an excel source and sql server destination. The Excel source is connected to an Excel 2003 worksheet. My machine is vista -64bit.

While debugging I get the following errors which do not help me much...can anyone tell me what I am doing wrong?

1). After "Progress Validating"

[Excel Source [1]] Error: SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER.

The AcquireConnection method call to the connection manager "Excel Connection Manager" failed with error code 0xC0202009. There may be error messages posted before this with more information on why the AcquireConnection method call failed.

2). After "Validation is Complete"

[Connection manager "Excel Connection Manager"] Error: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040154.

An OLE DB record is available.

Source: "Microsoft OLE DB Service Components" Hresult: 0x80040154 Description: "Class not registered".

SSIS package "Package1.dtsx" starting.

Information: 0x4004300A at Data Flow Task, DTS.Pipeline: Validation phase is beginning.

Error: 0xC0202009 at Package1, Connection manager "Excel Connection Manager": SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040154.

An OLE DB record is available.

Source: "Microsoft OLE DB Service Components" Hresult: 0x80040154 Description: "Class not registered".

Error: 0xC020801C at Data Flow Task, Excel Source [1]: SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER.

Error: 0xC0047017 at Data Flow Task, DTS.Pipeline: component "Excel Source" (1) failed validation and returned error code 0xC020801C.

Error: 0xC004700C at Data Flow Task, DTS.Pipeline: One or more component failed validation.

Error: 0xC0024107 at Data Flow Task: There were errors during task validation.

SSIS package "Package1.dtsx" finished: Failure.

Solution :

This issue is occuring because you are having a 64 bit OS and Office Excel (including most of the office products) does not have 64bit provider and hence the error above on Excel connection.
You can solve this issue by setting Run64BitRunTime property as False.

Tuesday, November 12, 2013

SQL Server Integration Services Architecture and ETL Process

The Architecture of SQL Server Integration Services and ETL Process

SSIS is one of the most powerful features which were introduced in SQL Server 2005.

SSIS is an ETL (extract transform and load features) tool.

SSIS Definition :

SSIS can be defined in various ways which are listed below.

ETL(extract transform and load features) tool.
Control flow engine.
Application platform.
High-performance data transformation pipeline.
Data import/export wizard.

SSIS ( SQL Server Integration Services ) is one of technologies which is in high demand currently in IT industry.

All organizations (be it small, medium or large) have to maintain their data in different databases like spreadsheet, flat files, RDMS systems etc.

Sometimes we need to access data from different sources and need to modify the data according to business needs. Need for accessing the data from multiple data sources and need for performing broad range of data migration tasks and transforming them according to the business needs has been very much in demand now.

In 2000, Microsoft released DTS (Data Transformation Services) to transform data from one source to different destination.

After 5 years in 2005, Microsoft launched another version of DTS with more powerful features. This is named as SSIS. SSIS is more powerful than DTS and its user friendly. SSIS has wide range of new features added to it. SSIS is part of SQL SERVER Business Intelligence Studio.

Following Services are part of SQL SERVER Business Intelligence Studio.:

SSIS (SQL Server Integration Services )

SSRS (SQL SERVER Reporting Services)

SSAS (SQL SERVER Analysis Services)

As already mentioned SSIS is the successor of DTS. But, DTS and SSIS have very few things in common. DTS code is completed rewritten to design and build SSIS.

Lots of new features are added in SSIS. Some of the new features in SSIS are listed below.

Latest features in SSIS :

For Each loop containers
Package configurations
Property expressions
Better Active X controllers
Script tasks
Control flow and Data flow are separated to provide better design and to make project/code more readable and understandable
Event Handlers
Precedence constraints are improvised to handle conditional checks/Conditional Expressions.
Pivot and Unpivot.

SSIS Architecture :

Building Blocks of SSIS :

Control Flow: Workflow or Process flow in SSIS is known as Control flow. Control flow consists if one of more tasks that will be executed when SSIS package runs.

Data Flow: Data Flow in SSIS defines/indicates how data should flow from one data source to other destination. It holds information about data source, data destination and data transformation.

Even Handlers: Event handlers are the tasks that are executed when some event occurs during SSIS package execution. Ex: If some error occurs during SSIS package execution, then event handler can be programmed to run to ignore that error and continue to the next step.

Package Explorer: Package Explorer contains complete information about the Variables, Precedence constrains, Event handlers, Connection Managers, Log providers, Executable.

Precedence Constraints: Precedence Constraints links various tasks in SSIS. In simple words it is the arrow marks that connect 2 difference tasks in SSIS. Based on the direction of Precedent Constraints tasks will be executed in order. In other words Precedence constraints are needed for ordering / organising the control flow in SSIS.

Connection Managers: Connection Managers contain information that is needed to connect to various data sources and to data destinations.

Toolbox: Toolbox contains is collection of Control Flow items, Maintenance plan Tasks, Data Flow Sources, Data Flow Destinations and Data Flow Transformations. In simple words Toolbox in SSIS contains different tasks/ containers. Task is nothing but a work unit to perform certain job/work/action.

Variables: Variables parameters store information that can be used by Containers/tasks in SSIS during the SSIS package execution.

Advantages of SSIS :

SSIS has a very user friendly Graphical User Interface (GUI) through which difficult tasks can also be done with very much ease.
We can perform different types of tasks (like loading data, extracting data, renaming file, sending mails, sending files through FTP , Data mining and lot more) in a single SSIS package without any manual intervention.
SSIS package can be scheduled to run at a given time as per the business needs.
SSIS can be used to connect to different data sources (like flat file, MS Access, Excel, SQLS ERVER, SYBASE, MYSQL, ORACLE etc.,). In simple words using SSIS we can connect to almost all the external data sources.
Deploying SSIS package is very easy.

SSIS Vs. DTS :

SSIS is an ETL tool (Extract, Transform and Load) which is very much needed for the Data warehousing applications. Also SSIS is used to perform the operations like loading the data based on the need, performing different transformations on the data like doing calculation’s (Sum, Average, etc.) and to define a workflow of how the process should flow and perform some tasks on the day to day activity.

Prior to SSIS, Data Transformation Services (DTS) in SQL Server 2000 performs the tasks with fewer features. With the introduction of SSIS in SQL Server 2005 many new features can be used. To develop your SSIS package you need to get installed with the SQL Server Business Intelligence Development Studio which will be available as client tool when installing SQL Server Management Studio (SSMS).

SSMS Vs. BIDS :

SSMS provides different options to develop your SSIS package starting with Import and Export wizard with which you can copy the data from one server to the other or from one data source to the other. With these wizards we can create a structure on how the data flow should happen and make a package and deploy it based on our needs to execute in any environment.

Business Intelligence Development Studio (BIDS) is a tool which can be used to develop the SSIS packages. BIDS is available with SQL Server as an interface which provides the developers to work on the work flow of the process that can be made step by step. Once the BIDS is installed with the SQL Server installation we can locate it and start our process as shown in the steps below.

ETL Process :

ETL stands for Extract Transform and Load. These are simple day-to-day words we use in our daily lives. The figure below depicts ETL in real world scenario.

E - Extract data: From various homogeneous or non-homogeneous source systems. Data could be stored in any of the following forms though not limited to them: Flat file, Database, XML, Web queries etc. When we can have sources of such variety, the job of extraction is to fetch the data from these sources and make it available for the next step.

T - Transform Data: As already discussed that the data are coming from various sources and we cannot assume that the data is structured in the same way across all the sources. Therefore, we need to transform the data to a common format so that the other transformations can be done on them. Once we have the data we need to perform various activities like:

Data cleansing
Mandatory check
Data type check
Check for foreign key constraints
Check for business rules and apply business rules
Creation of surrogate keys
Sorting the data
Aggregating the data
Transposing the data
Trim the data to remove blanks.

The list can go on as the business requirements get complex day by day and hence the transformations get complex. While transformations are on, we need to log the anomalies in data for reporting and corrective action to be taken.

L - Load Data: Once the transformations are done and the data takes the form as per the requirement, we have to load the data to the destination systems. The destinations can also be as varied as the sources. Once the data reaches the destination, it is consumed by other systems, which either stores it as historical data, generate reports out of it, build modes to take business decisions etc.

--------------------------------------------------------------------------------------------------------

Saturday, November 9, 2013

How to Extract Text Characters from Alphanumeric Strings in a Range or Cell Value

Excel VBA Macro to Extract Text from Alphanumeric Strings in a Range or Cell

Sub ExtractNumbers()
Dim i As Integer
Dim j As Integer

On Error Resume Next

For i = 2 To 10 'Rows Having Alpha Numeric Strings

If Cells(i, 1) = "" Then Exit For

For j = 1 To Len(Cells(i, 1))
If WorksheetFunction.IsText(Mid(Cells(i, 1), j, 1) + 0) = True Then
Cells(i, 2) = Cells(i, 2) & Mid(Cells(i, 1), j, 1)
End If
Next j

Next i
End Sub

Output :

String	Extracted Text
A1E5I9O15U	AEIOU

--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------

Friday, November 8, 2013

How to Extract Numbers from Alphanumeric Strings from a Range or Cell Value

Excel VBA Macro to Extract Only Numbers From Text String from a Range or Cell Value

Sub ExtractNumbers()

Dim i As Integer

Dim j As Integer

On Error Resume Next

For i = 2 To 10 'Rows Having Alpha Numeric Strings

If Cells(i, 1) = "" Then Exit For

For j = 1 To Len(Cells(i, 1))

If WorksheetFunction.IsNumber(Mid(Cells(i, 1), j, 1) + 0) = False
Then GoTo Jnext:

Cells(i, 2) = Cells(i, 2) & Mid(Cells(i, 1), j, 1)

Jnext:

Next j

Next i

End Sub

Output :

String	Extracted Numbers
A1E5I9O15U	15915

Thanks, Tamatam

How to use GROUP BY Clause in SQL Server

SQL Server - GROUP BY Clause

GROUP BY clause is used to divide a table into logical groups and calculate aggregate statistics for each group.
Important properties of GROUP BY clause:
GROUP BY clause appears after the WHERE clause and before the ORDER BY clause.

We can group columns or derived columns.
Columns from the input table can appear in an aggregate query's SELECT clause only when they are also included in the GROUP BY clause.
Group BY expression must match the SELECT expression exactly.
If you specify multiple grouping columns in the GROUP BY clause to nest groups, data is summarized at the final specified group.
If WHERE clause is used in the query containing a GROUP BY clause, Rows are eliminate
First which does not satisfy where condition and then grouping occurs.
You cannot use column alias in the GROUP BY clause but table aliases are allowed.

Syntax :

SELECT Columns

FROM Table

[WHERE Search_condition]

GROUP BY Grouping_columns

[HAVING Search_condition]

[ORDER BY Sort_columns]

Here…

Columns and grouping columns are one or more comma separated column names.
Table is a name of table that contains columns and Grouping_columns.

Search_condition is a valid sql expression.

Sort_columns are one or more column name of specified table.

Consider the following Table EMP :

Example-I :

A Query to find the count of Employees of various Designations (JOB) in each Department (DEPT_ID) from the above table EMP.

SELECT DEPT_ID,JOB, COUNT(*) AS COUNT_OF_EMPS
FROM EMP
GROUP BY DEPT_ID ,JOB
ORDER BY DEPT_ID

OR we can write as follows

SELECT DEPT_ID,JOB, COUNT(EMP_ID) AS COUNT_OF_EMPS
FROM EMP
GROUP BY DEPT_ID ,JOB
ORDER BY DEPT_ID

Output :

Example-II :

A Query to find the count of Employees of Designation (JOB)='Analyst" in each Department (DEPT_ID) from the above table EMP.

SELECT JOB,DEPT_ID,COUNT(EMP_ID) AS COUNT_of_Analysts
FROM EMP
WHERE JOB='Analyst'
GROUP BY JOB,DEPT_ID
ORDER BY DEPT_ID

Output :

Notes :

Where clause can be use only on Existing columns of a table.

Where clause cannot be use on Alias columns of a table.

Where clause should use only after From clause and before Group By clause of a table.

Thanks,TAMATAM

Tuesday, November 5, 2013

What is the difference between Len and DataLength Functions in SQL

SQL Server Len() Function Vs. DataLength() Function
The LEN system function returns the number of characters of the specified string expression.
On the other hand, the DATALENGTH function returns the number of bytes used to represent any expression. Here’s a summary of the similarities and differences between the LEN() and DATALENGTH() system functions:

Similarities:
The LEN and DATALENGTH of NULL is NULL.
The return data type is BIGINT if the input expressions is of the VARCHAR(MAX), NVARCHAR(MAX) or VARBINARY(MAX) data types; otherwise the return data type is INT.

Differences:

LEN	DATALENGTH
Returns the number of characters of the specified string expression.	Returns the number of bytes used to represent any expression.
Excludes trailing blanks in determining the number of characters.	Includes trailing blanks in determining the number of bytes used.
Input parameter is of character or binary data type.	Input parameter is of any data type.
LEN is a string function.	DATALENGTH is a data type function.

LEN() returns the number of characters in the string or variable while DATALENGTH() returns the number of bytes used to represent a string or expression. We have 4 different cases in which we can study these two functions.

Case 1: With "Varchar" Data types:

When a Varchar type is passed to both the DATALENGTH() and LEN() functions, they return the same value.

Example:

                DECLARE @value varchar(20)
                        SET @value = ‘Reddyz’
                SELECT DATALENGTH(@value)
                SELECT LEN(@value)

These statements both return 6 as the Varchar type uses 6 single bytes characters to store the 6-character value. So in this case, both the functions are returning the same.

Case 2: With "Nvarchar" Data types:
If an nvarchar type is used, DATALENGTH() takes twice as many bytes to manage a value of the same length whereas LEN() returns the number of characters.

Example:

                DECLARE @value nvarchar(20)
                      SET @value = ‘Reddyz’
                SELECT DATALENGTH(@value)
                SELECT LEN(@value)

The DATALENGTH() returns 12 because 2 bytes are used to store each character using a Unicode character set. The LEN() returns 6 as this returns the number of characters, not the number of bytes.

Case 3: With Integer Data types:

If integer Datatype is used, DATALENGTH() returns 4 and LEN() returns the number of digits.

Example:

                DECLARE @value1 int, @value2 int
                        SET @value1=2
                        SET @value2 =20000
                SELECT DATALENGTH(@value1)
                SELECT LEN(@value1)
                SELECT DATALENGTH(@value2)
                SELECT LEN(@value2)

The DATALENGTH() returns 4 in both the cases because int always takes 4 bytes, whatever the value would be. But the LEN() treats the integer value as if it were converted to a character type and returns the number of digits. So results are 1 and 5 respectively.

Case 4: With Extra Spaces:

If string or variable value is NULL then both functions returns null.
If the string is having trailing blanks then LEN() function returns the length only up to the last non null character and ignores the spaces.
On the other hand, DATALENGTH() behaves differently returning all along with spaces.

Example:

                  DECLARE @value1 VARCHAR(25)
                  SELECT @value1 = 'Reddyz     '
                  SELECT @value1 AS COL_TEXT, LEN(@value1) AS   LENGTH, DATALENGTH (@value1) AS   DATA_LENGTH

The result would be 6 and 11. As DATALENTH() takes into account the spaces so that it returns 11 and LEN() returns only the non-null characters so it returns 6.

#--------------------------------------------------------------Thanks--------------------------------------------------------------#

Power BI Analytics Kingdom | Blog

Search This Blog

Thursday, December 26, 2013

What is the difference between Candidate Key, Alternate Key, and Composite Key

Wednesday, November 13, 2013

SQL Server Data Types Vs SSIS Data Types

SSIS Excel Connection Manager Error : SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER

Tuesday, November 12, 2013

SQL Server Integration Services Architecture and ETL Process

Saturday, November 9, 2013

How to Extract Text Characters from Alphanumeric Strings in a Range or Cell Value

Friday, November 8, 2013

How to Extract Numbers from Alphanumeric Strings from a Range or Cell Value

How to use GROUP BY Clause in SQL Server

Tuesday, November 5, 2013

What is the difference between Len and DataLength Functions in SQL

Featured Post from this Blog

How to compare Current Snapshot Data with Previous Snapshot in Power BI

Popular Posts from this Blog