Tuesday, 31 December 2013

How to generate random character strings in a range in Excel

How to generate Alpha-Numeric strings in a range in Excel
In Microsoft Excel , Sometimes you may need to generate random strings , such as different passwords , uniqued ids with Numeric or Alpha Numeric combinaiton.It can be done as :

1. To create a random 5-digit number between 10000 and 99999, use this formula: 


Now press Enter key, a 5-digit number will be display in a cell, then select the cell and drag the fill handle across to the range that you want to contain this formula, a range of 5-digit numbers have been generated, see screenshots:

2. To create a random letter, use this formula: 


This formula can generate a random letter from a to z.

3. To create a Alpha-Numeric String: 
If you need to create Alphanumeric Strings with special characters, you need to use & character to join letters and the letters. 

See the following example how it is generationg Alpha Numeric string with speical character '-'
The following formula generates a Alpha Numeric String( 3 Alpha &"-"& 3Numeric&3 Alpha) of length 11.


The result would be as follows :


Thursday, 26 December 2013

What is the Difference between Candidate Key, Alternate Key, and Composite Key

Explain Candidate Key, Alternate Key, and Composite Key

Candidate Key :
A Candidate Key is one that can uniquely identify each row of a table.Generally a candidate key becomes the Primary Key of the table. 

Alternate Key:
If the table has more than one Candidate Key, one of them will become the Primary Key, and the rest are called Alternate Keys.

Composite Key : 
A key formed by combining at least two or more columns is called Composite Key.

Tuesday, 19 November 2013

What is Slowly Changing Dimension in Data Warehouse

SCD-Slowly Changing Dimension - Data Warehouse
SCD is Slowly Changing Dimension. As the name suggests, a dimension which changes slowly. For Example, say there is a table Employee, which stores information regarding employee as below:

BusinessEntityID, PANNumber, First_Name, last_Name LoginID, OrganizationNode
OrganizationLevel, JobTitle, BirthDate, MaritalStatus, Gender, HireDate, SalariedFlag
CurrentFlag, ModifiedDate

In this Employee table, the data for an employee doesn't change very often, but yes we can’t say that the changes won’t be there. The changes, which may happen, are:
  • Mistakenly spelling of First_Name is stored incorrect.
  • The employee gets married and marital status changes.
  • Last_Name changes.
  • The employee gets promotion and job designation changes and organization level         changes.
  • The columns which doesn't change except if we assume that no mistake happens while data entry are HireDate, Gender, PANNumber
The changes discussed don’t happen frequently, but may happen after certain time.

SCD supports four types of changes: 
Changing Attribute, Historical Attribute, Fixed Attribute, and Inferred Member.

Type 1 (Changing Attribute)
When the changes in any attribute or column overwrites the existing records.
For example; as discussed first name of employee is misspelled and wrong spelling is stored in first name of that employee. For making the first name correct, we don’t need to add one more record for the same employee, so we can overwrite the first name. SCD which does this kind of changes comes into type 1 category. This SCD transformation directs these rows to an output named Changing Attributes Updates Output.

Emp ID
First Name
Last Name
Raj Sekar
Raja Sekhar

This SCD transformation directs these rows to an output named Changing Attributes Updates Output.
Type 2 (Historical Attribute): 
when we need to maintain the history of records, whenever some particular column value changes.
For example the employee gets promotion, designation changes and organization level changes. In such case we need to maintain the history of the employee, that with which designation he joined, and when his designation and organizational level changes.

For these kinds of changes, there will be multiple records for the same employee with different designation. Then to indentify the current records, we can either add a column as current flag, which will be ‘y’ for the current or latest records, Or else we can add two column as start date and end date (expiry date), through which we can maintain history of employees records. This SCD directs these rows to two outputs: Historical Attribute Inserts Output and New Output.

Raja Sekhar
Graduate Engineer
Raja Sekhar
Analyst Programmer
Raja Sekhar
Business Analyst

Type 3 (Fixed Attribute) : when the attribute must not change.
For example HireDate, Gender, PANNumber should never change. So whenever changes will occur in these columns value then either it should throw error or the changes can be saved in some other destination. But changes should not be applied in the columns.

This SCD transformation detects changes and directs the rows with changes to an output named Fixed Attribute Output.

Inferred member:
  are those records of the dimension, which are found missing during fact load. 

For example, say there is a fact table which contains employee and department information. While generating the fact table from Employee_Data table and Department_Data table , sometimes happens that Employee_Data contains some departments name which has no records in department table , and during fact table generation those records are found missing from department table, these kind of member of dimension department are called inferred member. It’s like ‘Fact arriving earlier than dimensions’.

This SCD transformation directs these rows to an output named Inferred Member Updates. When data for the inferred member is loaded, you can update the existing record rather than create a new one.


Sunday, 17 November 2013

Data Warehouse Introduction and Basic Concepts-OLTP,Facts,Dimensions,OLAP,Star Schema,Snowflake Schema

Data Warehouse Introduction and Basic Concepts
What is data warehouse 
Data warehousing is the science of storing data for the purpose of meaningful future analysis. A subject-oriented, integrated, time-variant, non volatile collection of data used in support of management decision-making processes.

Data warehouse key features :
Subject Oriented
This means a data warehouse has a defined scope and focusing on particular subject matter and it only stores data under that scope. So for example, if the sales team of your company is creating a data warehouse - the data warehouse by definition is required to contain data related to sales (and not the data related to production management for example).
This means that data once stored in the data warehouse are not removed or deleted from it and always stay there no matter what.
This means that the data stored in a data warehouse make sense. Fact and figures are related to each other and they are integrable and projects a single point of truth.
Time variant 
This means that data is not constant, as new and new data gets loaded in the warehouse, data warehouse also grows in size.

Key Note
Note here, Non-volatile means that the data once loaded in the warehouse will not get deleted later. Time-variant means the data will change with respect to time.

Data warehouse Architecture :

What are the benefits of data warehouse :
A data warehouse helps to integrate data and store them historically so that we can analyze different aspects of business including, performance analysis, trend, prediction etc. over a given time frame and use the result of our analysis to improve the efficiency of business processes.
Why Data Warehouse is used :
For a long time in the past and also even today, Data warehouses are built to facilitate reporting on different key business processes of an organization, known as KPI. Data warehouses also help to integrate data from different sources and show a single-point-of-truth values about the business measures.
Data warehouse can be further used for data mining which helps trend prediction, forecasts, pattern recognition etc. 
What is the difference between OLTP and OLAP :
OLTP is the transaction system that collects business data. Whereas OLAP is the reporting and analysis system on that data. OLTP systems are optimized for INSERT, UPDATE operations and therefore highly normalized. On the other hand, OLAP systems are deliberately denormalized for fast data retrieval through SELECT operations.
Key Note:
In a departmental shop, when we pay the prices at the check-out counter, the sales person at the counter keys-in all the data into a "Point-Of-Sales" machine. That data is transaction data and the related system is a OLTP system. 
On the other hand, the manager of the store might want to view a report on out-of-stock materials, so that he can place purchase order for them. Such report will come out from OLAP system
What is data mart :
Data marts are generally designed for a single subject area. An organization may have data pertaining to different departments like Finance, HR, Marketing etc. stored in data warehouse and each department may have separate data marts. These data marts can be built on top of the data warehouse.
What is ER model:
ER model or entity-relationship model is a particular methodology of data modeling wherein the goal of modeling is to normalize the data by reducing redundancy. This is different than dimensional modeling where the main goal is to improve the data retrieval mechanism.
What is dimensional modeling :
Dimensional model consists of dimension and fact tables. Fact tables store different transactional measurements and the foreign keys from dimension tables that qualify the data. The goal of Dimensional model is not to achieve high degree of normalization but to facilitate easy and faster data retrieval.
Ralph Kimball is one of the strongest proponents of this very popular data modeling technique which is often used in many enterprise level data warehouses.
What is dimension :
A dimension is something that qualifies a quantity (measure).
For an example, consider this: If I just say… “20kg”, it does not mean anything. But if I say, "20kg of Rice (Product) is sold to Tamatam(customer) on 5th April (date)", then that gives a meaningful sense. These product, customer and dates are some dimension that qualified the measure - 20kg.
Dimensions are mutually independent. Technically speaking, a dimension is a data element that categorizes each item in a data set into non-overlapping regions.
What is Fact :
A fact is something that is quantifiable (Or measurable). Facts are typically (but not always) numerical values that can be aggregated.
What are additive, semi-additive and non-additive measures:
Non-additive Measures
Non-additive measures are those which cannot be used inside any numeric aggregation function (e.g. SUM(), AVG() etc.). One example of non-additive fact is any kind of ratio or percentage. Example, 5% profit margin, revenue to asset ratio etc. A non-numerical data can also be a non-additive measure when that data is stored in fact tables, e.g. some kind of varchar flags in the fact table.
Semi Additive Measures
Semi-additive measures are those where only a subset of aggregation function can be applied. Let’s say account balance. A sum() function on balance does not give a useful result but max() or min() balance might be useful. Consider price rate or currency rate. Sum is meaningless on rate; however, average function might be useful.
Additive Measures
Additive measures can be used with any aggregation function like Sum(), Avg() etc. Example is Sales Quantity etc.
Schemas :
A schema is a collection of database objects, including tables, views, indexes, and synonyms. 

There is a variety of ways of arranging schema objects in the schema models designed for data warehousing. The most common data-warehouse schema model is a star schema.

However, a significant but smaller number of data warehouses use third-normal-form (3NF) schemas, or other schemas which are more highly normalized than star schemas. These 3NF data warehouses are typically very large data warehouses, which are used primarily for loading data and for feeding data marts. These data warehouses are not typically used for heavy end-user query workloads. 
Star Schema
The star schema is the simplest data warehouse schema. It is called a star schema because the diagram of a star schema resembles a star, with points radiating from a center. The center of the star consists of one or more fact tables and the points of the star are the dimension tables. 
This schema is used in data warehouse models where one centralized fact table references number of dimension tables so as the keys (primary key) from all the dimension tables flow into the fact table (as foreign key) where measures are stored. This entity-relationship diagram looks like a star, hence the name. 
Consider a fact table that stores sales quantity for each product and customer on a certain time. Sales quantity will be the measure here and keys from customer, product and time dimension tables will flow into the fact table. 
A star schema is characterized by one or more very large fact tables that contain the primary information in the data warehouse and a number of much smaller dimension tables (or lookup tables), each of which contains information about the entries for a particular attribute in the fact table.
A star query is a join between a fact table and a number of lookup tables. Each lookup table is joined to the fact table using a primary-key to foreign-key join, but the lookup tables are not joined to each other.
Cost-based optimization recognizes star queries and generates efficient execution plans for them. (Star queries are not recognized by rule-based optimization.)
A typical fact table contains keys and measures.
For example, a simple fact table might contain the measure Sales, and keys Time, Product, and Market. In this case, there would be corresponding dimension tables for Time, Product, and Market. The Product dimension table, for example, would typically contain information about each product number that appears in the fact table.
A measure is typically a numeric or character column, and can be taken from one column in one table or derived from two columns in one table or two columns in more than one table.
The main advantages of star schemas :
Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
Provides highly optimized performance for typical data warehouse queries. 
Architecture :

Example :

Snowflake Schema
The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake.

May have more than 1 dimension table for each dimension due to the further normalization of each dimension table.  Dimension tables are in the third normal form (3NF).
Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table.

For example, a product dimension table in a star schema might be normalized into a Product table but in a snowflake schema it is in normalized into  Product_Category  table , Product_Manufacturer table. While this saves space, it increases the number of dimension tables and requires more foreign key joins.
The result is more complex queries and reduced query performance

Architecture :


Star schema vs. Snowflake schema

Star SchemaSnowflake Schema
UnderstandabilityEasier for business users and analysts to query data.May be more difficult for business users and analysts due to number of tables they have to deal with.
Dimension tableOnly have one dimension table for each dimension that groups related attributes. Dimension tables are not in the third normal form.May have more than 1 dimension table for each dimension due to the further normalization of each dimension table.  Dimension tables are in the third normal form (3NF).
Query complexityThe query is very simple and easy to understandMore complex query due to multiple foreign key joins between dimension tables
Query performanceHigh performance. Database engine can optimize and boost the query performance based on predictable framework.More foreign key joins therefore longer execution time of query in compare with star schema
When to useWhen dimension tables store relative small number of rows, space is not a big issue we can use star schema.When dimension tables store large number of rows with redundancy data and space is such an issue, we can choose snowflake schema to save space.
Foreign Key JoinsFewer JoinsHigher number of joins
Data warehouse systemWork best in any data warehouse / data martBetter for small data warehouse/ data mart

Follow Me by Email

ExcelKingdom-Popular Posts

ExcelKingdom-Random Posts