Sunday, January 28, 2018

What is a Data Frame and How to Create a Data Set in R Programming

The Data Frame in R Programming
A data frame is a table, or two-dimensional array-like structure, in which each column holds values of a single mode and each row holds one value from each column.
The data frame is the structure in R that holds a dataset, and it is similar to the datasets found in standard statistical packages (for example, SAS, SPSS, and Stata). The columns are the variables, and the rows are the observations. Different columns can have different types (numeric, character, or logical), but every value within a column has the same type/mode. Data frames are the main structures you use to store datasets.

Characteristics of a Data frame :
--The column names should not be empty.
--The row names should be unique.
--The data stored in a data frame can be of numeric, character, logical, factor, or Date type, among others.
--Each column should hold a single mode of data, and all columns should have the same number of data items.
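
The characteristics above can be checked with a small sketch. The data frame df and its columns (id, name, active) are made up purely for illustration and are not part of the employee example used in the rest of this post.

```r
# Hypothetical example data; the names df, id, name and active are illustrative only.
df <- data.frame(
  id     = 1:3,
  name   = c("A", "B", "C"),
  active = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Each column holds a single type, but different columns may differ in type.
col_classes <- sapply(df, class)
print(col_classes)

# Every column has the same number of items, which equals the number of rows.
print(nrow(df))

# Columns whose lengths cannot be recycled to match are rejected by data.frame().
bad <- tryCatch(
  data.frame(id = 1:3, name = c("A", "B")),
  error = function(e) "error"
)
print(bad)
```

The tryCatch() call is only there so the script keeps running; at the console the same call would simply stop with an error about differing numbers of rows.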

Creating a Data frame :
> emp_id<-c(1:5)
> emp_name<-c("Rick","Dan","Michelle","Ryan","Gary")
> salary<-c(624.53,515.25,615.10,725.45,845.25)
> start_date<-as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11","2015-03-27"))

> empdata<-data.frame(emp_id,emp_name,salary,start_date,stringsAsFactors = FALSE)
> head(empdata) # shows the first 6 rows; with only 5 rows this matches print(empdata)
  emp_id emp_name salary start_date
1      1     Rick 624.53 2012-01-01
2      2      Dan 515.25 2013-09-23
3      3 Michelle 615.10 2014-11-15
4      4     Ryan 725.45 2014-05-11
5      5     Gary 845.25 2015-03-27


#Getting the Structure of the Data frame :
> str(empdata)
'data.frame': 5 obs. of 4 variables:
$ emp_id : int 1 2 3 4 5
$ emp_name : chr "Rick" "Dan" "Michelle" "Ryan" ...
$ salary : num 625 515 615 725 845
$ start_date: Date, format: "2012-01-01" "2013-09-23" "2014-11-15" "2014-05-11" ...
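
Alongside str(), the size and column names of a data frame can be read directly. The following is a minimal sketch that rebuilds the empdata example so it runs on its own; the variable names dims and cols are illustrative.

```r
# Rebuild the example data frame from the post so the snippet runs on its own.
empdata <- data.frame(
  emp_id     = 1:5,
  emp_name   = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary     = c(624.53, 515.25, 615.10, 725.45, 845.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                         "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

dims <- dim(empdata)    # number of rows, then number of columns
cols <- names(empdata)  # column names as a character vector
print(dims)
print(cols)
```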

#Summary of data in the Data frame :
The statistical summary and nature of the data can be obtained by applying the summary() function.
> summary(empdata)
     emp_id    emp_name             salary        start_date
 Min.   :1   Length:5           Min.   :515.2   Min.   :2012-01-01
 1st Qu.:2   Class :character   1st Qu.:615.1   1st Qu.:2013-09-23
 Median :3   Mode  :character   Median :624.5   Median :2014-05-11
 Mean   :3                      Mean   :665.1   Mean   :2014-01-14
 3rd Qu.:4                      3rd Qu.:725.5   3rd Qu.:2014-11-15
 Max.   :5                      Max.   :845.2   Max.   :2015-03-27

Extracting data from a Data frame :
We can extract specific columns from a data frame using the column names, as follows.
> result<-data.frame(empdata$emp_name,empdata$salary)
> result
  empdata.emp_name empdata.salary
1             Rick         624.53
2              Dan         515.25
3         Michelle         615.10
4             Ryan         725.45
5             Gary         845.25

> class(result)
[1] "data.frame"
Here, result is another data frame, built from columns of the data frame empdata.
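
As an alternative sketch, the same columns can be selected by name with square brackets, which keeps the original column names instead of prefixing them with "empdata.". The names by_name, one_col and one_col_df are illustrative.

```r
# Rebuild the example data frame from the post so the snippet runs on its own.
empdata <- data.frame(
  emp_id     = 1:5,
  emp_name   = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary     = c(624.53, 515.25, 615.10, 725.45, 845.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                         "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# Select two columns by name; the result is still a data frame.
by_name <- empdata[, c("emp_name", "salary")]
print(names(by_name))
print(class(by_name))

# Selecting a single column this way drops to a plain vector ...
one_col <- empdata[, "salary"]
print(class(one_col))

# ... unless drop = FALSE is given, which keeps the data frame structure.
one_col_df <- empdata[, "salary", drop = FALSE]
print(class(one_col_df))
```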

#Extracting first 2 observations(rows) from a Data frame :
> result<-empdata[1:2,]
> result
  emp_id emp_name salary start_date
1      1     Rick 624.53 2012-01-01
2      2      Dan 515.25 2013-09-23

#Extracting data from the 3rd, 5th rows and 2nd, 4th columns of a Data frame :
> result<-empdata[c(3,5),c(2,4)]

> result
  emp_name start_date
3 Michelle 2014-11-15
5     Gary 2015-03-27
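
Rows can also be picked by a condition rather than by position. The sketch below is an illustration only: the threshold of 620 and the name high_paid are assumptions chosen for the example, not something taken from the post.

```r
# Rebuild the example data frame from the post so the snippet runs on its own.
empdata <- data.frame(
  emp_id     = 1:5,
  emp_name   = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary     = c(624.53, 515.25, 615.10, 725.45, 845.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                         "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# The row index is a logical vector: TRUE keeps the row, FALSE drops it.
# The column index selects two columns by name.
high_paid <- empdata[empdata$salary > 620, c("emp_name", "salary")]
print(high_paid)
```

The original row names (1, 4 and 5 here) are kept in the result, which makes it easy to trace rows back to empdata.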

Adding new Columns and Rows to a Data frame :
A data frame can be expanded by adding rows and columns to it.
# Adding the "dept" column to the above data frame
> empdata$dept<-c("IT","Operations","IT","HR","Finance")
> print(empdata)
  emp_id emp_name salary start_date       dept
1      1     Rick 624.53 2012-01-01         IT
2      2      Dan 515.25 2013-09-23 Operations
3      3 Michelle 615.10 2014-11-15         IT
4      4     Ryan 725.45 2014-05-11         HR
5      5     Gary 845.25 2015-03-27    Finance


# To append one row to the above data frame
We can append a row to the data frame using the rbind() function, which returns a new data frame with the additional row. The original data frame is not modified.
> newdata<-rbind(empdata,list(6, "Laura",985.75,as.Date("2015-08-15"),"HR"))
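
A quick way to see that rbind() returns a new data frame rather than changing the existing one is to compare row counts before and after. This is a minimal sketch that rebuilds empdata (including the dept column) so it runs on its own; the date is passed through as.Date() here to keep the type of the start_date column explicit.

```r
# Rebuild the example data frame from the post, including the dept column.
empdata <- data.frame(
  emp_id     = 1:5,
  emp_name   = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary     = c(624.53, 515.25, 615.10, 725.45, 845.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                         "2014-05-11", "2015-03-27")),
  dept       = c("IT", "Operations", "IT", "HR", "Finance"),
  stringsAsFactors = FALSE
)

# The list supplies one value per column, in column order.
newdata <- rbind(empdata, list(6, "Laura", 985.75, as.Date("2015-08-15"), "HR"))

print(nrow(empdata))  # still 5: the original is untouched
print(nrow(newdata))  # 6: the new data frame has the extra row
print(newdata[6, ])
```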

#Adding new rows to the above data frame
To add several rows to an existing data frame, we need to bring the new rows into the same structure as the existing data frame and then use the rbind() function.
In the example below, we create a new data frame holding the new rows and then bind it to the existing data frame to create the final data frame.

# Creating a new data frame
> emp_id<-c(6:8)
> emp_name<-c("Shreya","Ravi","Teja")
> salary<-c(575.53,725.15,635.10)
> start_date<-as.Date(c("2013-05-25","2013-07-30","2014-06-15"))
> dept<-c("F&A","BI","R&D")
> empnewdata<-data.frame(emp_id,emp_name,salary,start_date,dept,stringsAsFactors = FALSE)
> empnewdata
  emp_id emp_name salary start_date dept
1      6   Shreya 575.53 2013-05-25  F&A
2      7     Ravi 725.15 2013-07-30   BI
3      8     Teja 635.10 2014-06-15  R&D

# Binding the new data frame to the existing one
> empfinaldata<-rbind(empdata,empnewdata)
> empfinaldata
  emp_id emp_name salary start_date       dept
1      1     Rick 624.53 2012-01-01         IT
2      2      Dan 515.25 2013-09-23 Operations
3      3 Michelle 615.10 2014-11-15         IT
4      4     Ryan 725.45 2014-05-11         HR
5      5     Gary 845.25 2015-03-27    Finance
6      6   Shreya 575.53 2013-05-25        F&A
7      7     Ravi 725.15 2013-07-30         BI
8      8     Teja 635.10 2014-06-15        R&D
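
When two data frames are bound with rbind(), the columns are matched by name, so the new data frame must use the same column names as the existing one. The sketch below illustrates this with small made-up data frames; the names a, b, ab, c_bad and the misnamed column department are hypothetical.

```r
# Hypothetical two-column data frames, used only to illustrate name matching.
a <- data.frame(emp_id = 1:2, dept = c("IT", "HR"), stringsAsFactors = FALSE)

# Same column names in a different order: rbind() lines them up by name.
b <- data.frame(dept = "BI", emp_id = 3L, stringsAsFactors = FALSE)
ab <- rbind(a, b)
print(ab)

# A differently named column ("department" instead of "dept") makes rbind() fail.
c_bad <- data.frame(emp_id = 4L, department = "R&D", stringsAsFactors = FALSE)
res <- tryCatch(rbind(a, c_bad), error = function(e) "error")
print(res)
```

The tryCatch() wrapper only keeps the script running; at the console the failing call would stop with an error saying the names do not match.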


Thanks, Tamatam
