The Data Frame in R Programming
A data frame is a table, or two-dimensional array-like structure, in which each column contains values of a single mode and each row contains one value from each column.
The data frame is the structure in R that holds data, and it is similar to the datasets found in standard statistical packages (for example, SAS, SPSS, and Stata). The columns are the variables and the rows are the observations. Different columns can hold different types (numeric, character, or logical), but all values within one column share the same type/mode. Data frames are the main structures you use to store datasets.
Characteristics of a Data frame :
--The column names should not be empty.
--The row names should be unique.
--The data stored in a data frame can be of numeric, factor, or character type (logical and Date columns are also allowed).
--Each column should hold a single mode of data, and all columns should contain the same number of data items.
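These constraints can be checked directly at the console. The following is a minimal sketch; the data frames `ok`, `bad`, and `dup` and their column names are hypothetical and are not part of the employee example used in the rest of this post.

```r
# Columns of different modes are fine as long as their lengths match.
ok <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)
sapply(ok, class)   # one class per column

# Columns whose lengths cannot be recycled to a common length are rejected.
bad <- tryCatch(
  data.frame(id = 1:3, name = c("a", "b")),
  error = function(e) conditionMessage(e)
)
bad   # the error message raised by data.frame()

# Duplicate row names are rejected as well.
dup <- tryCatch(
  data.frame(id = 1:2, row.names = c("r1", "r1")),
  error = function(e) conditionMessage(e)
)
dup   # the error message raised by data.frame()
```

Wrapping the calls in tryCatch() only captures the error text so the script keeps running; at the console the same calls would simply stop with an error.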
Creating a Data frame :
> emp_id<-c(1:5)
> emp_name<-c("Rick","Dan","Michelle","Ryan","Gary")
> salary<-c(624.53,515.25,615.10,725.45,845.25)
> start_date<-as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11","2015-03-27"))
> empdata<-data.frame(emp_id,emp_name,salary,start_date,stringsAsFactors = FALSE)
> head(empdata) # shows the first six rows; with only five rows this prints the same as print(empdata)
emp_id emp_name salary start_date
1 1 Rick 624.53 2012-01-01
2 2 Dan 515.25 2013-09-23
3 3 Michelle 615.10 2014-11-15
4 4 Ryan 725.45 2014-05-11
5 5 Gary 845.25 2015-03-27
#Getting the Structure of the Data frame :
> str(empdata)
'data.frame': 5 obs. of 4 variables:
$ emp_id : int 1 2 3 4 5
$ emp_name : chr "Rick" "Dan" "Michelle" "Ryan" ...
$ salary : num 625 515 615 725 845
$ start_date: Date, format: "2012-01-01" "2013-09-23" "2014-11-15" "2014-05-11" ...
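Besides str(), a few base R functions report individual parts of the structure. The sketch below rebuilds the same empdata data frame so that it runs on its own; it is offered as an illustration, not as the only way to inspect a data frame.

```r
# Rebuild the example data frame from this post.
empdata <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(624.53, 515.25, 615.10, 725.45, 845.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                         "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

dim(empdata)            # number of rows, then number of columns
nrow(empdata)           # number of observations
names(empdata)          # column (variable) names
sapply(empdata, class)  # class of each column
```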
#Summary of data in the Data frame :
The statistical summary and nature of the data can be obtained by applying the summary() function.
> summary(empdata)
emp_id emp_name salary start_date
Min. : 1 Length:5 Min. :515.2 Min. : 2012-01-01
1st Qu.: 2 Class :character 1st Qu.:615.1 1st Qu.: 2013-09-23
Median: 3 Mode :character Median :624.5 Median: 2014-05-11
Mean : 3 Mean :665.1 Mean : 2014-01-14
3rd Qu.: 4 3rd Qu.:725.5 3rd Qu.: 2014-11-15
Max. : 5 Max. :845.2 Max. : 2015-03-27
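The figures that summary() reports for a numeric column can also be computed one at a time. The sketch below does this for the salary values used above; note that summary() prints rounded values, so the unrounded results differ slightly in the last digits.

```r
# Salary values from the example data frame.
salary <- c(624.53, 515.25, 615.10, 725.45, 845.25)

min(salary)                      # smallest value
max(salary)                      # largest value
mean(salary)                     # arithmetic mean
median(salary)                   # middle value
quantile(salary, c(0.25, 0.75))  # first and third quartiles
```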
Extracting data from a Data frame :
We can extract specific columns from a data frame using the column names, as follows.
> result<-data.frame(empdata$emp_name,empdata$salary)
> result
empdata.emp_name empdata.salary
1 Rick 624.53
2 Dan 515.25
3 Michelle 615.10
4 Ryan 725.45
5 Gary 845.25
> class(result)
[1] "data.frame"
Here, result is another data frame formed from the data frame empdata.
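Building a new data frame from empdata$emp_name and empdata$salary prefixes the column names with "empdata.". As an alternative, columns can be selected by name with square brackets, which keeps the original names. This is a minimal sketch; the names `by_name` and `one_col` are hypothetical.

```r
# A reduced copy of the example data frame, enough for this illustration.
empdata <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(624.53, 515.25, 615.10, 725.45, 845.25),
  stringsAsFactors = FALSE
)

# Select columns by name; the result is a data frame with the original names.
by_name <- empdata[, c("emp_name", "salary")]
names(by_name)
class(by_name)

# A single column extracted with $ or [[ ]] is a plain vector.
empdata$salary
empdata[["salary"]]

# drop = FALSE keeps a single column as a one-column data frame.
one_col <- empdata[, "salary", drop = FALSE]
class(one_col)
```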
#Extracting the first 2 observations (rows) from a Data frame :
> result<-empdata[1:2,]
> result
emp_id emp_name salary start_date
1 1 Rick 624.53 2012-01-01
2 2 Dan 515.25 2013-09-23
#Extracting data from the 3rd and 5th rows and the 2nd and 4th columns of a Data frame :
> result<-empdata[c(3,5),c(2,4)]
> result
emp_name start_date
3 Michelle 2014-11-15
5 Gary 2015-03-27
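Rows can also be picked by a condition instead of by position. The sketch below assumes the same empdata data frame and uses two illustrative thresholds (a salary above 620 and a start date in 2014 or later) that do not come from the original example; `high` and `recent` are hypothetical names.

```r
# Rebuild the example data frame from this post.
empdata <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(624.53, 515.25, 615.10, 725.45, 845.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                         "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# A logical vector in the row position keeps the rows where it is TRUE.
high <- empdata[empdata$salary > 620, c("emp_name", "salary")]
high

# subset() expresses the same kind of filter with column names used directly.
recent <- subset(empdata, start_date >= as.Date("2014-01-01"))
recent
```

The original row numbers are kept as row names in the result, which makes it easy to see which observations were selected.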
Adding new Columns and Rows to a Data frame :
A data frame can be expanded by adding rows and columns to it.
# Adding the "dept" column to the above data frame
> empdata$dept<-c("IT","Operations","IT","HR","Finance")
> print(empdata)
emp_id emp_name salary start_date dept
1 1 Rick 624.53 2012-01-01 IT
2 2 Dan 515.25 2013-09-23 Operations
3 3 Michelle 615.10 2014-11-15 IT
4 4 Ryan 725.45 2014-05-11 HR
5 5 Gary 845.25 2015-03-27 Finance
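A new column does not have to be typed in by hand; it can be derived from existing columns or attached with cbind(). The sketch below uses a small hypothetical data frame `emp`, and the `bonus` column with its 10% rate is purely illustrative.

```r
# A small hypothetical data frame for illustration.
emp <- data.frame(emp_id = 1:3, salary = c(624.53, 515.25, 615.10))

# Derive a new column from an existing one.
emp$bonus <- emp$salary * 0.10

# cbind() returns a new data frame with the extra column attached.
emp <- cbind(emp, dept = c("IT", "Operations", "IT"))
names(emp)

# A new column must have as many values as the data frame has rows.
msg <- tryCatch(
  { emp$grade <- c("A", "B"); "no error" },
  error = function(e) conditionMessage(e)
)
msg   # the error message raised by the assignment
```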
# Adding one row to the above data frame
We can append a row to the data frame using the rbind() function, which forms a new data frame with the additional row.
> newdata<-rbind(empdata,list(6, "Laura",985.75,"2015-08-15","HR"))
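Note that rbind() returns a new data frame and leaves the original untouched. The sketch below shows this with a more explicit variant of the call above: the new row is built as a one-row data frame (the name `new_row` is hypothetical) so that each value already has the intended type, including the Date.

```r
# Rebuild the example data frame, including the dept column.
empdata <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(624.53, 515.25, 615.10, 725.45, 845.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                         "2014-05-11", "2015-03-27")),
  dept = c("IT", "Operations", "IT", "HR", "Finance"),
  stringsAsFactors = FALSE
)

# One-row data frame with the same column names and types.
new_row <- data.frame(
  emp_id = 6L, emp_name = "Laura", salary = 985.75,
  start_date = as.Date("2015-08-15"), dept = "HR",
  stringsAsFactors = FALSE
)

newdata <- rbind(empdata, new_row)
nrow(empdata)              # the original data frame is unchanged
nrow(newdata)              # the result has one more row
class(newdata$start_date)  # the date column keeps its class
```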
#Adding the new rows to the above data frame
To add several rows to an existing data frame, the new rows must first be given the same structure (column names and types) as the existing data frame; they can then be combined with the rbind() function.
In the example below, we create a new data frame holding the new rows and then bind it to the existing data frame to create the final data frame.
# Creating a new data frame
> emp_id<-c(6:8)
> emp_name<-c("Shreya","Ravi","Teja")
> salary<-c(575.53,725.15,635.10)
> start_date<-as.Date(c("2013-05-25","2013-07-30","2014-06-15"))
> dept<-c("F&A","BI","R&D")
> empnewdata<-data.frame(emp_id,emp_name,salary,start_date,dept,stringsAsFactors = FALSE)
> empnewdata
emp_id emp_name salary start_date dept
1 6 Shreya 575.53 2013-05-25 F&A
2 7 Ravi 725.15 2013-07-30 BI
3 8 Teja 635.10 2014-06-15 R&D
# Binding the new data frame to the existing one
> empfinaldata<-rbind(empdata,empnewdata)
> empfinaldata
emp_id emp_name salary start_date dept
1 1 Rick 624.53 2012-01-01 IT
2 2 Dan 515.25 2013-09-23 Operations
3 3 Michelle 615.10 2014-11-15 IT
4 4 Ryan 725.45 2014-05-11 HR
5 5 Gary 845.25 2015-03-27 Finance
6 6 Shreya 575.53 2013-05-25 F&A
7 7 Ravi 725.15 2013-07-30 BI
8 8 Teja 635.10 2014-06-15 R&D
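The requirement that both data frames share the same structure can be seen in a small experiment. For data frames, rbind() matches columns by name, so a different column order is fine, while a different column name is an error. The data frames `a` and `b` and the `label` column below are hypothetical.

```r
a <- data.frame(id = 1:2, name = c("x", "y"), stringsAsFactors = FALSE)

# Same columns in a different order: matched by name, not by position.
b <- data.frame(name = "z", id = 3L, stringsAsFactors = FALSE)
ab <- rbind(a, b)
ab

# A column name that is missing from the first data frame is rejected.
msg <- tryCatch(
  rbind(a, data.frame(id = 4L, label = "w", stringsAsFactors = FALSE)),
  error = function(e) conditionMessage(e)
)
msg   # the error message raised by rbind()
```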
Thanks, Tamatam