How to Import the data from an Excel(.xlsx, xls) file in R
We can read the data from the Excel(.xlsx,.xls) files into R using the "readxl" pcakage.
The "readxl package", developed by Hadley Wickham, can be used to easily import Excel files (xls|xlsx) into R without any external dependencies.
The readxl package comes with the function read_excel() to read xls and xlsx files
It’s also possible to choose a file interactively using the function file.choose().
The readxl package makes it easy to get data out of Excel and into R. Compared to many of the existing packages (e.g. gdata, xlsx, xlsReadWrite) readxl has no external dependencies, so it’s easy to install and use on all operating systems. It is designed to work with tabular data.
The readxl supports both the legacy .xls format and the modern xml-based .xlsx format. The libxls C library is used to support .xls, which abstracts away many of the complexities of the underlying binary format. To parse .xlsx, we use the RapidXML C++ library.
readxl Package vs. xlsx Package:
The xlsx package read (and write) .xlsx and .xls files using Java. It is cross platform and uses rJava to deal with Java.
install.packages("xlsx")
library(xlsx)
We can read the data from the Excel files using the read.xlsx() and read.xlsx2() functions available in the "xlsx" package library.
The read.xlsx() function may be slow, when opening large Excel files. It preserves source data type, and tries to guess class type of the variable corresponding to each column in the worksheet.
The read.xlsx2() function is considerably faster, but does not guess the vector class of data frame columns.
read.xlsx(“filename.xlsx”, 1) reads your file and makes the data frame column classes nearly useful, but is very slow for large data sets.
read.xlsx2(“filename.xlsx”, 1) is faster, but you will have to define column classes manually. Character specification converts your columns to factors. Use Date and POSIXct options for time.
Now we will use the "readxl" package in the following scenario.The easiest way to install the latest released version from CRAN.
install.packages("readxl")
Now we will read the data from this Workbook in R as follows..
> filepath<-"T:/My_Tech_Lab/R_Lab/Dataframes/Dataset1.xlsx"
#Reading an Excel, without specifying the Sheet name(by default First sheet data will be read)
> xldat<-read_excel(filepath,col_names = T)
#Reading a specific Range of data from a Sheet
> xldat<-read_excel(filepath,sheet="RgnSales",range="A1:E11",col_names = T)
OR
> xldat<-read_excel(filepath,range="RgnSales!A1:E11",col_names = T)
#Reading a specific Rows of data from a Sheet
> xldat<-read_excel(filepath,sheet="RgnSales",range=cell_rows(1:6),col_names = T)
#Reading a Missing Values ( any thing like "(blank)","na","NA","NULL") as NA
> xldat<-read_excel(filepath,sheet="dView1",col_names = T,na=c("","na","NA","NULL"))
Writing Data from Dataset to Excel :
Now we write the data from the dataset "xldat" to the below output location :
>opath<- "T:/My_Tech_Lab/R_Lab/Dataframes/Output/"
>write.xlsx(xldat,file = paste(opath,"dsOutput.xlsx",sep=""), sheetName = "SalesView", append = FALSE)
We can read the data from the Excel(.xlsx,.xls) files into R using the "readxl" pcakage.
The "readxl package", developed by Hadley Wickham, can be used to easily import Excel files (xls|xlsx) into R without any external dependencies.
The readxl package comes with the function read_excel() to read xls and xlsx files
It’s also possible to choose a file interactively using the function file.choose().
The readxl package makes it easy to get data out of Excel and into R. Compared to many of the existing packages (e.g. gdata, xlsx, xlsReadWrite) readxl has no external dependencies, so it’s easy to install and use on all operating systems. It is designed to work with tabular data.
The readxl supports both the legacy .xls format and the modern xml-based .xlsx format. The libxls C library is used to support .xls, which abstracts away many of the complexities of the underlying binary format. To parse .xlsx, we use the RapidXML C++ library.
readxl Package vs. xlsx Package:
The xlsx package read (and write) .xlsx and .xls files using Java. It is cross platform and uses rJava to deal with Java.
install.packages("xlsx")
library(xlsx)
We can read the data from the Excel files using the read.xlsx() and read.xlsx2() functions available in the "xlsx" package library.
The read.xlsx() function may be slow, when opening large Excel files. It preserves source data type, and tries to guess class type of the variable corresponding to each column in the worksheet.
The read.xlsx2() function is considerably faster, but does not guess the vector class of data frame columns.
read.xlsx(“filename.xlsx”, 1) reads your file and makes the data frame column classes nearly useful, but is very slow for large data sets.
read.xlsx2(“filename.xlsx”, 1) is faster, but you will have to define column classes manually. Character specification converts your columns to factors. Use Date and POSIXct options for time.
Now we will use the "readxl" package in the following scenario.The easiest way to install the latest released version from CRAN.
install.packages("readxl")
library("readxl")
Examples :
Suppose I have an excel file(Dataset1.xlsx) with multiple sheets follows..
> filepath<-"T:/My_Tech_Lab/R_Lab/Dataframes/Dataset1.xlsx"
#Reading an Excel, without specifying the Sheet name(by default First sheet data will be read)
> xldat<-read_excel(filepath,col_names = T)
#Reading a specific sheet data from Excel, by specifying the Sheet Index/name
> xldat<-read_excel(filepath,sheet=6,col_names = T)
OR
> xldat<-read_excel(filepath,sheet="dView2",col_names = T)
OR
> xldat<-read_excel(filepath,sheet="dView2",col_names = T)
#Reading a specific Range of data from a Sheet
> xldat<-read_excel(filepath,sheet="RgnSales",range="A1:E11",col_names = T)
OR
> xldat<-read_excel(filepath,range="RgnSales!A1:E11",col_names = T)
#Reading a specific Columns of data from a Sheet
> xldat<-read_excel(filepath,sheet="RgnSales",range=cell_cols("A:E"),col_names = T)#Reading a specific Rows of data from a Sheet
> xldat<-read_excel(filepath,sheet="RgnSales",range=cell_rows(1:6),col_names = T)
#Reading a Missing Values ( any thing like "(blank)","na","NA","NULL") as NA
> xldat<-read_excel(filepath,sheet="dView1",col_names = T,na=c("","na","NA","NULL"))
Writing Data from Dataset to Excel :
We can write the data from a Dataframe to Excel using xlsx package in R.
There are two main functions in xlsx package for writing both xls and xlsx Excel files:
There are two main functions in xlsx package for writing both xls and xlsx Excel files:
write.xlsx() and write.xlsx2() [faster on big files compared to write.xlsx function].
>install.packages("xlsx")
>library("xlsx")
Now we write the data from the dataset "xldat" to the below output location :
>opath<- "T:/My_Tech_Lab/R_Lab/Dataframes/Output/"
>write.xlsx(xldat,file = paste(opath,"dsOutput.xlsx",sep=""), sheetName = "SalesView", append = FALSE)
--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------
No comments:
Post a Comment
Hi User, Thank You for visiting My Blog. Please post your genuine Feedback or comments only related to this Blog Posts. Please do not post any Spam comments or Advertising kind of comments which will be Ignored.