Tuesday, November 13, 2018

How to Merge or Join the two Dataframes in R using base Merge Function

Merge Function to Join or merge the two Dataframes in R
We can merge two data frames (datasets) horizontally, using the Merge function of the R's "base" package. In most cases, you join two data frames by one or more common key variables (i.e., an inner join) that common in both the datasets.

Also separate specifications of the columns can be given by by.x and by.y. Columns can be specified by name, number or by a logical vector: the name "row.names" or the number 0 specifies the row names.

The rows in the two data frames that match on the specified columns are extracted, and joined together. If there is more than one match, all possible matches contribute one row each.
If the remaining columns in the data frames have any common names, these have ".x" and ".y" appended to make the names of the result unique.


Examples :
> library(base)
> fpath1<-"T:/My_Tech_Lab/R_Lab/Dataframes/Lt_dataset.txt"
> fpath2<-"T:/My_Tech_Lab/R_Lab/Dataframes/Rt_dataset.txt"

> Left_dt<-read.csv(fpath1,header=T,sep="\t",as.is=T)
> Right_dt<-read.csv(fpath2,header=T,sep="\t",as.is=T
)

We have the two data sets(say Left_dt, Right_dt) as follows..


Now we can merge these two datasets based on a common key column "SO_Id" as follows...

#Merging only matching rows ( like Inner Join in SQL )
> ds1<-merge(Left_dt,Right_dt,by="SO_Id",all=F,sort=T,no.dups=T)
Suppose, if the common key Variable names are different(Obs are similar) in both the data sets, then we have to specify their names in by.x and by.y as follows..
ds1<-merge(Left_dt,Right_dt,by.x="SO_Id",by.y="Order_Id",all=F,sort=T,no.dups=T) 

Output :

#Merging all rows (Matching and Non matching rows from both the datasets, like Full Outer Join in SQL )
> ds1<-merge(Left_dt,Right_dt,by="SO_Id",all=T,sort=T,no.dups=T)
Output :

#Merging all rows from Left dataset and the matching rows from Right dataset(like Left Outer Join in SQL )
> ds1<-merge(Left_dt,Right_dt,by="SO_Id",all.x=T,sort=T,no.dups=T)

Output :


#Merging all rows from Right dataset and the matching rows from Left dataset(like Right Outer Join in SQL )
> ds1<-merge(Left_dt,Right_dt,by="SO_Id",all.y=T,sort=T,no.dups=T)
Output :

Note:
We can perform these Merge, Joining operations more efficiently by using the Functions from the "dplyr" package of R, which we will discuss in later posts.
Now we will see what will happen, if we use the rbind() and cbind() functions on these data sets.

> ds1<rbind(Left_dt,Right_dt)
Error in rbind(deparse.level, ...) :   numbers of columns of arguments do not match

here, we got an error as the no.of Variables(Columns) Left_dt is not matching with Right_dt.
> ds1<-cbind(Left_dt,Right_dt)
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 20, 18

again here, we got an error as the no.of Observations(Rows) Left_dt is not matching with Right_dt.

--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------

No comments:

Post a Comment

Hi User, Thank You for visiting My Blog. Please post your genuine Feedback or comments only related to this Blog Posts. Please do not post any Spam comments or Advertising kind of comments which will be Ignored.