Friday, November 23, 2018

How to Connect and Read data from SQL Server Databse in R

How to Connect to a RDBMS using ODBC Package in R
The "odbc" package (developed by Jim and other members of the RStudio team), and the "DBI" package (developed by Kirill Müller with support from the R Consortium) provides a modern database interface for R
To communicate with databases, a common protocol is ODBC. ODBC is an open, cross-platform standard for interfacing with databases, whatever the database software and variant of the SQL language it implements.

R has also the "RODBC" package created by Brian Ripley, but the new "odbc" package provides an updated alternative.The "odbc" package is an ODBC interface for R that provides native support for additional data types (including dates, timestamps, raw binary, and 64-bit integers) and parameterized queries. 

The "odbc" package provides a DBI-compliant interface to Open Database Connectivity (ODBC) drivers. This allows for an efficient, easy to setup connection to any database with ODBC drivers available, including SQL Server, Oracle, MySQL, PostgreSQL, SQLite and others. The implementation builds on the nanodbc C++ library.
Benchmarks show that "odbc" is somewhat faster than "RODBC": 
3.2 times faster for reads, and 1.9 times faster for writes.

Connection Settings Syntax :
The following is the basic syntax of a connection to a odbc () supported RDBMS.>dbCon <- DBI::dbConnect(odbc::odbc(),
                  Driver = "[Driver Name]",
                  Server = "[Server Name]",
                  Database = "[Databse Name]",
                  UID = rstudioapi::askForPassword("Database User Name"),
                  PWD = rstudioapi::askForPassword("Database Password"),
   
              Port = 1433)

Here, we use the rstudioapi::askForPassword to enter values for UID, PWD during run time, instead of hardcoding the values.
Now we discuss about Connecting to SQL Server Database, Reading and Writing the Data from R.

Installing Required Packages :
odbc - This package provides the interface between the database driver and R.
DBI - This package provides the standardized functions related to database operations.

dbplyr - This package enables "dplyr" to interact with databases. It also contains the vendor-specific SQL translations.

>install.packages("odbc")
>install.packages("DBI")
>install.packages("dbplyr")
>install.packages("dplyr")


>library(odbc)
>library(DBI)
>library(dbplyr)
>library(dplyr)


Connecting to the SQL Server Database :
We will use the dbConnect() function from the "DBI" package to connect to the database.
>dbCon <- dbConnect(odbc::odbc(),
                                     Driver = "SQL Server",
                                     Server = "TAMATAM-PC",
                                     Database = "AnalyticsDB",
                                     Trusted_Connection=TRUE,
                                     UID = "TAMATAM-PC\\T P Reddy",
                                     PWD = "",
                                     Port = 1433
                                    )
Once we create a connection to the Database, you see the connection details in R Studio, with the list of Databases that you have access to.
 

#Disconnecting Server connection :
In R studio, we can disconnect from Connection section of the Environment Window as shown below.
Also we can disconnect using the below function.
> dbDisconnect(conn = con)

#Re-connecting to Server again:
We can reconnect to Server from the Connections Window in R Studio, by simply copying the below R Script and Run.
Or, You can connect by choosing the option from "Connect>R Console", then R will write the Connection string to the R Console, executes and establish the Connection again.
> library(odbc)
> dbCon <- dbConnect(odbc::odbc(), Driver = "SQL Server", Server = "TAMATAM-PC",
Database = "AnalyticsDB", Trusted_Connection = TRUE, UID = "TAMATAM-PC\\T P Reddy", PWD = "", Port = 1433)


Reading the Data from SQL Server Database :
Now can Read from SQL Server Database tables into R using various functions discussed below.
#List all the Tables in the SQL Server Connection (Includes all DBs Tables)
The dbListTables () function of "DBI" Package will list all the user, system tables from all the available Databases in the Connection "dbCon" , created in the above.
>library(DBI)
> dbListTables(dbCon)
> dbListTables(conn = dbCon )



#List all the user Tables from the specific Database(say AnalyticsDB) in SQL Server
> userTables <- dbGetQuery(dbCon, "Select * From [AnalyticsDB].Information_Schema.Tables Where Table_Type = 'BASE TABLE'")
>userTables

> userTables <- dbGetQuery(dbCon, "Select * From  [AnalyticsDB].Sys.objects 
Where Type = 'U' ")
>userTables

The dbGetQuery() will submit a query and fetch the results.We can also submit the query and fetch separately with dbSendQuery() and dbFetch().

The dbSendQuery() method only submits and synchronously executes the SQL query to the database engine. It does not extract any records, for that you need to use the dbFetch() method, and then you must call dbClearResult() when you finish fetching the records you need. For interactive use, you should almost always prefer dbGetQuery().
> userTables <- dbSendQuery(dbCon, "Select * From  [AnalyticsDB].Information_Schema.Tables Where Table_Type = 'BASE TABLE' ")

> myTables<-dbFetch(userTables)
>myTables

>dbClearResult(userTables)
Notes :
We should use dbSendQuery() and dbFetch() one after another. The n= argument in the function dbFetch() can be used to fetch only the part of a query result.
> dbFetch(userTables,n=10)
Then next n rows can fetched using dbFetch(userTables).
If we again run the above functions, we will get the warning massages as follows.
> userTables <- dbSendQuery(dbCon, "Select * From [AnalyticsDB].Information_Schema.Tables Where Table_Type = 'BASE TABLE' ")

Warning message:
In new_result(connection@ptr, statement) : Cancelling previous query

> dbFetch(userTables)
[1] TABLE_CATALOG TABLE_SCHEMA TABLE_NAME TABLE_TYPE
<0 rows> (or 0-length row.names).


>dbClearResult(userTables)

# List Tables starts with "Dim%" from the [AnalyticsDB]
> dbListTables(dbCon, table_name = "Dim%")
[1] "DimCustomers" "DimProducts" "DimRegion" "DimSuppliers"

# List the Filed Names of the Table "DimCustomers" from the [AnalyticsDB]
> dbListFields(dbCon, "DimCustomers")
[1] "Cust_Id" "Cust_Name" "Cust_Segment"


#Read full table into R as a Dataframe
>library(DBI)> dfCust<-dbReadTable(dbCon, "DimCustomers")

> dfCust

We can read a Table using the function tbl() of "dplyr" package
>library(dplyr)
> dfCust<-tbl(dbCon, "DimCustomers")> dfCust

#Counting the Rows of Table using the function tally() of "dplyr" package
> tally(tbl(dbCon, "DimCustomers"))


> x<-tally(tbl(dbCon, "DimCustomers"))
> show_query(x)
<SQL> SELECT COUNT(*) AS "n" FROM "DimCustomers"

Reading Data from a Join Query Result into R :
>library(DBI)
>JoinData <- dbGetQuery(dbCon,
"SELECT [SalesOrder_Id]
     ,[Order_Date]
     , D.[FiscalQuarter]
      ,R.[Region_Name]
      ,R.[Country]
      ,C.[Cust_Segment]
      ,C.[Cust_Name]
      ,P.[Prod_Name]
      ,[Net_Sales]
  FROM [dbo].[FactSales] S
     INNER JOIN
  [dbo].[Tbl_Calendar] D  ON S.[Order_Date]=D.[Date_Id]
     INNER JOIN
  [dbo].[DimRegion] R  ON S.[SalesRegion_Id]=R.[Region_Id]
     INNER JOIN
  [dbo].[DimProducts] P  ON S.[Prod_Id]=P.[Prod_Id]
     INNER JOIN
  [dbo].[DimCustomers] C  ON S.[Cust_Id]=C.[Cust_Id] "
)

>JoinData



Writing Dataframe to SQL Server as Table :
Now we will write the above Dataframe "JoinData" to the SQL Server as a Table with the same Name.
>library(DBI)
>dbWriteTable(dbCon, "JoinData", JoinData,overwrite=TRUE,append=FALSE)

Output:
Now we will see the table as JoinData in the SQL Server database [AnalyticsDB]



#--------------------------------------------------------------Thanks--------------------------------------------------------------#

No comments:

Post a Comment

Hi User, Thank You for visiting My Blog. Please post your genuine Feedback or comments only related to this Blog Posts. Please do not post any Spam comments or Advertising kind of comments which will be Ignored.