Working with SQL data in R using the RODBC package
As response to a question at the recent ML training cohorts, as question was asked by the London cohort of how to directly extract SQL tables into R. This is something I had to do daily when working for my old trust. However, the package in R has many components to it, which can become confusing for a first-time user of the package.
To support this, and to avoid obfuscation as much as possible, the team at D&D have created a shortened helper function (in R) to allow you to quickly connect R to your SQL databases and data warehouses.
Explaining the function
The function is very simple, but extremely useful. This is contained in the code block below and I will talk you through this:
sql_connection <- function(conn_str, sql_str){
library(RODBC)
message("Loading SQL....")
conn <- odbcDriverConnect(connection=conn_str)
sql_command <- sql_str
df <- data.frame(sqlQuery(conn,sql_command))
return(df)
odbcCloseAll()
message("SQL loaded")
}
The function is simple – it looks for a connection string (conn_str) and a SQL command (sql_str) and the following process is initiated:
- The RODBC library is loaded, please make sure this package is installed beforehand
- A message is sent to the Console to say that the SQL has started to load – you could add more functionality to this message, but this will suffice
- A variable is created called conn to initiate the connection and pass the connection string to the connection parameter
- Then a variable to store the sql command is created and links to the user defined function parameter of sql_str
- Next, a data frame is created based on the connection and SQL command
- Following this the data frame is returned from the function
- The last piece in the puzzle is to close the ODBC (Open Database Connectivity) connection and show a message to the user to say that the SQL has been loaded.
So that takes you through what the function actually does. Implementing this function would be as simple as copying the code snippet above, pasting this into R Studio via a text script and then running the function to create this in your environment. The question is then – how do I use it?
Using the function
To use the function we need to call it in our R code and pass two important pieces of information. The first is the connection string, this is the most important part of the function, as an error here will cause the connection not to be established. I provide a brilliant website I used to use when I was a DBA as reference: https://www.connectionstrings.com/sql-server/.
The second part is the SQL command itself. Calling the function is shown in the code block below:
sql_conn <- sql_connection(conn_str="Driver={SQL Server Native Client 11.0};Server=yourserver;Database=yourDB;Trusted_Connection=yes;")
sql_str <- "SELECT your fields FROM your table WHERE some condition"
#Use the function
sql_connection(sql_conn, sql_str)
This will then create a data frame with the results of your SQL query in. Then you can use DPLYR and the Tidyverse to do the relevant data manipulation and cleaning.
I hope you find this function useful and use it at will. It has certainly helped me not have to search the functionality of this package every time.
Thanks for reading!
Gary Hutson – Head of Solutions and AI