working with R, postgresql + SSL, and MSSQL

I’ve been able to take a break from my regularly scheduled duties and spend some time working with R.  This is a short log of what I did to get it working.

The main things I’m looking to do is regression modelling from a large dataset I have in postgresql and various stats calculations on some business data I have in SQL Server.  Today I got to the stage in my R learning where I wanted to hook up the databases.

My setup:

  • R version 2.12.0 on windows 7
  • postgresql 8.4.5 on ubuntu server, requiring SSL
  • MS SQL Server 2005 on Windows 2003

R connects to databases via RJDBC, which (surprise) uses JDBC.  You need to download JDBC drivers for each server, and then can load those up inside R.

  1. Install RJDBC
    1. Open R
    2. Packages -> Install package(s)
    3. pick a mirror near you
    4. select RJDBC
  2. install JDBC driver for MSSQL
    1. I used jtds: http://jtds.sourceforge.net/ (there is also a Microsoft provided driver I didn’t hear about until I was done)
    2. download and unzip
    3. note the path to the jtds jar file (hereafter referred to as $JTDS and the jar filename
    4. open http://jtds.sourceforge.net/faq.html#driverImplementation, which has some magic strings JDBC wants
    5. optional – copy $JTDS/(x64|x86)/SSO/ntlmauth.dll into your %PATH% if you want to use windows authentication with SQL Server
  3. install JDBC driver for Postgresql
    1. Download from http://jdbc.postgresql.org/
    2. note the path to the jar file (hereafter referred to as $PG) and the jar file name
    3. open http://jdbc.postgresql.org/documentation/head/load.html, which has some magic strings JDBC wants

Then, to connect with MSSQL:

> library(RJDBC)
> mssql <- JDBC("net.sourceforge.jtds.jdbc.Driver", "$JTDS/jtds-1.2.5.jar", "`")
> testdb <- dbConnect(mssql, "jdbc:jtds:sqlserver://host/dbname")
> typeof(dbGetQuery(testdb, "SELECT whathaveyou FROM whither"))
[1] "list"

And you’re off and running with a list of your results in a list and can do whatever you like.

Now for postgresql+ssl:

> pgsql <- JDBC("org.postgresql.Driver", "$PG/postgresql-9.0-801.jdbc3.jar", "`")
> testdb <- dbConnect(pgsql, "jdbc:postgresql://host/dbname?ssl=true", password="password")
> typeof(dbGetQuery(testdb, "SELECT whathaveyou FROM whither"))
[1] "list"

The connection here has a lot more options, and depends highly on your server’s pg_hba.conf.  It took a little while figure out the “?ssl=true” bit.  Luckily you get pretty descriptive error messages if you can’t connect, and the PostgreSQL JDBC docs are pretty good.

Now to re-learn everything I once knew about regression modeling!