I’ve been able to take a break from my regularly scheduled duties and spend some time working with R. This is a short log of what I did to get it working.
The main things I’m looking to do is regression modelling from a large dataset I have in postgresql and various stats calculations on some business data I have in SQL Server. Today I got to the stage in my R learning where I wanted to hook up the databases.
My setup:
- R version 2.12.0 on windows 7
- postgresql 8.4.5 on ubuntu server, requiring SSL
- MS SQL Server 2005 on Windows 2003
R connects to databases via RJDBC, which (surprise) uses JDBC. You need to download JDBC drivers for each server, and then can load those up inside R.
- Install RJDBC
- Open R
- Packages -> Install package(s)
- pick a mirror near you
- select RJDBC
- install JDBC driver for MSSQL
- I used jtds: http://jtds.sourceforge.net/ (there is also a Microsoft provided driver I didn’t hear about until I was done)
- download and unzip
- note the path to the jtds jar file (hereafter referred to as $JTDS and the jar filename
- open http://jtds.sourceforge.net/faq.html#driverImplementation, which has some magic strings JDBC wants
- optional – copy $JTDS/(x64|x86)/SSO/ntlmauth.dll into your %PATH% if you want to use windows authentication with SQL Server
- install JDBC driver for Postgresql
- Download from http://jdbc.postgresql.org/
- note the path to the jar file (hereafter referred to as $PG) and the jar file name
- open http://jdbc.postgresql.org/documentation/head/load.html, which has some magic strings JDBC wants
Then, to connect with MSSQL:
> library(RJDBC) > mssql <- JDBC("net.sourceforge.jtds.jdbc.Driver", "$JTDS/jtds-1.2.5.jar", "`") > testdb <- dbConnect(mssql, "jdbc:jtds:sqlserver://host/dbname") > typeof(dbGetQuery(testdb, "SELECT whathaveyou FROM whither")) [1] "list"
And you’re off and running with a list of your results in a list and can do whatever you like.
Now for postgresql+ssl:
> pgsql <- JDBC("org.postgresql.Driver", "$PG/postgresql-9.0-801.jdbc3.jar", "`") > testdb <- dbConnect(pgsql, "jdbc:postgresql://host/dbname?ssl=true", password="password") > typeof(dbGetQuery(testdb, "SELECT whathaveyou FROM whither")) [1] "list"
The connection here has a lot more options, and depends highly on your server’s pg_hba.conf. It took a little while figure out the “?ssl=true” bit. Luckily you get pretty descriptive error messages if you can’t connect, and the PostgreSQL JDBC docs are pretty good.
Now to re-learn everything I once knew about regression modeling!
One Comment