linux

working with R, postgresql + SSL, and MSSQL

I’ve been able to take a break from my regularly scheduled duties and spend some time working with R.  This is a short log of what I did to get it working.

The main things I’m looking to do are regression modelling on a large dataset I have in postgresql and various stats calculations on some business data I have in SQL Server.  Today I got to the stage in my R learning where I wanted to hook up the databases.

My setup:

  • R version 2.12.0 on windows 7
  • postgresql 8.4.5 on ubuntu server, requiring SSL
  • MS SQL Server 2005 on Windows 2003

I connected R to the databases via RJDBC, which (surprise) uses JDBC.  You need to download a JDBC driver for each server, and then you can load them inside R.

  1. Install RJDBC
    1. Open R
    2. Packages -> Install package(s)
    3. pick a mirror near you
    4. select RJDBC
  2. install JDBC driver for MSSQL
    1. I used jtds: http://jtds.sourceforge.net/ (there is also a Microsoft provided driver I didn’t hear about until I was done)
    2. download and unzip
    3. note the path to the jtds jar file (hereafter referred to as $JTDS) and the jar filename
    4. open http://jtds.sourceforge.net/faq.html#driverImplementation, which has some magic strings JDBC wants
    5. optional – copy $JTDS/(x64|x86)/SSO/ntlmauth.dll into your %PATH% if you want to use windows authentication with SQL Server
  3. install JDBC driver for Postgresql
    1. Download from http://jdbc.postgresql.org/
    2. note the path to the jar file (hereafter referred to as $PG) and the jar file name
    3. open http://jdbc.postgresql.org/documentation/head/load.html, which has some magic strings JDBC wants

Then, to connect with MSSQL:

> library(RJDBC)
> mssql <- JDBC("net.sourceforge.jtds.jdbc.Driver", "$JTDS/jtds-1.2.5.jar", "`")
> testdb <- dbConnect(mssql, "jdbc:jtds:sqlserver://host/dbname")
> typeof(dbGetQuery(testdb, "SELECT whathaveyou FROM whither"))
[1] "list"

And you’re off and running: the results come back as a data frame (which typeof reports as "list"), and you can do whatever you like with them.

Now for postgresql+ssl:

> pgsql <- JDBC("org.postgresql.Driver", "$PG/postgresql-9.0-801.jdbc3.jar", "`")
> testdb <- dbConnect(pgsql, "jdbc:postgresql://host/dbname?ssl=true", password="password")
> typeof(dbGetQuery(testdb, "SELECT whathaveyou FROM whither"))
[1] "list"

The connection here has a lot more options, and depends highly on your server’s pg_hba.conf.  It took a little while to figure out the “?ssl=true” bit.  Luckily you get pretty descriptive error messages if you can’t connect, and the PostgreSQL JDBC docs are pretty good.

Now to re-learn everything I once knew about regression modeling!

code snippet
linux
mssql
open source
postgresql
R
windows

qpsmtpd-forkserver on debian

I had some problems getting qpsmtpd-forkserver to run on debian. The punchline: qpsmtpd-forkserver relies on an environment variable called QPSMTPD_CONFIG, which should point to the directory containing your config files.

$ export QPSMTPD_CONFIG=/etc/qpsmtpd
$ qpsmtpd-forkserver ...

Problem solved.

I didn’t want to run qpsmtpd for its standard usage (wrapping a mail server on the same machine); I just wanted a mail proxy that ran some code whenever any email came in, and then forwarded the email to a real mail server running on a different machine. qpsmtpd has a great plugin system, and was the path of least resistance. However, Debian’s qpsmtpd package is set up to wrap another mail server running on the same machine, and its init scripts configure the QPSMTPD_CONFIG variable for you. This did not help when I tried to start qpsmtpd-forkserver on the command line. I eventually figured it out by reading through the perl code for qpsmtpd, and then the init script itself.

To add insult to injury, qpsmtpd-forkserver had no way to pass the config on the command line.
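
A small shell function can stand in for the missing option. This is only a sketch: run_forkserver is a name I made up, not something qpsmtpd ships, and it assumes qpsmtpd-forkserver is on your PATH. It takes the config directory as its first argument, sets QPSMTPD_CONFIG for that one command only, and passes everything else through untouched.

```shell
#!/bin/sh
# Hypothetical helper (not part of qpsmtpd): the first argument is the
# config directory, the remaining arguments go to qpsmtpd-forkserver as-is.
run_forkserver() {
    if [ "$#" -lt 1 ]; then
        echo "usage: run_forkserver CONFIG_DIR [forkserver args...]" >&2
        return 2
    fi
    config_dir=$1
    shift

    # Fail early with a clear message if the directory is missing.
    if [ ! -d "$config_dir" ]; then
        echo "config directory not found: $config_dir" >&2
        return 1
    fi

    # The assignment prefix sets the variable only for this one command.
    QPSMTPD_CONFIG="$config_dir" qpsmtpd-forkserver "$@"
}
```

Source it from your shell or paste it into a script, then call run_forkserver with your config directory followed by whatever arguments you would normally give qpsmtpd-forkserver.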

I don’t know if this behavior is debian-specific, or just an oversight by the qpsmtpd maintainers.

At any rate, I was able to look through the code and figure it out, which is more than I can say for my latest .NET problem. The solution to that one was to speed my move to a new machine.

linux
open source

To Linux / Open Source Advocates

I just read another linux advocacy article off of reddit, in this case Five Reasons why Linux will eventually rule the world, and it hit a lot of my pet peeves about these kinds of articles. In a nutshell, for users, all that matters is that their work gets done, and all other arguments are wasted bits.

I have a few suggestions for open source / linux advocates:

  1. Don’t mention Microsoft
    Just don’t do it. My friend Nathan has a set of trolls, and regularly downmods anything referring to those in any setting (slashdot, reddit, what-have-you), and for you linux advocates, Microsoft should enter your troll-sphere. I’ve heard that any press is good press, and while that might not be true (ask Bush about Iraq), I do know that zero press is zero press. I also believe that branding-advertiso-brainwashing works (ask deBeers), so do everyone a favor and don’t mention the “competition”. At this point the fact that Microsoft exists is almost a non-issue to you. You solve all your problems without Microsoft products, so they are about as relevant to you as vitameatavegamin.
  2. Don’t mention intangibles
    For most people, the choice of operating system or office suite has nothing to do with freedom, morality, or elegance of the code. It’s a non-issue for non-techies, and a dead end for advocates. Technical folks can look all that up, and understand the advantages / disadvantages, but you don’t need to convince them. You need to convince the non-technical CEOs and managers who believe that using open source takes money from their pocket. I run into people in the course of my job who have little, no, or an outright false understanding of what “open source” really means. I’d guess half the CEOs in the country think that their codebase is their competitive advantage, and if they employ open source then competitors will steal their business. They will not be convinced by tirades about liberty.
  3. Blog your solutions
    Get a blogger account, and start blogging how you solved your problems using linux or open source. Please, don’t plan to run your own server from your apartment with your hand-rolled Erlang blog engine, cause you’ll never get to it. Just go ahead and start the blogger account with the crappy template that doesn’t use CSS classes the way you’d prefer. When people google to solve their problem, you want there to be multitudes of open source solutions clogging the results. This mostly already happens, but more content can’t hurt, and I’m curious how many dead accounts blogger will keep hosting.
  4. Appeal to the wallet
    In case you haven’t realized it, money makes the world go round. That will remain the case until either the zombies or the aliens arrive, and then we’ll have a brief respite while exchange rates re-adjust. When people ask why they should use open source, the answer needs to be “because it saves you money”. Last week alone using Firefox with Firebug saved me probably 12 hours’ worth of time in debugging javascript and reverse-engineering colors and styling from a mock-up. That sort of productivity gain gives managers and CEOs a high equivalent to 3 whippets. Time and cost savings need to be a main point in any article designed to convince someone to switch.
  5. Focus on the user benefit
    Users care about how they benefit. That’s it. I feel this Gimp vs Photoshop article is a good example of open source advocacy. It’s focused on what benefits differentiate the two, and concludes:

    I know I’ve beat the horse to death, but unless you want to pirate software, there is no reason to use Photoshop if you’re not producing a print publication – use the Gimp.

    That is the essential message to send: “Unless you want to (increase cost | increase risk | break the law | behave irrationally), use the open source equivalent”.

  6. Be rational
    Lastly, and most importantly, you have to be rational. Avoid (or at least acknowledge) logical fallacies, call spades spades, and admit weaknesses where you believe them to be. Don’t speak as if you are the end-all, be-all authority. Computing is ridiculously varied, and your solution might not work for others for a wide variety of legitimate reasons. Your mileage will always vary. Don’t be whiny. I read many posts about linux, and sometimes all I can hear is Luke Skywalker whining “But I was going into Tosche Station to pick up some power converters…”. It doesn’t matter if it’s unfair, it doesn’t matter if it’s right, it doesn’t matter if it’s monopolist, don’t whine. Open source is going to take over the world, but not because M$ sux0rs or the mafiaa doesn’t want you to watch DVDs.

Advocates, next time you want to rant about Microsoft doing something bad for humanity, take some time and additionally post how you solve a day-to-day problem. Open source and linux will win because they will be the path of least resistance, and as advocates, it’s our job to make sure the path of least resistance is well-lit.

linux
open source
