Segue: Easy cloud HPC in R, now with custom packages

After a few helpful emails from package creator JD Long, I have the segue package running with custom R packages. The package is available on Google Code. With two lines of code I can start submitting jobs to very large clusters of computers on the Amazon cloud. For a basic introduction to the package, see Jeff Breen’s post.

Quick notes on updating with Mercurial: I had already cloned the repo with

 hg clone https://segue.googlecode.com/hg/ segue 

so to get the latest version I ran the following from inside the segue directory:


hg pull https://segue.googlecode.com/hg/ 
hg update
R CMD INSTALL .

Okay, ready to test the segue package. We will load the CRAN package sde, which brings in its dependencies automatically, and my little custom mcmcTools package, which has no dependencies (if it had any, they would have to be loaded manually).

[gist id=1070581]

A few notes:

  • Be sure to double-check on the Amazon web interface that the cluster has shut down. Just exiting R doesn’t mean the cluster has closed or that Amazon has stopped billing you.

  • Note that when launching the cluster we have to pass along every R object we want available on the nodes, as a named list given to the rObjectsOnNodes option. Failing to do so tripped me up earlier.

  • Note that emrlapply wants a list argument,

as.list(1:2)

not a numeric vector,

1:2
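The three notes above can be folded into one small wrapper. This is only a rough sketch, not the code from the gist: runOnEmr and its arguments are hypothetical names of mine, and I am assuming that segue's createCluster accepts cranPackages and customPackages alongside rObjectsOnNodes and that stopCluster shuts the cluster down, so check ?createCluster before relying on it. It also assumes AWS credentials have already been registered with segue.

```r
# Sketch only: assumes the segue package is installed and AWS credentials
# are already set. runOnEmr and its argument names are hypothetical.
runOnEmr <- function(inputs, worker, objects = list(),
                     cranPackages = NULL, customPackages = NULL,
                     numInstances = 2) {
  # emrlapply wants a list, so coerce vectors such as 1:2 up front
  jobs <- as.list(inputs)

  cluster <- segue::createCluster(
    numInstances    = numInstances,
    cranPackages    = cranPackages,    # e.g. "sde" (assumed argument name)
    customPackages  = customPackages,  # custom package(s), in whatever form segue expects
    rObjectsOnNodes = objects          # named list of objects the nodes need
  )
  # Ask for shutdown even if the job errors out; this does not replace
  # checking the Amazon web interface afterwards
  on.exit(segue::stopCluster(cluster), add = TRUE)

  segue::emrlapply(cluster, jobs, worker)
}

# Local check of the list-versus-vector point, no cluster required
jobs <- as.list(1:2)
is.list(1:2)   # FALSE
is.list(jobs)  # TRUE
```

Nothing here touches Amazon until runOnEmr is actually called, so the block can be sourced safely. Registering stopCluster with on.exit is a design choice of mine to reduce the chance of a forgotten, still-billing cluster.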

Earlier this morning, none of this was possible. JD Long is a genius, and a fast-working one at that. First he added support for CRAN packages, and then for custom packages. A few notes: package dependencies must be loaded explicitly, and be patient with the instance start-up; mine took 11 minutes.

Note: post updated 2011-07-08 to reflect JD’s suggestion to use rObjectsOnNodes. Now all is working well. Maybe also a thank-you to Google+ for facilitating the last bit of troubleshooting.