I’m pretty happy with the way rmarkdown
looks like it can pretty much replace my Makefile approach with a simple R command to rmarkdown::render()
. Notably, a lot of the pandoc configuration can already go into the document’s yaml
header (bibliography, csl, template, documentclass, etc), avoiding any messing around with the Makefile, etc.
Even more exciting is the pending RStudio integration with pandoc. This exposes the features of the rmarkdown
package to the RStudio IDE buttons, but more importantly, seems like it will simplify the pandoc/latex dependency issues cross-platform.
In light of these developments, I wonder if I should separate my manuscripts from their corresponding R packages entirely (and/or treat them as vignettes?) I think it would be ideal to point people to a single .Rmd
file and say “load this in RStudio” rather than passing along a whole working directory.
The rmarkdown::render
workflow doesn’t cover installing the dependencies, or downloading the a pre-built cache. I’ve been relying on the R package mechanism itself to handle dependencies, though I list all packages loaded by the manuscript but not needed by the package functions themselves as SUGGESTS
, as one would do with a vignette. Consequently, I’ve had to add an install.R script to my template, to make sure these packages are installed before a user attempts to run the document. The install script feels like a bit of a hack, and makes me think that RStudio’s packrat may be what I actually want for this. So I finally got around to playing with packrat.
packrat
Packrat isn’t yet on CRAN, and for an RStudio package I admit that it feels a bit clunky still. Having a single packrat.lock
file (think Gemfile.lock
I suppose) seems like a great idea. Carting around the hidden files .Rprofile
, .Renviron
, and the tar.gz
sources for all the dependences (in packrat.sources
) seems heavy and clunky, and logging in and out all the time feels like a hack.
Am I really supposed to commit the
.tar.gz
files? packrat/issues/59 (Summary: option coming)Do I really need to restart R packrat/issues/60 (Summary: yes).
The first discussion led to an interesting question about just how big are CRAN packages these days anyhow? Thanks to this clever rsync
trick from Duncan, I could quickly explore this:
txt = system("rsync --list-only cran.r-project.org::CRAN/src/contrib/ | grep .tar.gz", intern = TRUE)
setAs("character", "num.with.commas", function(from) as.numeric(gsub(",", "", from) ) )
ans = read.table(textConnection(txt), colClasses=c("character", "num.with.commas", "Date", "character"))
ggplot(ans, aes(V3, V2)) + geom_point()
sum(ans$V2>1e6)
sum(ans$V2/1e6)
Note that there are 711 packages over 1 MB, for a total weight of over 2.8 GB. Not huge but more than you might want in a Git repo all the same.
Nevertheless, packrat works pretty well. Using a bit of a hack we can just version manage/ship the packrat.lock
file and let packrat try and restore the rest.
packrat::packify()
source(".Rprofile"); readRenviron(".Renviron")
packrat::restore()
source(".Rprofile"); readRenviron(".Renviron")
The source
/readRenviron
calls should really be restarts to R. Tried replacing this with calls to Rscript -e "packrat::packify()
etc. but that fails to find packrat
on the second call. (Attempting to reinstall it doesn’t work either).
Provided the sources haven’t disappeared from their locations on Github, CRAN, etc., I think this strategy should work just fine. More long-term, we would want to archive a tarball with the packrat.sources
, perhaps downloading it from a script as I currently do with the cache archive.
knitcitations
Debugging check reveals some pretty tricky behavior on R’s part: it wants to check the R code in my vignette even though it’s not building the vignette. It does this by tangling out the code chunks, which ignores in-line code. Not sure if this should be a bug in knitr or R, but it can’t be my fault ;-). See knitr/issues/784
With checks passing, have sent v0.6 to CRAN. Fingers crossed…
Milestones for version 0.7 should be able to address the print formatting issues, hopefully as a new citation_format
option and without breaking backwards compatibility.