Bioconductor Packaging: Lessons Learned

1. Having non-standard library locations is a pain
In case you don't know, R functionality is generally provided by packages: self-contained folders of function definitions that you load when you need them. Many packages depend on others, so using one package often means loading several more. This normally isn't a problem, unless you keep your installed packages in a non-default location, which, for practical reasons, was my situation. Even then it is usually fine, except that the files that tell R where to find packages in a normal session are not necessarily read when "building" and "checking" a package. The best way to avoid this problem is either to install packages to the default location, or to set the package locations in "R_LIBS" in the "Renviron.site" file in R_HOME/etc/.
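A quick way to see which library locations a given R process is actually using is to ask it directly. The snippet below is only a sketch: it prints the library search path, the location of the site-wide "Renviron.site" file, and the current value of "R_LIBS". The path shown in the comment is a made-up placeholder, not a recommendation.

```r
# Library locations this R session searches when loading packages.
lib_paths <- .libPaths()
print(lib_paths)

# Site-wide environment file under R_HOME/etc/.
site_file <- file.path(R.home("etc"), "Renviron.site")
print(site_file)
print(file.exists(site_file))  # the file may not exist until you create it

# A line like this in Renviron.site would add a custom library location
# (hypothetical path, replace with your own):
#   R_LIBS=/path/to/my/R/library

# Value of R_LIBS as seen by this process; "" means it is not set.
r_libs <- Sys.getenv("R_LIBS")
print(r_libs)
```

Running the same lines in an interactive session and in a script run by "Rscript" can help show whether both are seeing the same settings.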

2. Long examples will really slow things down.
The Bioconductor guidelines suggest that running "R CMD check" shouldn't take longer than 4 or 5 minutes, and that includes running the examples and the code in the package vignette. If your examples or your vignette code take more than a moment to run, you will quickly go over that limit. I got around this by pre-generating the results that took a long time to compute and saving them in the package data.
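As a rough sketch of the pre-generation idea: compute the slow result once, save it as an ".rda" file, and have the examples or vignette load the saved object instead of recomputing it. Everything named here ("slow_result", "mypkg") is hypothetical, and the file is written under a temporary directory rather than a real package's "data" folder so the example has no side effects.

```r
# Stand-in for a computation that would be too slow to run during R CMD check.
slow_result <- local({
  set.seed(1)
  x <- matrix(rnorm(1000), nrow = 100)  # 100 rows, 10 columns
  cor(x)                                # 10 x 10 correlation matrix
})

# In a real package this would be the package's own data/ directory.
data_dir <- file.path(tempdir(), "mypkg", "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

rda_file <- file.path(data_dir, "slow_result.rda")
save(slow_result, file = rda_file)

# Later (for example in an example or vignette), load the saved object
# instead of recomputing it. load() returns the names of restored objects.
env <- new.env()
loaded <- load(rda_file, envir = env)
print(loaded)
print(class(env$slow_result))
```

Inside an installed package, the saved object would normally be brought in with "data()" rather than an explicit "load()" on a file path.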

Which actually brings up another point. I originally did not do this because R 2.12 had trouble remembering the classes of objects when I reloaded them from the associated data file. This does not seem to be an issue with R 2.14.1 or 2.15 (the current dev version).

And that brings up the issue of size. With the basic "build" process, my package comes to ~2MB, but there is a build option, "--resave-data", that compresses the associated data files more than the default does.
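To get a feel for how much re-saving can help before touching the build itself, the same kind of recompression can be tried from within R. This sketch uses "tools::resaveRdaFiles()" and "tools::checkRdaFiles()", which I am assuming are close relatives of what the build option does rather than the exact same code path. The data object is a made-up, highly repetitive vector, so the numbers it prints say nothing about a real package.

```r
# Hypothetical stand-in for package data: a long, repetitive integer vector.
example_data <- rep(seq_len(100), times = 1000)

rda_file <- file.path(tempdir(), "example_data.rda")

# Save with gzip compression and record the file size.
save(example_data, file = rda_file, compress = "gzip")
size_gzip <- file.info(rda_file)$size

# Re-save the file in place, letting R choose among gzip, bzip2 and xz.
tools::resaveRdaFiles(rda_file, compress = "auto")
size_resaved <- file.info(rda_file)$size

# Report the compression now in use and compare the sizes in bytes.
print(tools::checkRdaFiles(rda_file))
print(c(gzip = size_gzip, resaved = size_resaved))

# Confirm the data survives the round trip unchanged.
env <- new.env()
load(rda_file, envir = env)
print(identical(env$example_data, example_data))
```

The values accepted by "--resave-data" are listed in the output of "R CMD build --help", which is worth checking for the R version you are building with.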

Is there anything you've learned the hard way about writing R or Bioconductor packages?
