Software Carpentry

An article in this week's edition of Nature on providing source code for journal articles that depend on new or original computer programs to analyze data (link) led me to discover two new resources:

1 - An article on scientists' ability to write code (link)

2 - An actual course focused on teaching scientists how to write computer code, known as "Software Carpentry". The materials for the course are posted online, with full lecture content as well as videos. If all you have is a basic introduction to programming, this might be useful.

As for my own programming experience: I took one intro to programming course during my Masters, did a lot of Matlab programming during my PhD (including GUI development), and then learned R, object-oriented programming and R package development during my PostDoc. I think I am going to work through this course. I have finally started implementing unit tests and using version control, but I am sure there are lessons to be learned from this course.


Debugging R using "recover"

I just discovered today that one easy way to drop into a misbehaving function in R is to use "recover", via options(error=recover).

This allows you to enter any of the functions involved in the error, at the point where the error occurred. Hope someone else finds this useful.
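
A minimal sketch of how this might look in practice. The functions inner() and outer() are made up purely to produce an error a couple of calls deep; only options(error=recover) comes from the tip above. The failing call is left commented out because recover only browses frames in an interactive session.

```r
# Hypothetical functions, used only to trigger an error inside a nested call.
inner <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  sqrt(x)
}
outer <- function(x) inner(x)

# Install recover as the error handler.
options(error = recover)

# In an interactive session, uncomment the next line. R then lists the
# active calls and lets you pick one to browse; entering 0 exits.
# outer("a")

# Restore the default error behaviour when finished.
options(error = NULL)
```

Inside the chosen frame, the usual browser commands apply, so ls() shows the local variables at the point of failure.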


Celebration of teaching and learning

The Delphi Center at UofL puts on a teaching conference, I believe each year, and this year the focus is on the incorporation of digital media into class instruction. I have some trepidation regarding some of the language (wide vs deep reading).

I have been excited by a lot of the stuff being talked about, however. Check out @DelphiCelebrate and #CLbrT2012 for some of the Twitter chatter.


Bioconductor Packaging: Lessons Learned

1. Having non-standard library locations is a pain
If you don't know, R functionality is generally enabled by loading packages, self-contained folders of function definitions. Many packages depend on others, so enabling the functionality in one package often means loading several others. This normally isn't a problem unless you store your installed packages in a non-default location, which, for some practical reasons, was my situation. Even then it is usually not a big deal, except that the files that tell R where to find the packages are not necessarily read when "building" and "checking" the package. The best way to avoid this problem: either install the packages to the default location, or define the package locations with "R_LIBS" in the "Renviron.site" file in R_HOME/etc/.
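
A small sketch for checking what a given R session actually sees. The library path in the comment is a made-up example, not the location used for this package.

```r
# Location of the site-wide environment file, i.e. R_HOME/etc/Renviron.site.
site_file <- file.path(R.home("etc"), "Renviron.site")
file.exists(site_file)

# A line of this form in that file sets the library search path
# (the path is a placeholder):
#   R_LIBS=/path/to/my/R/library

# What the current session has picked up.
Sys.getenv("R_LIBS")
.libPaths()
```

Comparing the .libPaths() output from an ordinary session with what the build and check steps report can show whether the non-default location is being picked up in both.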

2. Long examples will really slow things down
The Bioconductor guidelines suggest that running "R CMD check" shouldn't take longer than 4 or 5 minutes, including running the examples and the code in the package vignette. If your examples or the code in your vignette take a long time to run, you will quickly go over that limit. I got around this by pre-generating the results that took a long time and saving them in the package data.
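
A minimal sketch of the pre-generation idea, assuming the slow step can be wrapped in a function. The names slow_result and exampleResult are hypothetical, and a temporary directory stands in for the package's data/ folder so the sketch runs anywhere.

```r
# Stand-in for a computation that is too slow to run during "R CMD check".
slow_result <- function() {
  Sys.sleep(0.1)
  data.frame(id = 1:3, score = c(0.2, 0.5, 0.9))
}

exampleResult <- slow_result()

# In a real package the file would go in data/exampleResult.rda.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
rda_file <- file.path(data_dir, "exampleResult.rda")
save(exampleResult, file = rda_file)

# Examples and the vignette can then load the saved object instead of
# recomputing it (data(exampleResult) once the package is installed).
reloaded <- new.env()
load(rda_file, envir = reloaded)
identical(reloaded$exampleResult, exampleResult)
```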

Which actually brings up another point. I originally did not do this because R 2.12 had trouble remembering the classes of objects when I reloaded them from the associated data file. This does not seem to be an issue with R 2.14.1 or 2.15 (current dev version).

And that brings up the issue of size. With the basic "build" process my package is ~2MB, but "R CMD build" has an option, "--resave-data", that compresses the associated data files more than the default.
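
The same kind of re-compression can be tried from within R using the tools package. This is a sketch under assumptions: the object and file names are made up, the file starts out uncompressed only so that the difference is visible, and it is not necessarily identical to what "--resave-data" does during a build.

```r
# Write a deliberately uncompressed .rda file so there is something to shrink.
rda_file <- file.path(tempdir(), "bigExample.rda")
bigExample <- rep(c(1.5, 2.5, 3.5), times = 10000)
save(bigExample, file = rda_file, compress = FALSE)

before <- tools::checkRdaFiles(rda_file)

# Re-save the file, letting R try the available compression types
# and keep a small result.
tools::resaveRdaFiles(rda_file, compress = "auto")

after <- tools::checkRdaFiles(rda_file)

# File sizes in bytes before and after re-saving.
before$size
after$size
```

tools::checkRdaFiles() also reports which compression each file currently uses, which helps when deciding whether re-saving is worth it.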

Is there anything you've learned the hard way about writing R or Bioconductor packages?