Gene Ontology flattened

At last week's journal club, we looked at an interesting method for discovering genes that are functionally related to one another (link). One of the things the authors did in the paper was to use:
"A flattened representation of the GO hierarchy ... and stores the annotations as Boolean arrays in which the presence and absence of annotations is recorded (Huang et al., 2007). This representation implicitly contains the ontological relations and allows the inclusion of non-ontological annotations as part of the array. This avoids the inference of relationships through the hierarchical structure of GO."

This is relatively easy to do using the GO.db database in Bioconductor:

This will generate a list in which each element is a logical vector indicating the presence or absence of each GO ID. Note that this takes advantage of the "GO2ALLEGS" table in the organism database, which maps each GO ID to all the genes annotated to it either directly or through one of its descendant terms. It would be easy to verify that this matches the flattened representation by getting the direct annotations and then using "GOBPANCESTOR" to generate a full annotation list. Remember, a gene is indirectly annotated to every ancestor of the terms it is directly annotated to in the GO directed acyclic graph. "GO2ALLEGS" is the easiest way I know of to get this information in Bioconductor.
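To make the idea concrete without needing any Bioconductor packages, here is a minimal base R sketch of the flattening step on a toy ontology. Everything in it is made up for illustration: the term IDs ("GO:A" to "GO:D"), the gene names, and the helper function ancestors(). It is not the GO.db code, just the same logic of taking direct annotations, adding every ancestor, and recording the result as a logical vector.

```r
# Toy GO-like DAG: each term maps to its direct parents (hypothetical IDs).
parents <- list(
  "GO:A" = character(0),       # root term, no parents
  "GO:B" = "GO:A",
  "GO:C" = "GO:A",
  "GO:D" = c("GO:B", "GO:C")   # two parents, so this is a DAG and not a tree
)

# Direct annotations only (hypothetical gene names).
direct <- list(
  gene1 = "GO:D",
  gene2 = "GO:B",
  gene3 = character(0)
)

# Collect every ancestor of a term by walking up the parent links.
ancestors <- function(term) {
  found <- character(0)
  todo <- parents[[term]]
  while (length(todo) > 0) {
    current <- todo[1]
    todo <- todo[-1]
    if (!(current %in% found)) {
      found <- c(found, current)
      todo <- c(todo, parents[[current]])
    }
  }
  found
}

allTerms <- names(parents)

# One named logical vector per gene: TRUE where the gene is annotated to the
# term directly or through an annotation to one of the term's descendants.
flat <- lapply(direct, function(terms) {
  full <- unique(c(terms, unlist(lapply(terms, ancestors))))
  out <- allTerms %in% full
  names(out) <- allTerms
  out
})

print(flat)
```

In this toy case gene1 is only annotated directly to "GO:D", but its flattened vector also marks "GO:B", "GO:C" and "GO:A", which is the information the hierarchy would otherwise have to be walked to recover.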


R NameSpaces and Classes

I have been developing a Bioconductor package as part of my research at UofL, and a lot of it depends on classes from another package. The classes in that package make a function call as part of their initialization, and that call tended to break whenever the classes were extended to create new classes.

So in R 2.13.0, it wasn't too hard to get around it:

This generates the following error:

This is due to how the "HyperGParams" class is defined. It happens if you try to extend any of the Category classes, but there is a workaround: give a valid "annotation" to the prototype:
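Here is a self-contained sketch of the same workaround using toy classes in base R rather than the real Category ones. The names "ToyParams", "ToyChildBare", "ToyChild", lookupAnnotation() and the annotation value "org.Toy.db" are all hypothetical, and the initialize method only imitates the kind of lookup that fails on an empty annotation; I am not claiming this is how Category implements it.

```r
library(methods)

# Hypothetical stand-in for a lookup that only knows one annotation package.
lookupAnnotation <- function(annotation) {
  if (!identical(annotation, "org.Toy.db")) {
    stop("unknown annotation: '", paste(annotation, collapse = ""), "'")
  }
  invisible(TRUE)
}

# Toy parent class whose initialization calls the lookup.
setClass("ToyParams", representation(annotation = "character"))

setMethod("initialize", "ToyParams", function(.Object, ...) {
  .Object <- callNextMethod(.Object, ...)
  lookupAnnotation(.Object@annotation)  # fails when the annotation is empty
  .Object
})

# Extending without a prototype: the annotation slot defaults to character(0),
# so the lookup fails somewhere between defining and instantiating the class.
bare <- tryCatch({
  setClass("ToyChildBare", contains = "ToyParams")
  new("ToyChildBare")
  "ok"
}, error = function(e) conditionMessage(e))
print(bare)

# Workaround: put a valid annotation in the prototype of the new class.
setClass("ToyChild",
         contains = "ToyParams",
         prototype = prototype(annotation = "org.Toy.db"))
obj <- new("ToyChild")
print(obj@annotation)
```

The design point is simply that the default object of the new class has to survive whatever the parent does during initialization, and the prototype is where that default comes from.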

Now, this will work in R 2.13.0, and it will work in R 2.14.1, but what if you want to combine this class with another in a new class?

Note that "GOHyperGParams" also has the "annotation" slot initialized to something useful to avoid the error above.
Now what happens in R 2.14.1?

We get the same error as before! Why?

In R 2.13.0, we were actually creating a new definition of "GOHyperGParams" in the local workspace. However, in 2.14.1, we are no longer allowed to create a duplicate class with the same name. Therefore, to modify it we need to explicitly modify the copy in the original package "Category", like so:

A final twist for embedding this in our new package is that we have to explicitly import the classes from the other package using "importClassesFrom" in the NAMESPACE file:
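As a rough sketch, the NAMESPACE directive for importing S4 classes is importClassesFrom(), which takes the package name followed by the classes to import. The line below assumes the two Category classes mentioned in this post are the ones needed; a real package would list whichever classes it actually extends.

```r
# NAMESPACE file of the new package (sketch; the class list is an assumption)
importClassesFrom(Category, HyperGParams, GOHyperGParams)
```

The "Category" package would normally also need to appear in the Imports or Depends field of the DESCRIPTION file for this to resolve.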

This was much looser in previous implementations, and I'm guessing the change is going to make coding better for R developers, as they have to be a little more explicit about what is going on. It has taken some time for me to learn: I have had only one programming course, way back, in C, and no formal OO training. Everything I have learned has come from writing my own package for Bioconductor.


RStudio New Features!

For anyone who is using RStudio, there are some new features to note in the next release (0.95).

1 - Multiple projects, multiple RStudio instances

If you are like me and often have large things running, and want to work on something else, then it is nice to be able to have multiple instances (copies if you will) of the editor and programming environment running. I often find myself working on multiple projects, and I don't like shutting down and coming back to something when all I really need is 10 minutes to work on the other thing. So I was very happy to see that you can now fire up multiple copies of RStudio. You can even tie particular instances of RStudio to a particular project, with its associated files, and history.

2 - Integrated version control

I admit, I don't use version control nearly as much as I should. I still tend to depend on keeping old bits of code in files and then running the pieces that I need. But version control has gotten a lot easier with integrated versioning in RStudio using Git or SVN. I had been using Mercurial, but due to the built-in integration with Git, I am switching over (not hard, I only had like one directory that was actually using VC). What is really sweet about it is that you can just create a project based on a current directory and say you want to use VC, and it will initialize the repository and restart RStudio, and you are good to go.
One reason I will be using Git over SVN is the ability to stick with local repositories, whereas SVN would require a whole server set up on my machine or somewhere else. And it seems pretty easy to use.

As of 1500 EST on 20/01/12, the download page still showed the old version, but the documentation for version control is up. If you want the version with projects and version control, it is available as a preview release here: http://www.rstudio.org/download/preview

Edit: The new version is available on the main download page.