Gene Ontology flattened

At last week's journal club, we looked at an interesting method for discovering genes that are functionally related to one another (link). One of the things the authors did in the paper was to use:
"A flattened representation of the GO hierarchy ... and stores the annotations as Boolean arrays in which the presence and absence of annotations is recorded (Huang et al., 2007). This representation implicitly contains the ontological relations and allows the inclusion of non-ontological annotations as part of the array. This avoids the inference of relationships through the hierarchical structure of GO."

This is relatively easy to do using the GO.db database in Bioconductor:

This generates a list in which each element is a logical vector indicating the presence or absence of each GO ID. Note that this takes advantage of the "GO2ALLEGS" table in the organism database, which holds the GO annotations for all genes based on annotation to any descendant GO ID as well as direct annotation. In other words, a gene listed under a term may be annotated to that term directly or to any of its more specific child terms. It would be easy to verify that this matches the flattened representation: get the direct annotations for each gene, then use "GOBPANCESTOR" to expand them into a full annotation list. Remember, a gene is indirectly annotated to all the ancestor terms of its direct annotations in the GO directed acyclic graph. "GO2ALLEGS" is the easiest way I know of to get this information in Bioconductor.
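To make the verification idea concrete, here is a minimal sketch of the flattening itself, written in plain Python on a toy ontology rather than against Bioconductor. The term IDs, gene names, and the `PARENTS` and `DIRECT` tables are all hypothetical stand-ins: `DIRECT` plays the role of the direct annotations and the `ancestors` helper plays the role of an ancestor lookup such as "GOBPANCESTOR". It only illustrates the logic, not the actual database calls.

```python
# Toy GO-like DAG: term -> set of direct parent terms (hypothetical IDs).
PARENTS = {
    "GO:C": {"GO:B"},
    "GO:B": {"GO:A"},
    "GO:D": {"GO:A"},
    "GO:A": set(),
}

# Direct annotations only: gene -> set of terms (hypothetical genes).
DIRECT = {
    "gene1": {"GO:C"},
    "gene2": {"GO:D"},
    "gene3": {"GO:B", "GO:D"},
}


def ancestors(term, parents):
    """Return every ancestor of `term` by walking parent links in the DAG."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def flatten(direct, parents):
    """Map each gene to a Boolean vector over all terms (direct or inherited)."""
    terms = sorted(parents)
    flat = {}
    for gene, annotated in direct.items():
        full = set(annotated)
        for term in annotated:
            full |= ancestors(term, parents)
        flat[gene] = [term in full for term in terms]
    return terms, flat


terms, flat = flatten(DIRECT, PARENTS)
for gene in sorted(flat):
    print(gene, dict(zip(terms, flat[gene])))
```

Here gene1 is annotated directly only to GO:C, yet its flattened vector is also true for GO:B and GO:A, which is the behaviour one would expect to see when comparing expanded direct annotations against "GO2ALLEGS".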
