GOToolbox is centered around the idea that counting gene occurrences within gene ontology classes says something useful about gene function. Classes that contain more or fewer genes than expected are called 'Enriched' and 'Depleted' classes, respectively. After an introduction, in which many readers might find valuable information (for instance, I did not know that gene ontology annotations exist on a per-organism basis), the paper jumps directly into the first tool: GO term statistics. These are calculated according to a hypergeometric distribution, which in summary gives the probability of measuring #x genes belonging to class A on a micro-array, given that the organism's complete gene ontology annotation provides only #y genes for that class. This is of course weighted against the total size of the ontology and the total size of the micro-array. Using a hypergeometric distribution is an appealing idea, and a first experiment on a data-set we had lying around showed that the reported classes were not far off. On the other hand, a closer inspection of the results showed that they were very general, and many of the known functions of our particular gene were not detected.
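To make the statistic concrete, here is a minimal sketch of a hypergeometric tail test in Python. The function and parameter names are my own and do not come from GOToolBox, the example numbers are invented, and the paper's actual implementation (including any multiple-testing correction) may well differ.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """Probability of exactly k class members in a sample of n genes,
    drawn without replacement from N genes of which K belong to the class."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_pvalue(k, N, K, n):
    """P(X >= k): chance of seeing at least k class members in the sample."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))


def depletion_pvalue(k, N, K, n):
    """P(X <= k): chance of seeing at most k class members in the sample."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(0, k + 1))


# Illustrative numbers only: 6000 annotated genes, 120 of them in class A,
# and a selected gene set of 200 genes that contains 12 members of class A.
p_enriched = enrichment_pvalue(k=12, N=6000, K=120, n=200)
p_depleted = depletion_pvalue(k=12, N=6000, K=120, n=200)
print(p_enriched, p_depleted)
```

A small enrichment p-value suggests the class is over-represented in the selected set; a small depletion p-value suggests the opposite.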
A second read of the paper revealed some more flaws. For instance, the micro-array slide design is not taken into account, and this is probably the biggest feature missing from the algorithm. If we were, for instance, to use a micro-array that only measures signal transduction genes, then of course we would find an enrichment of this category, while most other categories would be depleted.
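The effect of the slide design can be illustrated with a toy calculation. The sketch below, using invented gene identifiers and helper names of my own, computes the same enrichment test twice: once against the whole genome and once against only the genes that are actually on the array. It assumes the hypergeometric tail test described in this review; it is not GOToolBox code.

```python
from math import comb


def enrichment_pvalue(k, N, K, n):
    """P(X >= k) under a hypergeometric distribution."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def class_counts(class_genes, background, hits):
    """Restrict class and hit list to the background, then return (k, N, K, n)."""
    bg = set(background)
    members = set(class_genes) & bg
    selected = set(hits) & bg
    return len(selected & members), len(bg), len(members), len(selected)


# Hypothetical data: a 1000-gene genome, 50 signal transduction genes,
# and an array that carries those 50 genes plus only 10 others.
genome = [f"g{i}" for i in range(1000)]
signalling = [f"g{i}" for i in range(50)]
array = [f"g{i}" for i in range(60)]
hits = [f"g{i}" for i in range(10)]  # 10 differentially expressed genes

p_genome = enrichment_pvalue(*class_counts(signalling, genome, hits))
p_array = enrichment_pvalue(*class_counts(signalling, array, hits))
print(p_genome, p_array)
```

With the genome as reference the class looks dramatically enriched, while with the array as reference the same hit list is unremarkable, because almost every gene that could have been measured belongs to the class anyway.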
Another tool included in the paper is a distance measure between two genes, based on counting the number of genes that belong to similar classes. Again, the mathematics is beautiful and clear. The rest of the paper gives some results based on existing tools and wraps up the paper in a scientific manner.
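This review does not reproduce the paper's formula, so the following is only a rough sketch of what an annotation-based distance can look like. It assumes each gene is represented by its set of ontology terms and uses the Czekanowski-Dice distance as an illustrative choice; the term identifiers and the function name are placeholders of my own.

```python
def dice_distance(terms_a, terms_b):
    """Czekanowski-Dice distance between two sets of ontology terms:
    |A symmetric-difference B| / (|A union B| + |A intersection B|).
    Returns 0.0 for identical sets and 1.0 for disjoint sets."""
    a, b = set(terms_a), set(terms_b)
    denominator = len(a | b) + len(a & b)
    if denominator == 0:
        # Neither gene is annotated; treating them as maximally distant
        # is an arbitrary convention chosen for this sketch.
        return 1.0
    return len(a ^ b) / denominator


# Placeholder term identifiers, not real annotations.
gene_x = {"term1", "term2", "term3"}
gene_y = {"term2", "term3", "term4"}
print(dice_distance(gene_x, gene_y))
```

Genes that share most of their annotations end up close together, which is what makes such a distance usable for clustering genes by function.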
In summary, this was a nice paper: aside from the one huge error (not taking the micro-array design into account), we enjoyed reading it. The source code is supposed to be GPL, but we could not obtain it as of this writing (August 2008). The author also cannot be reached at his regular email address. The online software (at http://burgundy.cmmt.ubc.ca/GOToolBox/), on the other hand, works quite nicely.