[International Worm Meeting, 2009]
Since the C. briggsae genome annotation was published in 2003 [1], little has been done to improve it, although accumulating evidence suggests that many gene models are inaccurately predicted or missing. In this project, we have reannotated the C. briggsae genome using two resources. The first is the much improved C. elegans genome annotation (WS195, compared with the WS77 annotation used for the original C. briggsae annotation). The second is genBlastG, a new homology-based gene finder we have developed. genBlastG builds on our recently published program genBlastA [2]. It takes as input a query protein sequence and the genome sequence to be annotated, and produces all homologous gene models. Our analysis suggests that genBlastG outperforms GeneWise in both processing time (on average, genBlastG runs ~50 times faster than GeneWise) and accuracy. We applied genBlastG to reannotate the C. briggsae genome. Our preliminary results from genBlastG produced 16,954 homologous models, and the majority of these (11,235) match the current WormBase annotation well. However, a significant number (4,828) of genBlastG models exhibit higher percent identity (PID) to their C. elegans query protein sequences than the corresponding WormBase models do. Thus, for many genes, genBlastG models show better homology to C. elegans models. In addition, our predictions identify 261 WormBase models that should be split and 298 pairs of models that should be merged. As an example of a merge, we found that CBG14800 and CBG14801 may actually be one gene that's orthologous to C54G7.3a. CBG14801 alone may represent only the shorter isoform, which is orthologous to C54G7.3b. As an example of a split, we found that CBG00366 spans the orthologs of both ZK550.3 and ZK550.4. In this presentation, I will summarize all of the improvements suggested by genBlastG. genBlastG will also be applied to predict gene models in other Caenorhabditis species.

1. Stein, L.D., et al. (2003). The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS Biol 1: E45.
2. She, R., et al. (2009). GenBlastA: enabling BLAST to identify homologous gene sequences. Genome Res 19: 143-9.