Mounsey A et al. (2002) Genome Res "Evidence suggesting that a fifth of annotated Caenorhabditis elegans genes may be pseudogenes."

Hope IA, Mounsey A, Bauer P

[Genome Res, 2002]

Only a minority of the genes, identified in the Caenorhabditis elegans genome sequence data by computer analysis, have been characterized experimentally. We attempted to determine the expression patterns for a random sample of the annotated genes using reporter gene fusions. A low success rate was obtained for evolutionarily recently duplicated genes. Analysis of the data suggests that this is not due to conditional or low-level expression. The remaining explanation is that most of the annotated genes in the recently duplicated category are pseudogenes, a proportion corresponding to 20% of all of the annotated C. elegans genes. Further support for this surprisingly high figure was sought by comparing sequences for families of recently duplicated C. elegans genes. Although only a preliminary analysis, clear evidence for a gene having been recently inactivated by genetic drift was found for many genes in the recently duplicated category. At least 4% of the annotated C. elegans genes can be recognized as pseudogenes simply from closer inspection of the sequence data. Lessons learned in identifying pseudogenes in C. elegans could be of value in the annotation of the genomes of other species where, although there may be fewer pseudogenes, they may be harder to detect.