[Genome Res, 2002]
Only a minority of the genes identified in the Caenorhabditis elegans genome sequence data by computer analysis have been characterized experimentally. We attempted to determine the expression patterns for a random sample of the annotated genes using reporter gene fusions. A low success rate was obtained for evolutionarily recently duplicated genes. Analysis of the data suggests that this is not due to conditional or low-level expression. The remaining explanation is that most of the annotated genes in the recently duplicated category are pseudogenes, a proportion corresponding to 20% of all annotated C. elegans genes. Further support for this surprisingly high figure was sought by comparing sequences for families of recently duplicated C. elegans genes. Although this was only a preliminary analysis, clear evidence of recent inactivation by genetic drift was found for many genes in the recently duplicated category. At least 4% of the annotated C. elegans genes can be recognized as pseudogenes simply from closer inspection of the sequence data. Lessons learned in identifying pseudogenes in C. elegans could be of value in the annotation of the genomes of other species where, although there may be fewer pseudogenes, they may be harder to detect.