[International Worm Meeting, 2017]
The C. briggsae reference gene set was created in 2003 using a combination of several ab initio gene prediction programs. The quality of the predictions was adequate but not good, because very little C. briggsae EST data was available to refine the gene structures. Recently we used paired-end RNA-seq data together with the program Trinity to assemble artificial full-length transcripts. These transcripts were aligned to the reference genome and used to build new gene structures, each of which was manually assessed to confirm that it was suitable for use as gene annotation. About a third of the existing gene structures were already correct, a third have been changed, and a third are expressed at too low a level to determine the correct structure, are damaged by poor-quality assembly in that region, might be pseudogenes, or have otherwise proved intractable. About 200 C. elegans genes had their structures corrected based on homology evidence from the new C. briggsae structures. Several interesting genes with unusual conserved structures were discovered during the manual assessment of the gene set. Among these are two genes that use a putative non-canonical initiation codon (CBG08614, F26D11.1 and CBG20564, C37C3.8). There are about 40 genes where the locus produces two non-overlapping isoforms that do not conform to the normal operon structure. There are genes with an isoform that stops halfway along the locus (CBG23530a/b, E02A10.2a/c). There are also genes that lie within an intron of another gene, in the same sense (CBG30841, Y53F48B.42).