A complete description of all C. elegans genes is not yet known (and may not be accurately known for many years). WormBase attempts to represent all genes that have good experimental evidence, plus a number of genes that have less experimental evidence but were predicted by gene-finding software. If there are any publicly available transcript data (EST, mRNA, etc.), then WormBase will nearly always have attempted to make a gene prediction in that region. However, many poorly expressed genes may not have any transcript evidence and so may not be represented in WormBase at this time. Please help us by letting us know if you have any evidence for a gene that is not currently displayed in WormBase. Aside from transcript evidence (which we always encourage people to submit to GenBank/EMBL/DDBJ), a strong case can be made for creating a new gene if there is good conservation with other species (particularly C. briggsae or C. remanei) and if there is other supporting data (such as a positive RNAi phenotype).
Please also note that your gene may be present but not shown in the standard set of tracks in the Genome Browser. Check alternative gene predictions by turning on the 'GeneFinder' and 'Twinscan' gene prediction tracks. Also consider turning on the 'Obsolete gene models' track, as the gene may have existed in WormBase in the past and since been removed.
Please send the new transcript sequence with a brief description of the required gene model change to help@wormbase.org and a curator will make the appropriate change.
Please also submit your sequence to the EMBL/GenBank/DDBJ database. This helps to confirm and provide evidence for the WormBase gene prediction, as we automatically retrieve sequence data from the public databases. It also makes the data public, so that it can be appropriately referenced and you can be acknowledged.
One approach is to write your own scripts in Perl using the Bio::Graphics modules that are part of BioPerl.
* confirmed_est - an intron confirmed by EST transcript sequence data
* confirmed_cdna - an intron confirmed by cDNA/mRNA transcript sequence data
* confirmed_inconsistent - a curator has decided that the intron does not fit with what we consider to be a valid transcript, or there appears to be something wrong with the transcript that confirms the intron.
* confirmed_false - a curator has determined that the confirmed intron is false or artefactual.
* confirmed_UTR - used when a confirmed intron looks like it is in the UTR of a gene.
* confirmed_Homology - protein homology appears to confirm the intron; this has seen limited use.
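
The intron status tags listed above can be handled in a script with a simple lookup table. The following is a minimal Python sketch, not WormBase code: the function names `describe` and `is_supported`, and the grouping of tags into "transcript support" and "curator rejected", are assumptions made for illustration only.

```python
# Tag descriptions transcribed from the list above; keys are lowercased
# so that lookups are case-insensitive (e.g. confirmed_UTR).
INTRON_STATUS = {
    "confirmed_est": "confirmed by EST transcript sequence data",
    "confirmed_cdna": "confirmed by cDNA/mRNA transcript sequence data",
    "confirmed_inconsistent": "curator judged the intron or its confirming transcript problematic",
    "confirmed_false": "curator judged the intron false or artefactual",
    "confirmed_utr": "confirmed intron that appears to lie in a UTR",
    "confirmed_homology": "apparently confirmed by protein homology",
}

# Assumption (illustrative policy, not a WormBase rule): which tags count
# as transcript support and which count as a curator rejection.
TRANSCRIPT_SUPPORT = {"confirmed_est", "confirmed_cdna"}
CURATOR_REJECTED = {"confirmed_inconsistent", "confirmed_false"}


def describe(tag):
    """Return a short description of an intron status tag."""
    return INTRON_STATUS.get(tag.lower(), "unknown status")


def is_supported(tags):
    """True if any tag gives transcript support and no tag is a rejection."""
    normalised = {t.lower() for t in tags}
    return bool(normalised & TRANSCRIPT_SUPPORT) and not (normalised & CURATOR_REJECTED)
```

For example, `is_supported(["confirmed_est", "confirmed_false"])` evaluates to `False` under this policy, because a curator rejection overrides the EST evidence. A different policy may suit your analysis better.
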
* seg - low-complexity regions, e.g. homopolymer runs
* signalp - predicts the presence and location of signal peptide cleavage sites (Emanuelsson et al. 2007)
* tmhmm - predicts transmembrane helices in proteins (Krogh et al. 2001)
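
To make the "homopolymer runs" mentioned for seg concrete, here is a small Python sketch that finds runs of a single repeated residue in a protein sequence. It does not implement seg, which uses a more general measure of sequence complexity; the function name `homopolymer_runs` and the default threshold of 5 residues are assumptions chosen for illustration.

```python
import re


def homopolymer_runs(protein, min_len=5):
    """Find runs of one residue repeated at least min_len times.

    Returns a list of (start, end, residue) tuples using 0-based
    coordinates with an exclusive end. This is a simple illustration
    only; it is not the algorithm used by seg.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One letter, followed by at least (min_len - 1) copies of that letter.
    pattern = re.compile(r"([A-Za-z])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(protein)]
```

Called on the made-up sequence `"MKQQQQQQLAAST"`, this reports the six-residue glutamine run as `(2, 8, "Q")` and ignores the two-residue alanine run, which is below the threshold.
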