Please refer to the following link to obtain C. elegans strains:
Caenorhabditis Genetics Center
1. There are seven different types of "Clone" objects in WormBase:
| Type | Nomenclature |
|---|---|
| Cosmid | A*, B*, C*, D*, F*, J*, K*, M*, R*, T*, W*, Z* |
| Fosmid | H*, WRM* |
| YAC | Y* |
| cDNA | yk*, EC*, EB*, OST*, CK*, EF*, CEE*, CEM*, CES*, CB*, CN*, cm* |
| Plasmid: PCR clones | V*, EGAP* |
| Other | telo clones, 1 BAC, plasmid |
Most cosmids, fosmids, and YACs can be requested from Sanger; cDNA clones (yk*) can be requested from Dr. Yuji Kohara. The EGAP* plasmids can be obtained from MRC Geneservice. The V* plasmids are no longer available.
Whom could I contact about getting a cDNA clone?
All of the cDNA clones with a yk prefix can be ordered by the following method. All other cDNA clones will have to be requested from the submitting party (found by looking at the EMBL/GenBank entry).
Please go to NextDB (http://nematode.lab.nig.ac.jp), the Yuji Kohara cDNA database repository. You can obtain cDNA clones from Yuji Kohara at the National Institute of Genetics, Mishima, Japan: ykohara@LAB.nig.ac.jp
Useful Clone info
Q) How do I find out about the vectors used in the genome sequencing project?
The actual sequences of the vectors used are on the Sanger FTP site; these should help you identify the vector for the clone you are interested in.
Q) How do I order C. elegans cosmids, fosmids and YACs?
Cosmids and fosmids are available to the community via these routes:
Q) How do I obtain C. elegans cosmids/YACs?
Q) How do I obtain a C. elegans fosmid from the Moerman fosmid library?
Q) How do I obtain a C. elegans fosmid from the Incyte Genomics Inc. fosmid library?
We do have information on many thousands of alleles in WormBase. We have also tried to extract the molecular details of the mutations (where known) and add those to WormBase. Some examples:
Go to a gene page: http://www.wormbase.org/db/gene/gene?name=unc-71;class=Locus Then click on the link to the 'ay47' allele (near the bottom of the page), which takes you to: http://www.wormbase.org/db/gene/allele?name=ay47;class=Allele There you can see that there is a 'c' to 't' substitution in this gene. If you then go to the genome browser display for this gene, http://www.wormbase.org/db/seq/gbrowse/wormbase?name=unc-71 , and turn on the 'SNPs, Knockouts, and other Alleles' track, you will see the positions of the alleles in this gene.
To find other alleles, you can go to the query page: http://www.wormbase.org/db/searches/wb_query and type the following queries (everything between the single quotes): 'Find Allele; Substitution' 'Find Allele; Deletion' 'Find Allele; Insertion'
I'm interested in the CB4858 pas* SNP data. Can I get a bulk download?
A complete dataset of pas SNP data is available from Here
Explanation of dataset:
Substitution - the snp sits between Flank1 and Flank2 (gggtAtcg) and this makes up the N2 genomic hit. This is a 1bp feature as the snp is contained in the N2 genomic.
| ID | Type | N2/CB | Chrom | Coordinate1 | Coordinate2 | Flank1 | Flank2 |
|---|---|---|---|---|---|---|---|
| pas10021 | Substitution | A/G | V | 19689782 | 19689782 | -cut-aattttgggt | tcgaccttgaaa-cut- |
Deletion - the snp sits between Flank1 and Flank2 (ttttCacacttt) and this makes up the N2 genomic hit. This is a 1bp feature as the snp is contained in the N2 genomic.
| ID | Type | N2 | Chrom | Coordinate1 | Coordinate2 | Flank1 | Flank2 |
|---|---|---|---|---|---|---|---|
| pas44643 | Deletion | C | X | 193667 | 193667 | -cut-aaccattttt | acactttttggctta-cut- |
Insertion - the insertion is in CB4858, so in N2 the two flanking sequences abut each other (accttaaaaaaaa). This gives a 2bp feature, because the N2 bases to the left and right of the insertion site are both marked up (notice the pair of coordinates). In this case, CB4858 has an A between the N2 positions 116070 and 116071.
| ID | Type | CB4858 | Chrom | Coordinate1 | Coordinate2 | Flank1 | Flank2 |
|---|---|---|---|---|---|---|---|
| pas44644 | Insertion | A | I | 116070 | 116071 | -cut-aactcaaaacctt | aaaaaaaa-cut- |
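The relationship between the flanks, the variant base and the coordinate pair can be sketched in a few lines of Python. This is only an illustration using the three example rows above; the `PasSnp` class and the helper names are hypothetical and are not part of any WormBase tool, and the column layout of the real bulk file is assumed to match the tables shown here.

```python
from dataclasses import dataclass


@dataclass
class PasSnp:
    """One row of the pas SNP tables (field names are illustrative)."""
    snp_id: str
    kind: str      # "Substitution", "Deletion" or "Insertion"
    allele: str    # third column: "A/G", "C" or "A"
    chrom: str
    coord1: int
    coord2: int
    flank1: str
    flank2: str


def clean_flank(flank: str) -> str:
    """Drop the 'cut' truncation marker and its dashes from a flank."""
    return flank.replace("cut", "").strip("-")


def feature_length(snp: PasSnp) -> int:
    """Length of the N2 feature in bp (both coordinates are inclusive)."""
    return snp.coord2 - snp.coord1 + 1


def n2_context(snp: PasSnp) -> str:
    """Rebuild the local N2 sequence, with the variant base in upper case."""
    left, right = clean_flank(snp.flank1), clean_flank(snp.flank2)
    if snp.kind == "Insertion":
        # The extra base exists only in CB4858, so the flanks abut in N2.
        return left + right
    n2_base = snp.allele.split("/")[0]  # "A/G" -> "A"; "C" -> "C"
    return left + n2_base.upper() + right


rows = [
    PasSnp("pas10021", "Substitution", "A/G", "V", 19689782, 19689782,
           "-cut-aattttgggt", "tcgaccttgaaa-cut-"),
    PasSnp("pas44643", "Deletion", "C", "X", 193667, 193667,
           "-cut-aaccattttt", "acactttttggctta-cut-"),
    PasSnp("pas44644", "Insertion", "A", "I", 116070, 116071,
           "-cut-aactcaaaacctt", "aaaaaaaa-cut-"),
]

for snp in rows:
    print(snp.snp_id, snp.kind, feature_length(snp), n2_context(snp))
```

Substitutions and deletions come out as 1bp features with the N2 base between the flanks, while the insertion comes out as a 2bp feature whose flanks are simply joined.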
The set of clone ends is dumped as part of the gff files:
Here
This is the source for the extents displayed in WormBase.
The caveat with this is that the 'true' end is not marked up for all clones. The early cosmids do not have such annotations because nobody thought about marking them up. Later cosmids do have clone left and right ends as this became part of the standard procedure. Finally, many of the YACs do not have clone ends because the segment submitted to GenBank/EMBL is much smaller than the full clone, and hence the true ends lie within sequences already finished at that stage of the sequencing (i.e. we never went back to update clone ends in sequence already finished).
Our underlying database for WormBase is built on the acedb software (available freely from www.acedb.org). If you have acedb installed locally, you can download the entirety of our database from: ftp://ftp.sanger.ac.uk/pub2/wormbase/live_release
However, a simpler approach may be to just download a GFF file and DNA file for each chromosome from: ftp://ftp.sanger.ac.uk/pub2/wormbase/live_release/CHROMOSOMES/
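If you take the GFF route, the files can be read with a few lines of code. The following Python sketch assumes the standard nine tab-separated GFF columns (sequence, source, feature, start, end, score, strand, frame, attributes); the `GffRecord` and `read_gff` names are hypothetical, and the sample line is invented for illustration rather than copied from a real release.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class GffRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: str
    strand: str
    frame: str
    attributes: str


def read_gff(handle: TextIO) -> Iterator[GffRecord]:
    """Yield one record per feature line, skipping comments and blanks."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a feature line
        attributes = cols[8] if len(cols) > 8 else ""
        yield GffRecord(cols[0], cols[1], cols[2], int(cols[3]),
                        int(cols[4]), cols[5], cols[6], cols[7], attributes)


# Invented sample content; real files come from the CHROMOSOMES directory.
sample = io.StringIO(
    "##gff-version 2\n"
    'CHROMOSOME_I\tcurated\tCDS\t100\t200\t.\t+\t0\tCDS "EXAMPLE.1"\n'
)
records = list(read_gff(sample))
for rec in records:
    print(rec.seqname, rec.feature, rec.start, rec.end, rec.attributes)
```

To read a downloaded file instead, pass an open file handle, for example `read_gff(open(path))`, where `path` is wherever you saved the chromosome GFF file.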
Where are the flat files for the gene annotation of each chromosome of C. elegans?
You should take a look at the Feature Tables (GFF), which you can pick up from the same 'WormBase Downloads' page where you found the "Summary Tables" (http://www.wormbase.org/downloads.html).
You should also look at the 'Batch Downloads' page at WormBase (http://www.wormbase.org/db/searches/info_dump), where you can build your own tables of gene annotations.
One other WormBase page you should look at is the "Genome Dumper" (http://www.wormbase.org/db/searches/advanced/dumper).
We make WormPep during each release of WormBase and the starting point is always a translation of our latest set of gene predictions. Gene predictions are initially based on the GeneFinder prediction program with human modification as is deemed necessary. The level of human involvement really depends on what other supporting data is available. Aside from routine inspection of gene predictions based on EST/mRNA data we also evaluate our predictions based on information from published papers and direct contact from the worm community. All gene predictions have been looked at by a human to some level.
We have started to distinguish subsets of WormPep. Thus all WormPep proteins can be thought of as either 'CONFIRMED', 'PARTIALLY CONFIRMED', or 'PREDICTED'. The first set contains all genes where there is transcript evidence for every base of every exon of the gene (note that this can still, in theory, mean that there are unpredicted exons in a 'CONFIRMED' gene). The second set contains genes for which there is some transcript evidence but the whole gene is not yet supported, either due to lack of transcript evidence or errors in our current gene prediction. The last set is everything else, i.e. genes with no transcript support. In the future we may expand this classification system to take account of other evidence (e.g. homology info from C. briggsae).
Each new build usually sees a slight increase in the first two categories and a drop in the third category. The relevant status of each WormPep entry is added to the FASTA header of every entry in each WormPep release.
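Since the status is carried in the FASTA header, the three categories can be tallied directly from a WormPep file. The sketch below is an assumption-laden illustration: the exact header layout is not described here, so the code simply looks for the status words anywhere in the header line (case-insensitively, allowing a space or an underscore in "partially confirmed"). The function names and the sample headers are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# "partially confirmed" is listed first so it is not mistaken for "confirmed".
STATUS_RE = re.compile(r"partially[ _]confirmed|confirmed|predicted",
                       re.IGNORECASE)


def header_status(header: str) -> str:
    """Return the status word found in a FASTA header, or 'UNKNOWN'."""
    match = STATUS_RE.search(header)
    if match is None:
        return "UNKNOWN"
    return match.group(0).upper().replace("_", " ")


def count_statuses(lines: Iterable[str]) -> Counter:
    """Tally statuses over the header lines (those starting with '>')."""
    return Counter(header_status(line) for line in lines
                   if line.startswith(">"))


# Invented headers for illustration only.
sample = [
    ">EXAMPLE.1 CE00001 status:Confirmed",
    "MSEQ",
    ">EXAMPLE.2 CE00002 status:Partially_confirmed",
    "MSEQ",
    ">EXAMPLE.3 CE00003 status:Predicted",
    "MSEQ",
]
counts = count_statuses(sample)
print(counts)
```

A keyword search like this would miscount a header whose free-text description happens to contain one of the status words, so check the header layout of your release before relying on it.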
In early versions of microarray and RNAi libraries, clone and gene names were often used synonymously. Because gene models and names change between versions of WormBase and the history has not been carefully preserved, this has caused much confusion. Fortunately, for those clones for which we have sequence information, we provide an up-to-date mapping from each clone to the current gene models. However, sometimes we don't have sufficient sequence information for a given clone and are thus unable to provide any information about its identity; in that case one must ask the primary generators of that clone (the corresponding authors of the publication) for more information. Below is an example of a 'lost' clone:
Q) What is the present location of Y41D4A_2491.a?
Simple answer: there is none. A simple search for "Anything" "Y41D4A_2491.a" produces hits indicating that Y41D4A_2491.a was used as a clone/gene name in the Stanford microarray library. For sequence information, WormBase177 only has sequences for the oligos, not for a PCR_product. The pair of oligo sequences (Oligo: sjj_Y41D4A_2491.a_b ; Oligo: sjj_Y41D4A_2491.a_f) fails to produce an ePCR product, and each oligo individually fails to map to the genome when searched with the Genome browser oligo mapping tool.
We keep up-to-date mapping files here: ftp://caltech.wormbase.org/pub/annots/rnai/.
Q) I am trying to figure out what convention was used when the gene names were changed from a letter code to a number code (for example Y17G9B.a-i to Y17G9B.1-9).
This naming convention change occurred following the initial annotation phase back in the 90's. Genomic clones were originally submitted with cosmid.letter annotations prior to 1998 but this was changed to increase the depth of the nomenclature as some clones started to have more than 26 genes.
There are 3 approaches to identify the current gene that corresponds to an original letter code gene locus.
1) wormpep.history file
Search through the wormpep.history file within the wormpep_package.tar.gz archive.
If you look for your gene, e.g. Y17G9B.g, you get:
Y17G9B.g CE21394 17 18
Then look for all occurrences of the CE21394 number:
Y17G9B.g CE21394 17 18
Y17G9B.4 CE21394 18 72
From this you can assume that .g was renamed .4 as the gene encodes the same protein.
This doesn't always work as the gene may have undergone some annotation changes which breaks this link.
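The two lookups in approach 1 can be combined into one small script. This Python sketch assumes each line of wormpep.history holds a gene name and a CE protein ID as its first two whitespace-separated columns, as in the lines quoted above; the function names are hypothetical, and the third sample line is added only to show a non-matching protein.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_history(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (gene_name, protein_id) from wormpep.history-style lines."""
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            yield parts[0], parts[1]


def candidate_renames(lines: Iterable[str], old_name: str) -> List[str]:
    """List other gene names that share a CE protein ID with old_name."""
    records = list(parse_history(lines))
    protein_ids = {ce for name, ce in records if name == old_name}
    return sorted({name for name, ce in records
                   if ce in protein_ids and name != old_name})


history = [
    "Y17G9B.g CE21394 17 18",
    "Y17G9B.4 CE21394 18 72",
    "Y17G9B.a CE21388 17 18",
]
print(candidate_renames(history, "Y17G9B.g"))
```

An empty result means the protein changed along with the name, which is the case where the link is broken and another approach is needed.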
2) BLAST
If the above doesn't work and you have a small list, you can BLAST the old peptide against the genome and see which gene it overlaps.
Look in the wormpep.fastaXXX file (from the package obtained in approach 1) for the peptide sequence.
grep Y17G9B.a wormpep.history190 as before:
Y17G9B.a CE21388 17 18
wormpep.fasta190:
CE21388 MLRLKNFSNLRELSTDS--snip--PVDDLISFLETFELDEEDE
Run TBLASTN with the peptide sequence against the C. elegans genome and see where it hits the current assembly.
3) Microarray and RNAi libraries
If you are interested in genes used in microarray and RNAi libraries see the previous FAQ.