!!WARNING!!


WormMart has been replaced by WormMine and so updated documentation is on its way


For a large number of genes, is there a way to get not just the alleles but also the actual mutation (when known)?

Here's what you need to do:

Open WormMart
SELECT RELEASE, SELECT DATABASE, SELECT DATASET [Variation]
Click Filters in the left menu, then expand the Other Annotation filter and under Annotated with select [Sequence] Flanking Sequence
Upload your file of GeneIDs to the Specified identifier of type field.
Hit Count then Results to check that your file is being read. N.B. If you have chosen WS176, you should get n/80791.
Click Attributes in the left menu and under Identification select Variation (Name), Method, Variation Type (merged) and Mutation Type. Under Affects select Gene (WB Gene ID) (merged), Gene (CGC name) and under Description select Sense, Sense Text, Splice site, Splice Site Text and Frameshift
Hit Results 

How can I retrieve 1.5 kb promoter region upstream of a bunch of genes?

Open WormMart at http://www.wormbase.org/biomart/martview/
SELECT RELEASE, SELECT DATABASE, SELECT DATASET [Gene]
Click Filters in the left menu, then expand the "Identification" filter, tick "[Gene] ID(s) of Type", select "[Gene] Any Name", upload (using "Browse") or type in a list of genes
Hit Count to check your file is being read correctly
Click Attributes in the left menu, tick "Gene Sequences", expand "Sequence Type", tick "Flank (Gene Coding Region)" (for upstream of translation start site), expand "Flanking Regions", tick "Upstream flank", type 1500 in the box
Hit Results
Results can be exported or e-mailed 
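
The "Upstream flank" setting amounts to simple coordinate arithmetic around the coding region. The sketch below is only an illustration of that arithmetic, not WormMart's own code: the function name upstream_flank and the 1-based inclusive coordinate convention are assumptions. For a gene on the minus strand, upstream lies at higher coordinates and the extracted sequence would normally be reverse-complemented.

```python
def upstream_flank(cds_start, cds_end, strand, flank=1500):
    """Return (start, end) of the region upstream of a coding region.

    Assumes 1-based inclusive coordinates with cds_start <= cds_end.
    """
    if cds_start > cds_end:
        raise ValueError("cds_start must not exceed cds_end")
    if strand == "+":
        # Upstream is at lower coordinates; clamp at the first base.
        # If cds_start is 1 the result is an empty interval (1, 0).
        return max(1, cds_start - flank), cds_start - 1
    if strand == "-":
        # Upstream is at higher coordinates. The sequence length is not
        # known here, so the far end is not clamped.
        return cds_end + 1, cds_end + flank
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    print(upstream_flank(5000, 6000, "+"))
    print(upstream_flank(5000, 6000, "-"))
```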

How can I find all genes expressed somewhere with a particular GO term?

(e.g., find all genes expressed in the vulva that have signal transducer activity)

First you need to identify the exact ontology identifiers that correspond to your query. Search for the term in the main search box on the home page with the appropriate category selected.

vulva - WBbt:0006748

signal transducer activity - GO:0004871

Armed with these you can start your WormMart query.

Select which version of the database you are interested in (current release or a recent frozen one)
Select 'Gene' DATASET
In the left panel click 'Filters'. This will give an expandable list of filters in the main window.
Expand 'Annotation'
Check the ' [Annotation] IDs of Type: ' box and select '[Function] GO term ID' from the dropdown menu
Put the GO term found earlier (GO:0004871) into the box (or upload from file). (NOTE: You must include the GO: part)

This will find all genes annotated with GO term GO:0004871. Hit Count to see the results so far (111 at the time of writing). Now we will add a second dataset to cross-reference this result with.

Back in the left panel, click 'Dataset' to get a drop down menu of other datasets.
Select 'Expression Pattern'
In the left panel, under 'Dataset' there is another 'Filters' option. Click to get a similar list as above.
Select ' Expressed in '
Check ' Specified identifiers of type ' box and choose ' Anatomy Term [eg WBbt:0006748]' from drop down list.
Enter the Anatomy term found earlier (or upload from file). (NOTE:You must include the WBbt: part) 
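
Because both boxes need the prefix, it can help to check a list of terms before pasting or uploading it. A minimal sketch, assuming each ID is a prefix, a colon and seven digits, as in the two examples above; the name classify_term_id is hypothetical.

```python
import re

# Assumption: both ID types are "<prefix>:<seven digits>", matching the
# examples GO:0004871 and WBbt:0006748.
ID_PATTERNS = {
    "GO": re.compile(r"GO:\d{7}"),
    "WBbt": re.compile(r"WBbt:\d{7}"),
}


def classify_term_id(term_id):
    """Return "GO" or "WBbt" for a well-formed ID, otherwise None."""
    cleaned = term_id.strip()
    for kind, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(cleaned):
            return kind
    return None


if __name__ == "__main__":
    for term in ["GO:0004871", "WBbt:0006748", "0004871"]:
        print(term, classify_term_id(term))
```

An ID entered without its prefix, such as 0004871, is reported as None.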

This completes the querying part, you now need to select what information you want about the genes that the search finds.

Click the 'Attributes' section in the left panel and expand the boxes as you need to select output categories of the Gene.
Click the 'Attributes' under 'Datasets' section to select attributes to do with the expression pattern. 

This link goes to the completed query. You can click on the relevant sections as described above to change any of GO or Anatomy terms and output data. You may need to click 'Results' to see the output of the query.

How do I pull out all operon details and the names of genes contained in an Operon?

You can retrieve this data through WormMart

Select which version of the database you are interested in (current release or a recent frozen one)
Select 'Gene' DATASET
In the left panel click 'Filters'. This will give an expandable list of filters in the main window. 

Select these Filters:

[Gene] Species : Caenorhabditis elegans
[Gene] Status : Live
[Location] : Operon:Only (Annotation Tab - Limit to Entries Annotated with:)

In the left panel click 'Attributes'
In the right panel select these Attributes: 

Gene WB ID [IDs tab]
Operon [Location tab]
Operon Start [Location tab]
Operon End [Location tab]

This will give you a table like:

Gene WB ID      Gene Public Name  Operon    Operon Start (bp)  Operon End (bp)
WBGene00000001  aap-1             CEOP1906  5106224            5111008
WBGene00000037  ace-3             CEOP2632  14197942           14210076
WBGene00000038  ace-4             CEOP2632  14197942           14210076
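
To get the names of the genes contained in each operon, the exported table can be grouped by the Operon column. A rough sketch, assuming the results were exported as tab-separated text with the column headers shown above; the function name genes_by_operon is hypothetical.

```python
import csv
import io
from collections import defaultdict

# The three rows shown above, written as tab-separated text.
EXAMPLE = (
    "Gene WB ID\tGene Public Name\tOperon\tOperon Start (bp)\tOperon End (bp)\n"
    "WBGene00000001\taap-1\tCEOP1906\t5106224\t5111008\n"
    "WBGene00000037\tace-3\tCEOP2632\t14197942\t14210076\n"
    "WBGene00000038\tace-4\tCEOP2632\t14197942\t14210076\n"
)


def genes_by_operon(tsv_text):
    """Map each operon name to the list of gene public names it contains."""
    operons = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        operons[row["Operon"]].append(row["Gene Public Name"])
    return dict(operons)


if __name__ == "__main__":
    for operon, genes in genes_by_operon(EXAMPLE).items():
        print(operon, ", ".join(genes))
```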

How do you retrieve all the protein sequences of genes within Operons?

You can retrieve this data through WormMart

Select which version of the database you are interested in (current release or a recent frozen one)
Select 'Gene' DATASET
In the left panel click 'Filters'. This will give an expandable list of filters in the main window. 

Select these Filters:

[Gene] Species : Caenorhabditis elegans
[Gene] Status : Live
[Location] : Operon:Only (Annotation Tab - Limit to Entries Annotated with:)


In the left panel click 'Attributes'
In the right hand window click 'Gene Sequences' 

Select These Attributes:

Sequence Type:
 Peptide
Header Attributes
 Gene WB ID
 Gene Public Name
 WB Wormpep ID

Click on 'Results'. This should give a count of ~2800 and results like:

> WBGene00000814|csn-2|WP:CE27562
MGDEYMDDDEDYGFEYEDDSGSEPDVDMENQYYTAKGLRSDGKLDEAIKSFEKVLELEGE
KGEWGFKALKQMIKITFGQNRLEKMLEYYRQLLTYIKSAVTKNYSEKSINAILDYISTSR
QMDLLQHFYETTLDALKDAKNERLWFKTNTKLGKLFFDLHEFTKLEKIVKQLKVSCKNEQ
GEEDQRKGTQLLEIYALEIQMYTEQKNNKALKWVYELATQAIHTKSAIPHPLILGTIREC
GGKMHLRDGRFLDAHTDFFEAFKNYDESGSPRRTTCLKYLVLANMLIKSDINPFDSQEAK
PFKNEPEIVAMTQMVQAYQDNDIQAFEQIMAAHQDSIMADPFIREHTEELMNNIRTQVLL
RLIRPYTNVRISYLSQKLKVSQKEVIHLLVDAILDDGLEAKINEESGMIEMPKNKKKMMV
TSLVVPNAGDQGTTKSDSKPGTSSEPSTTTSVTSSILQGPPATSSCHQELSMDGLRVWAE
RIDSIQSNIGTRIKF*
etc. etc.
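
The header attributes chosen above end up in each FASTA header, separated by "|". A minimal sketch for reading such a file back in; the function name parse_fasta is hypothetical, and the example sequence is deliberately cut down to two short fragments of the record above.

```python
def parse_fasta(text):
    """Yield (header_fields, sequence) for each FASTA record in text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Split the header on "|", e.g. gene ID | public name | Wormpep ID.
            header = [field.strip() for field in line[1:].split("|")]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    example = (
        "> WBGene00000814|csn-2|WP:CE27562\n"
        "MGDEYMDDDEDYGFEYEDDSGSEP\n"
        "RIDSIQSNIGTRIKF*\n"
    )
    for (gene_id, name, wormpep), seq in parse_fasta(example):
        # The trailing "*" marks the stop codon and is not a residue.
        print(gene_id, name, wormpep, len(seq.rstrip("*")))
```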

How do I retrieve a list of transcription factors which when mutated or targeted by RNAi cause embryonic lethal phenotype?

Open WormMart
Select Database, e.g. "WormBase Release 188"
Select Dataset - "Phenotype"
Click on "Filter" link on the left and then on the "+" next to "Phenotype Annotation"
Select "Phenotype Inc. Descendents" checkbox and select "embryonic_lethal" from the pull down menu 

If you click "Count" button at this point, you should see the number of entries that are annotated with this phenotype (this is not necessary).

Now add a second dataset:

Click on second "Dataset" link on the left
Choose Additional Dataset - "Gene"
Click on "Filter" link on the left (for the second dataset) and then on the "+" next to "Annotation"
Select "[Annotation] IDs of Type" checkbox, select "[Function] GO Term ID" and enter GO:0003700 in the box below (corresponds to transcription factor activity)
Click on "Attributes" link on the left (for the second dataset) and then on the "+" next to "Function" and select "GO Term Info (merged)" checkbox (if you want to see GO annotations in addition to attributes selected by default, which you can change for each dataset through the Attributes dialog)
Press Results button and Export all results to File (also check Unique results only) 
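
Conceptually, linking the two datasets keeps only the genes that pass both sets of filters, and "Unique results only" drops repeated rows. The sketch below illustrates that idea with plain Python sets; it is not how WormMart is implemented, and the gene names are hypothetical placeholders.

```python
def cross_reference(phenotype_genes, go_genes):
    """Return genes present in both lists, de-duplicated and sorted."""
    return sorted(set(phenotype_genes) & set(go_genes))


if __name__ == "__main__":
    # Hypothetical placeholder names, not real query results.
    embryonic_lethal = ["geneA", "geneB", "geneB", "geneC"]
    transcription_factors = ["geneB", "geneC", "geneD"]
    print(cross_reference(embryonic_lethal, transcription_factors))
```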

Here is what you should see.

How do I download/generate a file containing the unspliced transcripts like I see on the sequence pages of WormBase?

I would like to download the sequences that I see on the sequence summary pages eg.

[Image: Unspliced_transcript.jpg]

To do this, replicate the following WormMart query.

Dataset:    CHOOSE DATABASE: WormBase WS195
             CHOOSE DATASET: Gene

Filters: leave as default and add the one marked (*)

   [Gene] Species : Caenorhabditis elegans

   [Gene] Status : Live

    * Annotation: [Transcript] Type: Coding

Attributes: select [Gene Sequences] at top of page

    Sequence Type: Unspliced (Transcript)

    Header Attributes: Whatever the user requires

This should give you a count in the region of 22,000 objects and yield ~27,000 sequence objects in your file.

If you have a specific list of genes you want sequence data for, you can upload a file of IDs. e.g. WBGene IDs file format is:

WBGene00000001
WBGene00000002
WBGene00000003

Go back to your WormMart session and on the filters tab select ([Annotation] IDs of Type:) and upload your file.
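
Before uploading, the ID list can be checked against the one-ID-per-line format shown above. A minimal sketch, assuming a gene ID is "WBGene" followed by eight digits as in the examples; the function name clean_gene_ids is hypothetical.

```python
import re

# Assumption: a WormBase gene ID is "WBGene" followed by exactly eight
# digits, as in the examples above.
WBGENE_RE = re.compile(r"WBGene\d{8}")


def clean_gene_ids(lines):
    """Return (valid, rejected); valid IDs are de-duplicated in input order."""
    valid, rejected, seen = [], [], set()
    for raw in lines:
        gene_id = raw.strip()
        if not gene_id:
            continue  # skip blank lines
        if not WBGENE_RE.fullmatch(gene_id):
            rejected.append(gene_id)
        elif gene_id not in seen:
            seen.add(gene_id)
            valid.append(gene_id)
    return valid, rejected


if __name__ == "__main__":
    ids = ["WBGene00000001", "WBGene00000002 ", "", "WBGene00000001", "aap-1"]
    valid, rejected = clean_gene_ids(ids)
    # One ID per line is the upload format shown above.
    print("\n".join(valid))
    print("rejected:", rejected)
```

Names such as aap-1 are rejected here only because this sketch checks for WBGene IDs; other identifier types would need their own pattern.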

WormBase will provide a pre-computed file under the sequence directory on the ftp site:

ftp://ftp.sanger.ac.uk/pub/wormbase/live_release/genomes/c_elegans/sequences/dna/

Which GFF source and feature (method) should I use?

The terms feature and method are used interchangeably; see GFF_source_methods.


Last edited by Paul Davis – 4 years ago