How do I find the ortholog / paralog / etc. of gene X?



We do not explicitly make ortholog assignments in WormBase. This is a non-trivial task that we leave to external experts, whose results we try to make available. Several sources in WormBase may be useful. NCBI COGs, InParanoid, PantherDB, OMA and TreeFam all attempt to predict orthologous relationships. TreeFam trees (and links) are visible from the gene pages (see the cdk-1 page, for example); the ortholog assignments can be found in the ortholog table of the homology widget. The KOGs are found on the respective protein page as part of the eggNOG clusters.
List of available analyses:

    InParanoid
    PantherDB
    Compara
    OMA
    TreeFam
    Publications



There are also precomputed BLAST results, which are summarised on the gene pages. For each release we also produce a file of best blastp hits for each worm protein; it is available on the FTP site as best_blastp_hits.WSXXX.gz

In addition, since WS164 we include predicted orthologue assignments based on Ensembl Compara that cover the whole range of ParaSite nematodes as well as the WormBase species.

How do I get a list of all C. elegans orthologs of H. sapiens disease genes?



One possible solution is to use BioMart from ParaSite, as it includes C. elegans. Go to BioMart and pick Caenorhabditis elegans (homology) and H. sapiens orthologs.

What is the meaning of several abbreviations for proteins that are used by WormBase, like "Protein SW"?



Or "Protein TR" and "Protein WP"? In addition, using the TR Database, sometimes the species origin (e.g., C. elegans) is missing - how can I find out? Furthermore, how can I get from a TR Database entry to the corresponding predicted gene in the C. elegans genome?

SW stands for Swiss-Prot, TR stands for TrEMBL and WP stands for WormPep. If you are not familiar with any of these protein databases, you can go to http://www.expasy.org/sprot/ and http://www.sanger.ac.uk/Projects/C_elegans/wormpep/ for an explanation and access to them.

Inside Protein SW or Protein TR, you may find the Swiss-Prot or TrEMBL accession number. You can get all details of the protein (including species origin) by going to http://www.expasy.org/sprot/ and entering the accession number.


What do those colorful bars for C. briggsae alignments mean?



Dark blue bars are regions of strong similarity. Light blue bars are regions of weak similarity. Dashed areas don't match.

When there are multiple bars in the same region, it means that there are several C. briggsae clones that all match the region.


How can I retrieve the best blast_p scored homologies of worm genes?



(I.e., the homologies produced automatically with each WormBase build -- roughly every 2 months?)

a. Go to the WormBase FTP site by entering the following URL in your browser: ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release

b. Download the two best blastp files in the species/bioproject folder: c_elegans.PRJNA13758.WS253.best_blastp_hits.txt.gz (elegans homologies) and c_briggsae.PRJNA10731.WS253.best_blastp_hits.txt.gz (briggsae homologies).

c. Unpack the compressed files using suitable software, e.g. gunzip (Linux).

d. The files have 15 columns delimited by commas. The contents of the columns are as follows:

   Column 1: WormBase peptide accession number for elegans peptide

   Column 2: WormBase peptide accession number for highest homology elegans peptide

   Column 3: e value for best elegans peptide/worm peptide hit

   Column 4: Ensembl accession number for highest homology Ensembl sequence

   Column 5: e value for best elegans peptide/Ensembl sequence hit

   Column 6: WormBase peptide accession number for highest homology briggsae peptide

   Column 7: e value for best elegans peptide/briggsae peptide hit

   Column 8: FlyBase accession number for highest homology fly protein

   Column 9: e value for best elegans peptide/fly protein hit

   Column 10: Saccharomyces Genome Database accession number for highest homology yeast protein

   Column 11: e value for best elegans peptide/yeast protein hit

   Column 12: Swiss-Prot/UniProt name for highest homology sequence

   Column 13: e value for best elegans peptide/Swiss-Prot sequence hit

   Column 14: TrEMBL accession number for highest homology sequence

   Column 15: e value for best elegans peptide/TrEMBL sequence hit
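
As a rough illustration of steps c and d, the following Python sketch reads a best blastp hits file (compressed or not) into one dictionary per row. It assumes the 15 comma-delimited columns described above and no header line; the column labels, the function names and the sample row are invented for this example and are not part of the WormBase files.

```python
import csv
import gzip
import io

# Labels for the 15 columns described above. The label names are
# made up for this sketch; the files themselves carry no such names.
COLUMNS = [
    "elegans_peptide",
    "best_elegans_peptide", "best_elegans_evalue",
    "best_ensembl", "best_ensembl_evalue",
    "best_briggsae_peptide", "best_briggsae_evalue",
    "best_fly", "best_fly_evalue",
    "best_yeast", "best_yeast_evalue",
    "best_swissprot", "best_swissprot_evalue",
    "best_trembl", "best_trembl_evalue",
]


def open_hits(path):
    """Open a hits file as text, decompressing on the fly if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, newline="")


def read_best_hits(handle):
    """Yield one dict per row from an open text handle of comma-delimited hits."""
    for row in csv.reader(handle):
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        yield dict(zip(COLUMNS, row))


# Invented sample row (placeholder identifiers, not real accessions).
sample = io.StringIO(
    "CE00001,CE00002,1e-50,ENSP0001,1e-20,CBP00001,1e-40,"
    "FBpp0001,1e-10,YAL001C,1e-05,ABC_HUMAN,1e-30,Q00001,1e-25\n"
)
hits = list(read_best_hits(sample))
print(hits[0]["elegans_peptide"], hits[0]["best_briggsae_peptide"])
```

For a downloaded file you would pass `open_hits("c_elegans.PRJNA13758.WS253.best_blastp_hits.txt.gz")` to `read_best_hits` instead of the sample. All values are kept as strings; convert the e values with `float()` if you need to filter on them.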



e. You might also want a file that maps WormBase peptide accession numbers to the corresponding gene in WormBase (warning: a single gene may correspond to multiple peptides).
For this you will have to perform an AQL query on WormBase: on the banner at the top of the WormBase homepage select "Searches"; select the top search from the resulting list, "Acedb Searches(AQL)"; then copy and paste the following text into the search text box:

select a, a->Cgc_name, c from a in class Gene, c in a->Molecular_name where c like "CE*" order by:1 asc

Choose the "Text output" radio button and click Query ACeDB (the search may take a few minutes).

The resulting file contains a tab-delimited mapping of WormBase gene accession numbers to the CGC-approved name for that gene (if it has one) and to the peptide accession number for that gene. Save the results file to your hard drive.
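
Once the results file is saved, a short script can turn it into lookup tables. The sketch below assumes three tab-delimited columns in the order described above (gene accession, CGC name, peptide accession) and an empty second column for genes without a CGC name; the function name and the sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def load_gene_peptide_map(handle):
    """Build gene -> name, gene -> peptides and peptide -> genes lookups."""
    gene_names = {}
    gene_to_peptides = defaultdict(list)
    peptide_to_genes = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or short lines
        gene_id, cgc_name, peptide = (field.strip() for field in row[:3])
        if not gene_id or not peptide:
            continue
        gene_names[gene_id] = cgc_name or None  # None when no CGC name
        # A single gene may correspond to several peptides, so keep lists.
        if peptide not in gene_to_peptides[gene_id]:
            gene_to_peptides[gene_id].append(peptide)
        if gene_id not in peptide_to_genes[peptide]:
            peptide_to_genes[peptide].append(gene_id)
    return gene_names, dict(gene_to_peptides), dict(peptide_to_genes)


# Invented sample rows (placeholder identifiers, not real accessions).
sample = io.StringIO(
    "GENE_A\tabc-1\tCE00001\n"
    "GENE_A\tabc-1\tCE00002\n"
    "GENE_B\t\tCE00003\n"
)
gene_names, gene_to_peptides, peptide_to_genes = load_gene_peptide_map(sample)
print(gene_to_peptides["GENE_A"])
```

The peptide-to-gene table can then be used to attach gene names to column 1 of the best blastp hits file.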


How can I download the C. elegans-human gene homology map?



You can download a file that lists the best blastp match to human, fly, yeast, C. briggsae, Swiss-Prot, and TrEMBL proteins for every C. elegans protein from the WormBase FTP site:

Current Release

The file name is best_blastp_hits.WSXXX.gz where XXX is the release number.


How can I download C. elegans-C. briggsae orthologs and their protein-coding DNA sequences?



One possible way to retrieve those would be to download a C. elegans-C. briggsae ortholog file: [Here](ftp://ftp.wormbase.org/pub/wormbase/datasets-published/stein_2003/orthologs_and_orphans/orthologs.txt)
and C. briggsae gene sequences in FASTA format (briggenes.fa.gz): Here

and write a script that would parse C. briggsae ortholog sequences based on C. elegans gene names.
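
A minimal sketch of such a script is shown below. The layout of orthologs.txt and of the FASTA headers is not described here, so the sketch assumes a tab-delimited ortholog file with the C. elegans gene name in the first column and the C. briggsae gene name in the second, and FASTA headers whose first word is the C. briggsae gene name. Check both files and adjust the column indices before relying on it. All function names and sample data are invented for illustration.

```python
import io


def read_ortholog_pairs(handle, elegans_col=0, briggsae_col=1):
    """Map C. elegans gene name -> C. briggsae gene name (assumed column layout)."""
    pairs = {}
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(elegans_col, briggsae_col):
            continue
        pairs[fields[elegans_col]] = fields[briggsae_col]
    return pairs


def read_fasta(handle):
    """Yield (identifier, sequence); identifier is the first word after '>'."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            words = line[1:].split()
            name, chunks = (words[0] if words else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def briggsae_sequences_for(elegans_genes, pairs, sequences):
    """Return {elegans gene: (briggsae gene, sequence)} for genes with a hit."""
    result = {}
    for gene in elegans_genes:
        ortholog = pairs.get(gene)
        if ortholog in sequences:
            result[gene] = (ortholog, sequences[ortholog])
    return result


# Invented sample data standing in for orthologs.txt and briggenes.fa.
orthologs_txt = "elegans_gene_1\tbriggsae_gene_1\nelegans_gene_2\tbriggsae_gene_2\n"
fasta_txt = ">briggsae_gene_1 some description\nATGAAA\nTTTTAA\n>briggsae_gene_3\nATGCCC\n"

pairs = read_ortholog_pairs(io.StringIO(orthologs_txt))
sequences = dict(read_fasta(io.StringIO(fasta_txt)))
found = briggsae_sequences_for(["elegans_gene_1", "elegans_gene_2"], pairs, sequences)
print(found)
```

Genes whose ortholog has no sequence in the FASTA file (here elegans_gene_2) are simply left out of the result.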

Another way would be to use WormMart to get a list of genes with orthologs (filter by Homolog/Ortholog -> Homolog[Compara Orholog]). In the Attribute part you can select whether you want the sequences or just a table of orthologs.


Last edited by Michael Paulini – 25 days ago