We do not explicitly make ortholog assignments in WormBase. This is a non-trivial task that we leave to external experts, whose results we try to make available. Several sources in WormBase may be useful. NCBI COGs, InParanoid, PantherDB, OMA and TreeFam all attempt to predict orthologous relationships. TreeFam trees (and links) are visible from the gene pages (see the cdk-1 page for an example), and the ortholog assignments can be found in the ortholog table of the homology widget. The KOGs are found on the respective protein page as part of the eggNOG clusters.
List of available analyses:
InParanoid
PantherDB
Compara
OMA
TreeFam
Publications
There are also precomputed BLAST results that are summarised on the gene pages. With each release we also produce a file of best blastp hits for each worm protein, called best_blastp_hits.WSXXX.gz, which can be found on the FTP site.
In addition, since WS164 we have included predicted orthologue assignments based on Ensembl Compara that cover the whole range of ParaSite nematodes as well as the WormBase species.
One possible solution is to use BioMart from ParaSite, as it includes C. elegans. Go to BioMart and pick Caenorhabditis elegans (homology) and H. sapiens orthologs.
Or "Protein TR" and "Protein WP"? In addition, using the TR Database, sometimes the species origin (e.g., C. elegans) is missing - how can I find out? Furthermore, how can I get from a TR Database entry to the corresponding predicted gene in the C. elegans genome?
SW stands for Swiss-Prot, TR stands for TrEMBL and WP stands for WormPep. If you are not familiar with any of these protein databases, you can go to http://www.expasy.org/sprot/ and http://www.sanger.ac.uk/Projects/C_elegans/wormpep/ for an explanation and access to them.
Inside Protein SW or Protein TR, you will find the Swiss-Prot or TrEMBL accession number. You can get all details of the protein (including species origin) by going to http://www.expasy.org/sprot/ and entering the accession number.
Dark blue bars are regions of strong similarity. Light blue bars are regions of weak similarity. Dashed areas don't match.
When there are multiple bars in the same region, it means that there are several C. briggsae clones that all match the region.
(I.e., the homologies produced automatically with each WormBase build -- roughly every 2 months?)
a. Go to the WormBase FTP site by entering the following URL in your browser: ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release.
b. Download the two best blastp files in the species/bioproject folder: c_elegans.PRJNA13758.WS253.best_blastp_hits.txt.gz (elegans homologies) and c_briggsae.PRJNA10731.WS253.best_blastp_hits.txt.gz (briggsae homologies).
c. Unpack the compressed files using suitable software, e.g. gunzip (Linux).
d. The files have 15 columns delimited by a comma. The contents of the columns are as follows:
Column 1: WormBase peptide accession number for elegans peptide
Column 2: WormBase peptide accession number for highest homology elegans peptide
Column 3: e value for best elegans peptide/worm peptide hit
Column 4: Ensembl accession number for highest homology Ensembl sequence
Column 5: e value for best elegans peptide/Ensembl sequence hit
Column 6: WormBase peptide accession number for highest homology briggsae peptide
Column 7: e value for best elegans peptide/briggsae peptide hit
Column 8: FlyBase accession number for highest homology fly protein
Column 9: e value for best elegans peptide/fly protein hit
Column 10: Saccharomyces Genome Database accession number for highest homology yeast protein
Column 11: e value for best elegans peptide/yeast protein hit
Column 12: Swiss-Prot/UniProt name for highest homology sequence
Column 13: e value for best elegans peptide/Swiss-Prot sequence hit
Column 14: TrEMBL accession number for highest homology sequence
Column 15: e value for best elegans peptide/TrEMBL sequence hit
e. You might also want a file that maps WormBase peptide accession numbers to the corresponding gene in WormBase (warning: a single gene may correspond to multiple peptides).
For this you will have to perform an AQL query on WormBase: - on the banner at the top of the WormBase homepage select "Searches" - select the top search from the resulting list, "Acedb Searches(AQL)" - copy and paste the following text into the search text box:
select a, a->Cgc_name, c from a in class Gene, c in a->Molecular_name where c like "CE*" order by:1 asc
Choose the "Text output" radio button and click Query ACeDB (the search may take a few minutes).
The resulting file contains a tab-delimited mapping of WormBase gene accession numbers to the CGC-approved name for that gene (if it has one) and to the peptide accession number for that gene. Save the results file to your hard drive.
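Steps d and e can be combined in a short script. The following is a minimal Python sketch, not an official WormBase tool. It assumes the blastp file has exactly the 15 comma-delimited columns listed above and that the AQL text output has three tab-delimited columns (gene, CGC name, peptide). The column labels in `COLUMNS` and all function names are illustrative choices, and the sketch assumes peptide accessions are written identically in both files; if one file carries a database prefix, normalise it first.

```python
import csv
from collections import defaultdict

# Illustrative labels for the 15 columns described above (not official names).
COLUMNS = [
    "query_peptide",
    "elegans_hit", "elegans_evalue",
    "ensembl_hit", "ensembl_evalue",
    "briggsae_hit", "briggsae_evalue",
    "fly_hit", "fly_evalue",
    "yeast_hit", "yeast_evalue",
    "swissprot_hit", "swissprot_evalue",
    "trembl_hit", "trembl_evalue",
]


def read_best_hits(handle):
    """Yield one dict per row of a comma-delimited best blastp hits file."""
    for row in csv.reader(handle):
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        yield dict(zip(COLUMNS, row))


def read_peptide_to_gene(handle):
    """Map peptide accession -> list of (gene ID, CGC name).

    Assumes three tab-delimited columns: gene, CGC name (may be empty), peptide.
    """
    mapping = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue
        gene, cgc_name, peptide = (field.strip().strip('"') for field in row[:3])
        mapping[peptide].append((gene, cgc_name))
    return mapping


def briggsae_hits_by_gene(hits_handle, map_handle):
    """Yield (gene, CGC name, peptide, best briggsae hit, e value) tuples."""
    peptide_to_gene = read_peptide_to_gene(map_handle)
    for hit in read_best_hits(hits_handle):
        for gene, cgc_name in peptide_to_gene.get(hit["query_peptide"], []):
            yield (gene, cgc_name, hit["query_peptide"],
                   hit["briggsae_hit"], hit["briggsae_evalue"])
```

To run it on the downloaded files, open the blastp file with `gzip.open(path, "rt", newline="")` (or with plain `open` after gunzip) and pass it to `briggsae_hits_by_gene` together with the opened AQL results file. Because a gene may have several peptides, the same gene can appear in more than one output tuple.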
You can download a file that lists the best blastp match to human, fly, yeast, C. briggsae, Swiss-Prot, and TrEMBL proteins for every C. elegans protein from the WormBase FTP site:
Current Release
The file name is best_blastp_hits.WSXXX.gz where XXX is the release number.
One possible way to retrieve those would be to download a C. elegans-C. briggsae ortholog file:
[Here](ftp://ftp.wormbase.org/pub/wormbase/datasets-published/stein_2003/orthologs_and_orphans/orthologs.txt)
and C. briggsae gene sequences in FASTA format (briggenes.fa.gz): Here
and write a script that would parse C. briggsae ortholog sequences based on C. elegans gene names.
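Such a script might look like the following Python sketch. The layout of orthologs.txt is not described here, so the sketch assumes, purely for illustration, that each line holds a C. elegans gene name followed by its C. briggsae ortholog, separated by whitespace, and that each FASTA header starts with the C. briggsae gene name. Check both files and adjust the parsing if they differ. All function names are hypothetical.

```python
def read_fasta(handle):
    """Yield (identifier, sequence) pairs; the identifier is the first
    whitespace-delimited token of each header line."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def read_orthologs(handle):
    """Return {elegans gene: briggsae gene}.

    ASSUMPTION: two whitespace-separated columns per line, elegans first.
    """
    pairs = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2 or line.startswith("#"):
            continue  # skip comments, blank and incomplete lines
        pairs[fields[0]] = fields[1]
    return pairs


def ortholog_sequences(elegans_genes, orthologs, fasta_handle):
    """Return {elegans gene: (briggsae gene, sequence)} for the requested genes.

    Assumes one-to-one orthologs; genes without an ortholog or without a
    sequence in the FASTA file are left out.
    """
    wanted = {orthologs[g]: g for g in elegans_genes if g in orthologs}
    return {
        wanted[name]: (name, sequence)
        for name, sequence in read_fasta(fasta_handle)
        if name in wanted
    }
```

For the compressed sequence file, `gzip.open("briggenes.fa.gz", "rt")` gives a text handle that can be passed directly as `fasta_handle`.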
Another way would be to use WormMart to get a list of genes with orthologs (filter by Homolog/Ortholog -> Homolog[Compara Ortholog]). In the Attribute section you can select whether you want the sequences or just a table of orthologs.