Release Notes
WS174 was built by Gary Williams
======================================================================
This directory includes:
i) database.WS174.*.tar.gz - compressed data for new release
ii) models.wrm.WS174 - the latest database schema (also in above database files)
iii) CHROMOSOMES/subdir - contains 3 files (DNA, GFF & AGP per chromosome)
iv) WS174-WS173.dbcomp - log file reporting difference from last release
v) wormpep174.tar.gz - full Wormpep distribution corresponding to WS174
vi) wormrna174.tar.gz - latest WormRNA release containing non-coding RNAs in the genome
vii) confirmed_genes.WS174.gz - DNA sequences of all genes confirmed by EST &/or cDNA
viii) cDNA2orf.WS174.gz - Latest set of ORF connections to each cDNA (EST, OST, mRNA)
ix) gene_interpolated_map_positions.WS174.gz - Interpolated map positions for each coding/RNA gene
x) clone_interpolated_map_positions.WS174.gz - Interpolated map positions for each clone
xi) best_blastp_hits.WS174.gz - for each C. elegans WormPep protein, lists Best blastp match to
human, fly, yeast, C. briggsae, and SwissProt & TrEMBL proteins.
xii) best_blastp_hits_brigprot.WS174.gz - for each C. briggsae protein, lists Best blastp match to
human, fly, yeast, C. elegans, and SwissProt & TrEMBL proteins.
xiii) geneIDs.WS174.gz - list of all current gene identifiers with CGC & molecular names (when known)
xiv) PCR_product2gene.WS174.gz - Mappings between PCR products and overlapping Genes
Release notes on the web:
-------------------------
http://www.wormbase.org/wiki/index.php/Release_notes
Genome sequence composition:
----------------------------
Base        WS174      WS173  change
----------------------------------------------
a        32365889   32365889      +0
c        17779856   17779856      +0
g        17756016   17756016      +0
t        32365689   32365689      +0
n               0          0      +0
Total   100267450  100267450      +0
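The totals in this table can be cross-checked with a few lines of Python. This is only a sketch: the counts are copied from the WS174 column above, and the variable names are illustrative rather than part of any WormBase tool.

```python
# Base counts copied from the WS174 column of the table above.
base_counts = {"a": 32365889, "c": 17779856, "g": 17756016, "t": 32365689, "n": 0}

# The per-base counts should add up to the reported genome total.
total = sum(base_counts.values())

# GC content is the fraction of bases that are g or c.
gc_fraction = (base_counts["g"] + base_counts["c"]) / total

print(f"total bases: {total}")
print(f"GC content:  {gc_fraction:.2%}")
```

The sum matches the reported Total of 100267450, and the GC content works out to roughly 35.4%.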
Chromosomal Changes:
--------------------
There are no changes to the chromosome sequences in this release.
Gene data set (Live C.elegans genes 24036)
------------------------------------------
Molecular_info 22345 (93%)
Concise_description 4524 (18.8%)
Reference 6981 (29%)
CGC_approved Gene name 9116 (37.9%)
RNAi_result 19859 (82.6%)
Microarray_results 19140 (79.6%)
SAGE_transcript 20044 (83.4%)
Wormpep data set:
----------------------------
There are 20101 CDS in autoace, 23258 when counting 3157 alternate splice forms.
The 23258 sequences contain 10,212,175 base pairs in total.
Modified entries 26
Deleted entries 8
New entries 6
Reappeared entries 2
Net change +0
Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
-------------------------------------------------
Confirmed 7848 (33.7%) Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed 10802 (46.4%) Some, but not all exon bases are covered by transcript evidence
Predicted 4608 (19.8%) No transcriptional evidence at all
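The percentages in this table appear to be each status count divided by the 23258 Wormpep sequences. A minimal sketch of that calculation, assuming this is how they were derived (the names below are illustrative):

```python
# Counts copied from the status table above.
status_counts = {
    "Confirmed": 7848,
    "Partially_confirmed": 10802,
    "Predicted": 4608,
}

# The three classes together should cover every Wormpep sequence.
total = sum(status_counts.values())

# Percentage of the total for each status, rounded to one decimal place.
percentages = {
    name: round(100 * count / total, 1)
    for name, count in status_counts.items()
}

for name, pct in percentages.items():
    print(f"{name:<20} {status_counts[name]:>6} ({pct}%)")
```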
Status of entries: Protein Accessions
-------------------------------------
UniProtKB/Swiss-Prot accessions 3512 (15.1%)
UniProtKB/TrEMBL accessions 19384 (83.3%)
Status of entries: Protein_ID's in EMBL
---------------------------------------
Protein_id 22869 (98.3%)
Gene <-> CDS,Transcript,Pseudogene connections (cgc-approved)
---------------------------------------------
Entries with CGC-approved Gene name 7476
GeneModel correction progress WS173 -> WS174
-----------------------------------------
Confirmed introns not in a CDS gene model:
+-----------+---------+--------+
| Site      | Introns | Change |
+-----------+---------+--------+
| Cambridge |     186 |      2 |
| St Louis  |     215 |      0 |
+-----------+---------+--------+
Members of known repeat families that overlap predicted exons:
+-----------+---------+--------+
| Site      | Repeats | Change |
+-----------+---------+--------+
| Cambridge |       6 |      0 |
| St Louis  |       6 |      0 |
+-----------+---------+--------+
Synchronisation with GenBank / EMBL:
------------------------------------
No synchronisation issues
There are no gaps remaining in the genome sequence
---------------
For more info mail worm@sanger.ac.uk
-===================================================================================-
New Data:
---------
The following databases were updated for BLAST:
trembl release 35
swissprot release 52
yeast
Genome sequence updates:
-----------------------
None.
New Fixes:
----------
None.
Known Problems:
---------------
Other Changes:
--------------
Many Poly-A tails were masked in EST and mRNA sequences. New Poly-A
Site and Poly-A Signal sequence Features were defined based on the
alignment of these sequences to the genome:
- 3530 new (1931 site, 1599 signal sequence) Features were defined.
- 641 old Poly-A Features (490 site, 151 signal) with no supporting
Sequence evidence were removed (changed to Method="history").
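These notes do not describe how the Poly-A tails were identified. Purely as an illustration, masking a trailing run of A's could look like the sketch below. The function name mask_poly_a_tail, the minimum run length of 10, and the lower-case "n" mask character are all assumptions made for this example; this is not the method used in the WormBase build.

```python
def mask_poly_a_tail(seq: str, min_run: int = 10, mask_char: str = "n") -> str:
    """Replace a trailing run of A's with mask_char if it is at least min_run long.

    Hypothetical helper for illustration only; thresholds are assumptions.
    """
    # Strip trailing A's (either case) to find where the tail starts.
    stripped = seq.rstrip("Aa")
    run_length = len(seq) - len(stripped)
    if run_length < min_run:
        return seq  # tail too short to be treated as poly-A
    # Keep the sequence length unchanged so alignment coordinates still apply.
    return stripped + mask_char * run_length
```

For example, mask_poly_a_tail("ACGT" + "A" * 12) returns "ACGT" followed by twelve "n" characters, while a sequence ending in only three A's is returned unchanged.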
Proposed Changes / Forthcoming Data:
-------------------------------------
We are working with the authors of this paper:
Ruby J et al. Cell. 2006 Dec 15;127(6):1193-207. "Large-scale
sequencing reveals 21U-RNAs and additional microRNAs and endogenous
siRNAs in C. elegans."
http://www.wormbase.org/db/misc/paper?name=WBPaper00028915;class=Paper
to refine and annotate circa 4500 new elegans RNA genes.
<A third class of nematode small RNAs, called 21U-RNAs, was
discovered. 21U-RNAs are precisely 21 nucleotides long, begin with a
uridine 5'-monophosphate but are diverse in their remaining 20
nucleotides, and appear modified at their 3'-terminal
ribose. 21U-RNAs originate from more than 5700 genomic loci dispersed
in two broad regions of chromosome IV - primarily between protein-coding
genes or within their introns. These loci share a large upstream motif
that enables accurate prediction of additional 21U-RNAs. The motif is
conserved in other nematodes, presumably because of its importance for
producing these diverse, autonomously expressed, small RNAs
(dasRNAs).>
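Two of the properties in the abstract above (a length of exactly 21 nucleotides and a 5' uridine) are simple enough to express as a check. The sketch below is only that: the function name looks_like_21u_rna is hypothetical, and it does not model the upstream motif or the 3' modification, so passing it does not make a sequence a 21U-RNA.

```python
def looks_like_21u_rna(seq: str) -> bool:
    """True if seq is exactly 21 nt long and starts with U (or T in DNA notation).

    Hypothetical helper; it checks only length and first base.
    """
    seq = seq.upper()
    # The length test comes first, so an empty string never reaches seq[0].
    return len(seq) == 21 and seq[0] in ("U", "T")
```

Accepting T as well as U is an assumption made so the check also works on genomic (DNA) sequence.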
Forthcoming model changes:
Added tags to ?Person and ?Paper to enable recording of negative
connections, i.e. Mr X did NOT contribute to this paper.
Added Map_evidence to ?Transgene so that the paper that mapping data
is taken from can be attributed.
Added tags to ?Expr_pattern and ?Expression_cluster to handle
Localizome data.
Note: the ?Interaction class update has already been committed.
Model Changes:
------------------------------------
Added DB_info line to ?Gene
Replaced ?Y2H with a more generic ?YH class which contains Y2H and Y1H
data.
Added Anatomy_function class to allow the connection between
?Anatomy_term, ?Phenotype (proxy of biological function), and ?Gene
and still give some information about the experiment itself. Name
shall be "WBbtf0001"
-===================================================================================-
Quick installation guide for UNIX/Linux systems
-----------------------------------------------
1. Create a new directory to contain your copy of WormBase,
e.g. /users/yourname/wormbase
2. Unpack and untar all of the database.*.tar.gz files into
this directory. You will need approximately 2-3 GB of disk space.
3. Obtain and install a suitable acedb binary for your system
(available from www.acedb.org).
4. Use the acedb 'xace' program to open your database, e.g.
type 'xace /users/yourname/wormbase' at the command prompt.
5. See the acedb website for more information about acedb and
using xace.
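Steps 1, 2 and 4 above can be scripted. The following is a minimal sketch in Python, assuming the database.*.tar.gz files have already been downloaded into one directory and that xace is on your PATH. The function names and the download directory are illustrative, not part of acedb or WormBase.

```python
import glob
import os
import subprocess
import tarfile


def unpack_release(download_dir: str, wormbase_dir: str) -> list[str]:
    """Steps 1-2: create wormbase_dir and extract every database.*.tar.gz into it."""
    os.makedirs(wormbase_dir, exist_ok=True)
    archives = sorted(glob.glob(os.path.join(download_dir, "database.*.tar.gz")))
    for archive in archives:
        # Only extract archives obtained from a source you trust.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(wormbase_dir)
    return archives


def open_with_xace(wormbase_dir: str) -> None:
    """Step 4: launch xace on the unpacked database (needs an installed acedb binary)."""
    subprocess.run(["xace", wormbase_dir], check=True)
```

A typical call would be unpack_release("/path/to/downloads", "/users/yourname/wormbase") followed by open_with_xace("/users/yourname/wormbase"), where /path/to/downloads stands in for wherever you saved the release files.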