New release of WormBase WS181, Wormpep181 and Wormrna181 Fri Sep 7 11:40:20 BST 2007
WS181 was built by Gary Williams
======================================================================
This directory includes:
i) database.WS181.*.tar.gz - compressed data for new release
ii) models.wrm.WS181 - the latest database schema (also in above database files)
iii) CHROMOSOMES/subdir - contains 3 files (DNA, GFF & AGP per chromosome)
iv) WS181-WS180.dbcomp - log file reporting difference from last release
v) wormpep181.tar.gz - full Wormpep distribution corresponding to WS181
vi) wormrna181.tar.gz - latest WormRNA release containing non-coding RNAs in the genome
vii) confirmed_genes.WS181.gz - DNA sequences of all genes confirmed by EST &/or cDNA
viii) cDNA2orf.WS181.gz - Latest set of ORF connections to each cDNA (EST, OST, mRNA)
ix) gene_interpolated_map_positions.WS181.gz - Interpolated map positions for each coding/RNA gene
x) clone_interpolated_map_positions.WS181.gz - Interpolated map positions for each clone
xi) best_blastp_hits.WS181.gz - for each C. elegans WormPep protein, lists the best blastp match to
    human, fly, yeast, C. briggsae, and SwissProt & TrEMBL proteins.
xii) best_blastp_hits_brigprot.WS181.gz - for each C. briggsae protein, lists the best blastp match to
    human, fly, yeast, C. elegans, and SwissProt & TrEMBL proteins.
xiii) geneIDs.WS181.gz - list of all current gene identifiers with CGC & molecular names (when known)
xiv) PCR_product2gene.WS181.gz - Mappings between PCR products and overlapping Genes
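
Most of the files listed above are gzip-compressed text. As a minimal sketch for inspecting one before deciding how to parse it, the following returns its first few lines. The helper name peek_gz is hypothetical, and nothing is assumed about the column layout of any file; the code only assumes the content is text.

```python
import gzip
from itertools import islice


def peek_gz(path, n=5):
    """Return the first n lines of a gzip-compressed text file.

    Hypothetical helper: no assumption is made about the file's
    column layout, only that it decompresses to text.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


# Example call (file name taken from the list above):
#   for line in peek_gz("geneIDs.WS181.gz"):
#       print(line)
```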
Release notes on the web:
-------------------------
http://www.wormbase.org/wiki/index.php/Release_notes
Genome sequence composition:
----------------------------
           WS181        WS180        change
----------------------------------------------
a          32365949     32365949     +0
c          17779887     17779887     +0
g          17756036     17756036     +0
t          32365750     32365750     +0
n                 0            0     +0
-                 0            0     +0
Total     100267622    100267622     +0
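
The composition table can be cross-checked with a few lines of arithmetic. This is an illustrative sketch: the counts are copied from the WS181 column above, while the names WS181_COUNTS and composition are assumptions made for the example.

```python
# Base counts copied from the WS181 column of the table above.
WS181_COUNTS = {
    "a": 32365949,
    "c": 17779887,
    "g": 17756036,
    "t": 32365750,
    "n": 0,
    "-": 0,
}


def composition(counts):
    """Return (total bases, GC fraction) for a dict of base counts."""
    total = sum(counts.values())
    gc_fraction = (counts["c"] + counts["g"]) / total
    return total, gc_fraction


total, gc_fraction = composition(WS181_COUNTS)
print(f"total bases: {total}")          # should equal the Total row
print(f"GC fraction: {gc_fraction:.4f}")
```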
Chromosomal Changes:
--------------------
There are no changes to the chromosome sequences in this release.
Gene data set (Live C.elegans genes 29501)
------------------------------------------
Molecular_info 27790 (94.2%)
Concise_description 4889 (16.6%)
Reference 7211 (24.4%)
CGC_approved Gene name 14732 (49.9%)
RNAi_result 20690 (70.1%)
Microarray_results 19945 (67.6%)
SAGE_transcript 18645 (63.2%)
Wormpep data set:
----------------------------
There are 20144 CDS in autoace, 23518 when counting 3374 alternate splice forms.
The 23518 sequences contain base pairs in total.
Modified entries 23
Deleted entries 0
New entries 11
Reappeared entries 0
Net change +11
Status of entries: Confidence level of prediction (based on the amount of transcript evidence)
-------------------------------------------------
Confirmed            8109 (34.5%)  Every base of every exon has transcription evidence (mRNA, EST etc.)
Partially_confirmed 10746 (45.7%)  Some, but not all, exon bases are covered by transcript evidence
Predicted            4663 (19.8%)  No transcriptional evidence at all
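
The percentages in this section are each count divided by the 23518 Wormpep entries. A small sketch that recomputes them from the figures quoted above follows; the variable and function names are hypothetical.

```python
# Counts copied from the Wormpep section above.
CDS_COUNT = 20144
ALTERNATE_SPLICE_FORMS = 3374
STATUS_COUNTS = {
    "Confirmed": 8109,
    "Partially_confirmed": 10746,
    "Predicted": 4663,
}


def percentages(counts):
    """Return each count as a percentage of the total, rounded to 0.1."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total_entries = CDS_COUNT + ALTERNATE_SPLICE_FORMS
# The three status classes should account for every entry.
assert sum(STATUS_COUNTS.values()) == total_entries

for name, pct in percentages(STATUS_COUNTS).items():
    print(f"{name:<20} {STATUS_COUNTS[name]:>6} ({pct}%)")
```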
Status of entries: Protein Accessions
-------------------------------------
UniProtKB/Swiss-Prot accessions 3439 (14.6%)
UniProtKB/TrEMBL accessions 18829 (80.1%)
Status of entries: Protein_IDs in EMBL
--------------------------------------
Protein_id 22278 (94.7%)
Gene <-> CDS,Transcript,Pseudogene connections (cgc-approved)
---------------------------------------------
Entries with CGC-approved Gene name 13107
GeneModel correction progress WS180 -> WS181
-----------------------------------------
Confirmed introns not in a CDS gene model:
          +---------+--------+
          | Introns | Change |
          +---------+--------+
Cambridge |      24 |      1 |
St Louis  |     157 |     -1 |
          +---------+--------+
Members of known repeat families that overlap predicted exons:
          +---------+--------+
          | Repeats | Change |
          +---------+--------+
Cambridge |       6 |      0 |
St Louis  |       6 |      0 |
          +---------+--------+
Synchronisation with GenBank / EMBL:
------------------------------------
No synchronisation issues
There are no gaps remaining in the genome sequence
---------------
For more info mail worm@sanger.ac.uk
-===================================================================================-
New Data:
---------
The following BLAST databases were updated to the latest version:
gadfly
ipi_human
yeast
Genome sequence updates:
-----------------------
New Fixes:
----------
Known Problems:
---------------
Other Changes:
--------------
Analysis objects are now being used for evidence of orthologs.
Proposed Changes / Forthcoming Data:
-------------------------------------
2944 5' and 3' RACE sequences from the Vidal lab will be aligned to
the C. elegans genome, giving an improved view of the ends of many
genes.
Proposed Model Changes
----------------------
Added History tracking tags to ?Feature class
Removed WashU_ID and Exelixis_ID as Variation name types
Added Amber_UAG_or_Opal_UGA as final ambiguous mutation
Model Changes:
------------------------------------
#####################################################################
#Molecular_change Nonsense UNIQUE Amber_UAG Text #Evidence
                                  Ochre_UAA Text #Evidence
                                  Opal_UGA Text #Evidence
                                  Ochre_UAA_or_Opal_UGA Text #Evidence
                                  Amber_UAG_or_Ochre_UAA Text #Evidence
######################################################################
?Transcript
  Properties Transcript mRNA
                        miRNA
                        ncRNA
                        rRNA
                        scRNA
                        snRNA
                        snlRNA   *new Small Nuclear Like RNA
                        snoRNA
                        stRNA
                        tRNA
                        u21RNA
#################################################################################################
#Splice_confirmation cDNA ?Sequence          // ?Sequence link to flag which cDNA is confirming
                     EST ?Sequence           // or falsifying the intron in question, added [031121 krb]
                     OST
                     mRNA
                     Homology
                     UTR ?Sequence
                     False ?Sequence
                     Inconsistent ?Sequence
#################################################################################################
// the main Analysis class, to hold information about publications / persons / other evidence and the WormBase release used
?Analysis Source_database ?Database
          Based_on_WB_Release Int
          Based_on_DB_Release Text
          Description ?Text
          Reference ?Paper XREF Describes_analysis
          Conducted_by ?Person XREF Describes_analysis // not always the same as the author of the paper - eg Erich running OrthoMCL
          URL Text // eg www.treefam.org (or would this be covered by Source_database?)
// changes to the Evidence
#Evidence From_analysis ?Analysis
// Paper class changes
?Paper Describes_analysis ?Analysis XREF Reference
// Person class changes
?Person Conducted ?Analysis XREF Conducted_by
-===================================================================================-
Quick installation guide for UNIX/Linux systems
-----------------------------------------------
1. Create a new directory to contain your copy of WormBase,
e.g. /users/yourname/wormbase
2. Unpack and untar all of the database.*.tar.gz files into
this directory. You will need approximately 2-3 GB of disk space.
3. Obtain and install a suitable acedb binary for your system
(available from www.acedb.org).
4. Use the acedb 'xace' program to open your database, e.g.
type 'xace /users/yourname/wormbase' at the command prompt.
5. See the acedb website for more information about acedb and
using xace.
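
Steps 1 and 2 above can be sketched in Python using only the standard library. The function name unpack_release and both directory arguments are hypothetical; the archive pattern follows the database.*.tar.gz naming used in this release. Steps 3 to 5 (installing acedb and running xace) are not covered.

```python
import glob
import os
import tarfile


def unpack_release(download_dir, target_dir, pattern="database.*.tar.gz"):
    """Unpack every matching archive from download_dir into target_dir.

    Hypothetical helper covering steps 1 and 2 of the guide above.
    Returns the list of archives that were unpacked.
    """
    # Step 1: create the directory that will hold the database.
    os.makedirs(target_dir, exist_ok=True)

    # Step 2: unpack each database archive into that directory.
    archives = sorted(glob.glob(os.path.join(download_dir, pattern)))
    for archive in archives:
        # Only unpack archives obtained from a source you trust.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target_dir)
    return archives


# Example call, using the directory suggested in step 1:
#   unpack_release(".", "/users/yourname/wormbase")
```

After this, the database can be opened as described in step 4, e.g. 'xace /users/yourname/wormbase'.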
____________ END _____________