How do I get AQL to search data in hashes?

Bind a variable to the hash and select its tags in square brackets, as in this example:

select p->Standard_name, a[Institution], a[Email] from p in class Person, a in p->Address[0] where exists p->Supervised

More documentation is available at [http://www.acedb.org/Software/whelp/AQL/examples_worm.shtml](http://www.acedb.org/Software/whelp/AQL/examples_worm.shtml); scroll down to "Queries on objects containing hash structures."

How can I obtain all the abstracts on WormBase and the particular genes that they are associated with?

There are two ways:

1) Go to ftp://ftp.sanger.ac.uk/pub/wormbase/current_release/ and download the Acedb data files.

2) Use an AcePerl script to fetch the abstracts:

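use Ace;
# Replace 'yourhost.com' with the host of your Acedb server
my $db = Ace->connect(-host => 'yourhost.com') || die "Connection failed: ", Ace->error;
# Iterate over every Paper object that has an Abstract
my $iterator = $db->fetch_many(-query => qq(find Paper where Abstract));
while (my $obj = $iterator->next) {
    # grab info from the object
    my @genes = $obj->Gene;
    # ... etc ...
    print join(' ', @genes), "\n";
}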

How can I find out how many genes contain expression patterns generated with a specific method?

(For example, by in situ hybridization?)

Enter the following query in the "Advanced Search" box. It searches for Expr_pattern objects that contain all three types of method. If each '&' is replaced by '|', the query instead finds Expr_pattern objects with In_situ OR Antibody OR Reporter_gene data.

find Expr_pattern Type="In_situ" & Type="Antibody" & Type="Reporter_gene"

You can substitute other class names and tag values, following the same syntax, to search for other objects.
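As a rough illustration of how the '&' and '|' forms of this query relate, the sketch below assembles the query string from a list of Type values. The function name `build_expr_pattern_query` and its `match_all` parameter are hypothetical helpers invented for this example; the code only builds the text to paste into the search box and does not contact WormBase.

```python
from typing import Sequence


def build_expr_pattern_query(types: Sequence[str], match_all: bool = True) -> str:
    """Return a 'find Expr_pattern' query string for the given Type values.

    match_all=True joins the conditions with '&' (every method required);
    match_all=False joins them with '|' (any one method is enough).
    """
    if not types:
        raise ValueError("at least one Type value is required")
    operator = " & " if match_all else " | "
    conditions = [f'Type="{t}"' for t in types]
    return "find Expr_pattern " + operator.join(conditions)


methods = ["In_situ", "Antibody", "Reporter_gene"]
print(build_expr_pattern_query(methods))
print(build_expr_pattern_query(methods, match_all=False))
print(build_expr_pattern_query(["In_situ"]))
```

The last call shows the single-method case, which is the form to use when only one method (for example in situ hybridization) is of interest.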

How can I download the alignments of EST sequences to genomic sequences?

You can extract them from the GFF files that we provide with every release of WormBase. For more information on GFF files, see:

http://www.sanger.ac.uk/Software/formats/GFF/

We release one GFF file per chromosome; each contains the coordinates and details of most features that can be mapped onto chromosome base-pair coordinates.

These files are accessible from the main WormBase page (see the Feature table links on the right) and should also be on the WormBase and Sanger Institute FTP sites.

You will need to extract only a subset of these files, i.e. lines that match the pattern 'BLAT_EST_'. This is very easy to do if you have access to a UNIX/Linux system (use the 'grep' command).

E.g. here are two sample lines from the Chromosome II file (these will probably wrap around your screen):

CHROMOSOME_II BLAT_EST_BEST similarity 5754433 5755008 100 . . Target "Sequence:yk776e12.5" 21 596
CHROMOSOME_II BLAT_EST_OTHER similarity 5755968 5755971 98.4 . . Target "Sequence:yk4g4.5" 116 119


Within these lines are details of the chromosome coordinates, the BLAT score, the matching sequence name, and the coordinates within the matching sequence.
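If 'grep' is not available, or you want the fields as structured values, the same filtering can be sketched in Python. This is a minimal example that assumes each GFF record sits on one line with nine whitespace-separated columns and a Target attribute shaped like the sample lines above; the names `EstAlignment` and `parse_est_alignments` are made up for illustration.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class EstAlignment(NamedTuple):
    chromosome: str
    source: str      # e.g. BLAT_EST_BEST or BLAT_EST_OTHER
    start: int       # chromosome coordinates
    end: int
    score: float
    est_name: str    # matching sequence name
    est_start: int   # coordinates within the matching sequence
    est_end: int


# Matches attributes such as: Target "Sequence:yk776e12.5" 21 596
TARGET_RE = re.compile(r'Target "Sequence:([^"]+)"\s+(\d+)\s+(\d+)')


def parse_est_alignments(lines: Iterable[str]) -> Iterator[EstAlignment]:
    """Yield one EstAlignment per GFF line whose source starts with BLAT_EST_."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip GFF comment/header lines
        fields = line.rstrip("\n").split(None, 8)  # at most 9 columns
        if len(fields) < 9 or not fields[1].startswith("BLAT_EST_"):
            continue
        match = TARGET_RE.search(fields[8])
        if match is None:
            continue
        yield EstAlignment(
            chromosome=fields[0],
            source=fields[1],
            start=int(fields[3]),
            end=int(fields[4]),
            score=float(fields[5]),
            est_name=match.group(1),
            est_start=int(match.group(2)),
            est_end=int(match.group(3)),
        )


sample = [
    'CHROMOSOME_II BLAT_EST_BEST similarity 5754433 5755008 100 . . Target "Sequence:yk776e12.5" 21 596',
    'CHROMOSOME_II BLAT_EST_OTHER similarity 5755968 5755971 98.4 . . Target "Sequence:yk4g4.5" 116 119',
]
for alignment in parse_est_alignments(sample):
    print(alignment.est_name, alignment.start, alignment.end, alignment.score)
```

To process a real file, pass an open file handle instead of the `sample` list; lines from other sources are skipped.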

How can I retrieve timestamps of Acedb from the command line, and do I have to use Perl?

We use AcePerl to retrieve timestamp information; this is done via an AQL query.

For example, to find the timestamp of a tag in a particular object of a particular class, you could do:

my $aql_query = "select s,s->$tag.node_session from s in object(\"$class\",\"$object\")";
my @aql = $db->aql($aql_query);
my $timestamp = $aql[0]->[1];

How can I search for pseudogenes in WormBase?

AQL queries for pseudogenes take a long time. A faster approach is to query through the WormBase website.

Go to More Searches -> Advanced Search at http://www.wormbase.org/db/searches/query

Under Query Acedb, type:

find sequence *; pseudogene


You should get a result of pseudogene objects within a couple of seconds.

Where can I find a list of classes and subclasses for Acedb?

You can find a list of Acedb classes by first clicking on the More Searches link in the upper right corner of the WormBase home page. From there, select the WormBase Class Browser, which brings you to a searchable drop-down menu of all the Acedb classes.

For performing queries, it is helpful to know the data model for each of the classes that you would like to search. The data models can be accessed from this same page by typing the class of interest into the search box and then selecting "Model" from the drop-down menu. This will lead you to a Tree Display that diagrams how data for a particular class is represented in Acedb.

Also, from the More Searches link, you can access the Advanced AQL Search, which has further documentation and examples for querying the database.

How can I retrieve the gi numbers only for a list of entries having the GO term selected?

At present, you can retrieve GenBank identifiers (e.g. AAMxxxxx, AAKxxxxx, AAFxxxxx) for CDSs that are associated with a particular GO term by performing an AQL query. Here are the steps:

1) At the top right corner of the WormBase homepage, click on the More Searches link.

2) Under the general heading, select the Advanced AQL Search link.

3) Type the following query into the box: select a, b, c[1] from a in class CDS, b in a->go_term, c in a->protein_id where b = "GO:0003700"

4) You should get back a three-column table listing each CDS, the GO term you selected, and a GenBank ID.

If you are interested, the rationale for the query can be better understood by looking at our data model for CDSs, which is at http://www.wormbase.org/db/misc/etree?name=%3FCDS&class=Model;expand=Visible#Visible. The query searches the CDS class, restricts the go_term attribute to "GO:0003700", and reads the protein_id attribute for the unique text ID, which is the database identifier. The [1] after the letter c tells the query to return column 1 of protein_id, the text column; the sequence column is column 0.

How can I download C. briggsae 3' UTRs in bulk?

We don't really have a strictly empirical set of 3' UTRs (3' flanking sequences taken from cDNAs). However, what you probably want are predicted 3' UTR regions, which you can get from the WormBase ParaSite BioMart tool:

http://parasite.wormbase.org/biomart/martview/

Navigate to the URL above and, once it has loaded, click on "Query Filters", set the species to Caenorhabditis briggsae, and enter the list of C. briggsae genes you'd like 3' UTRs for. Then click on "Output Attributes", then on "Retrieve sequences", open the "SEQUENCES" section, and specify "3'UTR". Finally, click on "Results" at the top of the page to get the list of 3' UTR sequences.

How can I find the coding sequences of alleles for particular gene(s) having SNPs from C. elegans?

For instance, to find SNP sequences for the H39E23.1a gene, you can use the following AQL query:

select a->predicted_gene, a, a->flanking_sequences[1], a->flanking_sequences[2], a->substitution from a in class allele where a->predicted_gene = "H39E23.1a" and a->method = "snp"


The output of the query (in text mode) looks like this:

H39E23.1a snp_AH10.2 tgaaaaaaactaatttttaatgtga tcttggccacaattgacctagtttg [A/G]
H39E23.1a snp_AH10.3 ctgaacaactgaaaaaggaaagaaa agggaaaaagttcgaccacaaaaaa [G/A]


Here the first column is the gene name, the second is the allele name, the third and fourth are the sequences flanking the allele, and the last is the actual allele sequence change. You can modify the query to retrieve information for the genes that you're interested in.
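If you save the text output, the five columns can be split apart and the two allele sequences rebuilt from the flanks. The sketch below assumes one whitespace-separated record per line in exactly the layout shown above; `SnpRow`, `parse_snp_row`, and `allele_sequences` are hypothetical names for this example, and the code makes no assumption about which of the two bases is the wild type.

```python
import re
from typing import NamedTuple, Tuple


class SnpRow(NamedTuple):
    gene: str
    allele: str
    left_flank: str
    right_flank: str
    first_base: str    # base(s) before the '/' in the substitution column
    second_base: str   # base(s) after the '/'


# Matches a substitution column such as [A/G]
SUBSTITUTION_RE = re.compile(r"^\[([A-Za-z]+)/([A-Za-z]+)\]$")


def parse_snp_row(line: str) -> SnpRow:
    """Split one line of the query's text output into its five columns."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
    gene, allele, left, right, change = fields
    match = SUBSTITUTION_RE.match(change)
    if match is None:
        raise ValueError(f"unrecognised substitution: {change!r}")
    return SnpRow(gene, allele, left, right, match.group(1), match.group(2))


def allele_sequences(row: SnpRow) -> Tuple[str, str]:
    """Return both versions of the site, each placed between the flanks."""
    return (
        row.left_flank + row.first_base + row.right_flank,
        row.left_flank + row.second_base + row.right_flank,
    )


line = "H39E23.1a snp_AH10.2 tgaaaaaaactaatttttaatgtga tcttggccacaattgacctagtttg [A/G]"
row = parse_snp_row(line)
for sequence in allele_sequences(row):
    print(row.allele, sequence)
```

Because the flanks are lowercase and the substituted base is uppercase in the output, the variable position stays easy to spot in the rebuilt sequences.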

How can I download the spliced and non-spliced regions for all C. elegans or C. briggsae genes?

You can download spliced and unspliced sequences for a list of genes using the Batch Gene tool (http://www.wormbase.org/db/searches/info_dump). Paste the list of genes you're interested in into the search box and select the Spliced and Unspliced check boxes in the Sequence field. If you output the data as text, you'll be able to save it to your hard disk.

To get the list of C. briggsae genes (so that you can paste it into the search box), you can use the following query: select a from a in class cds where a->species like "*briggsae*".

How do I find all genes with transmembrane or signalp domains?

Go to the WormBase Query page and enter this query:

find Wormpep where Feature AND NEXT = "signalp"; follow Corresponding_CDS; follow Gene

Substitute "tmhmm" for "signalp" to get genes with transmembrane domains.

Alternatively, all genes with transmembrane domains are automatically assigned the GO term GO:0016021.


Last edited by Chris Grove – 44 days ago