We have been using the 55 kb unc-22 sequence (G. Benian et al., Nature 342 (1989) 45-50, and this WBG) to test the gm automated DNA sequence analysis program (WBG Jan. 1990). One of the initial test runs on the complete sequence predicted ten short exons between positions 16700 and 22000, which together encoded an amino-acid sequence containing twitchin-like repeats. Five of these exons were shown to be true unc-22 exons by PCR sequencing of unc-22 cDNA. At analysis stringencies similar to those that give good results for the myosin sequences, gm predicts numerous minor splicing alternatives for unc-22. A typical 'best' prediction, run in 17 minutes on a workstation, predicts 15 of the 20 unc-22 exons located 3' of position 15000 correctly, predicts one of the two boundaries of another two exons correctly, and predicts six spurious exons.

A spinoff of these tests has been the prediction of several new genes in the
unc-22 sequence, two of which have now been confirmed by cDNA analysis. The extents and orientations of three genes predicted in the unc-22 sequence are shown in Fig. 1. [See Figure 1] Only the 3' end of the predicted female-specific ('fem-sp') gene is contained within the unc-22 sequence. The amino acid sequence of this fragment is highly similar to that of the mouse interleukin-1 precursor (PIR ICMS1); however, more of the sequence is needed to judge whether this similarity is significant. A partial cDNA overlapping this prediction has been isolated.

The predicted 'serine-rich' gene is embedded entirely within the 7.4 kb intron of
unc-22. Using selected fragments from the first 24 kb of unc-22 as probes against Northern blots, we have detected at least two male-specific messages; one of these messages is probably encoded by spe-17 (more in the next WBG). Neither of these messages, however, seems to be from the predicted serine-rich gene. A number of ORFs over 100 bases in length contained within the sequence have no known function; one or more of these may encode male-specific transcripts.

The predicted 'transporter' gene is the best characterized of the predictions shown in Fig. 1. A partial cDNA overlapping this prediction has been isolated. An amino-acid sequence derived from the cDNA data and the gm run is shown in Fig. 2, together with an alignment with a mammalian glucose transporter protein generated by fasta (W. Pearson and D. Lipman, PNAS 85 (1988) 2444-2448). The predicted protein is also significantly similar to other mammalian glucose and ion transporters. [See Figure 2] The predicted protein has four highly hydrophobic regions, and is expected from the Garnier rules to be composed primarily of alpha helix, suggesting that it is an integral membrane protein. We suspect that this gene encodes a glucose transporter, or a closely related protein. D. Baillie has located at least three other potential integral membrane protein genes between unc-22 and dpy-20 (S. Prasad and D. Baillie, Genomics 5 (1989) 185-198), one of which appears to be the Na+/H+ antiporter gene (this WBG).
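
As a rough illustration of the exon tally quoted earlier for the typical 'best' gm run (15 of 20 exons correct, 2 with one boundary correct, 6 spurious), the counts can be turned into simple accuracy figures. The sketch below is not part of gm. The function name exon_tally and the labels "sensitivity" and "precision" are illustrative choices, and it assumes that each partially correct exon corresponds to exactly one predicted exon.

```python
from fractions import Fraction

# Counts quoted in the text for unc-22 exons located 3' of position 15000.
TRUE_EXONS = 20  # known exons in the region
EXACT = 15       # predicted with both boundaries correct
PARTIAL = 2      # predicted with one of the two boundaries correct
SPURIOUS = 6     # predicted exons that match no known exon


def exon_tally(true_exons, exact, partial, spurious):
    """Summarize exon-level accuracy from the four counts.

    Assumption (not stated in the text): each partially correct exon
    corresponds to exactly one predicted exon, so the number of
    predictions is exact + partial + spurious.
    """
    predicted = exact + partial + spurious
    missed = true_exons - exact - partial
    return {
        "predicted": predicted,
        "missed": missed,
        # Fraction of known exons predicted with both boundaries correct.
        "sensitivity_exact": Fraction(exact, true_exons),
        # Fraction of predicted exons that are exactly correct.
        "precision_exact": Fraction(exact, predicted),
        # Fraction of known exons with at least one boundary correct.
        "sensitivity_overlap": Fraction(exact + partial, true_exons),
    }


if __name__ == "__main__":
    tally = exon_tally(TRUE_EXONS, EXACT, PARTIAL, SPURIOUS)
    for name, value in tally.items():
        if isinstance(value, Fraction):
            print(f"{name}: {value} ({float(value):.2f})")
        else:
            print(f"{name}: {value}")
```

Under these assumptions the run makes 23 exon predictions, misses 3 of the 20 known exons entirely, and gets both boundaries right for 15/20 of the known exons and 15/23 of its predictions.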