BMC Genomics. 2009 Jul 16;10:318.
doi: 10.1186/1471-2164-10-318.

A Method for Identifying Alternative or Cryptic Donor Splice Sites Within Gene and mRNA Sequences. Comparisons Among Sequences From Vertebrates, Echinoderms and Other Groups

Free PMC article

Katherine M Buckley et al. BMC Genomics. 2009.

Abstract

Background: As the amount of genome sequencing data grows, so does the problem of computational gene identification and, in particular, of identifying the splicing signals that flank exon borders. Traditional methods for identifying splicing signals have been created and optimized using sequences from model organisms, mostly vertebrate and yeast species. However, as genome sequencing extends across the animal kingdom and includes various invertebrate species, the need for methods that recognize splice signals in these organisms increases as well. With that aim in mind, we generated a model for identifying donor and acceptor splice sites that was optimized using sequences from the purple sea urchin, Strongylocentrotus purpuratus. This model was then used to assess the possibility of alternative or cryptic splicing within the highly variable immune response gene family known as 185/333.

Results: A donor splice site model was generated from S. purpuratus sequences that incorporates non-adjacent dependences among positions within the 9 nt splice signal and uses position weight matrices to determine the probability that the site is used for splicing. The Purpuratus model was shown to predict splice signals better than a similar model created from vertebrate sequences. Although the Purpuratus model was able to correctly predict the true splice sites within the 185/333 genes, no evidence for alternative or trans-gene splicing was observed.
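The position weight matrix scoring mentioned in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the log2-odds formulation, the pseudocount, the uniform background of 0.25, and the four toy sequences are all assumptions chosen for illustration, and the sketch ignores the non-adjacent dependences that the Purpuratus model adds on top of plain matrices.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from aligned, equal-length sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm

def score_site(pwm, candidate):
    """Sum the per-position log-odds values for a candidate window."""
    return sum(column[base] for column, base in zip(pwm, candidate))

# Hypothetical 9 nt donor windows, not data from the paper.
toy_sites = ["AAGGTAAGT", "CAGGTAAGT", "AAGGTGAGT", "AAGGTAAGA"]
pwm = build_pwm(toy_sites)
consensus_like = score_site(pwm, "AAGGTAAGT")
unlikely = score_site(pwm, "CCCGTCCCC")
```

A window that resembles the training sites receives a higher summed score than one that matches only the invariant GT.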

Conclusion: The data presented herein describe the first published analyses of echinoderm splice sites and suggest that the previous methods of identifying splice signals that are based largely on vertebrate sequences may be insufficient. Furthermore, alternative or trans-gene splicing does not appear to be acting as a diversification mechanism in the 185/333 gene family.

Figures

Figure 1
Purpuratus donor splice site model. A. Analysis of the frequency of each base within the splice site reveals the S. purpuratus donor splice site consensus sequence. The nine nt windows surrounding the donor splice sites from 292 annotated S. purpuratus gene models (2845 donor sequences) were extracted, and the frequency of each nt within the window was calculated. The values shown in bold are the consensus nucleotides. Positions 1 and 2 are invariant because only canonical splice sites were used in this analysis. B. The Purpuratus splice site model incorporates non-adjacent dependences among the bases within the splice site. The model is implemented such that the splice site score of a given candidate sequence is computed using the matrix selected by applying the set of rules shown in the flowchart. For example, the sequence AAGGTAAGT would be scored using the matrix A-2G5G-1A4T6 (A-2→A-2G5→A-2G5G-1→A-2G5G-1A4→A-2G5G-1A4T6). Non-adjacent dependences were calculated for the 2845 S. purpuratus donor splice sites for each of the seven variable positions between the consensus nt and the non-consensus nucleotides in the other six positions (Table 1). The position with the maximum dependence was used to serially subdivide the sites until either the subdivision became too small to obtain reliable data, or no more significant dependences were observed. Position frequency matrices are shown, which were calculated for each of the terminal subdivisions and ultimately used in the Purpuratus splice site model.
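The rule walk in panel B can be sketched as follows. The order of tests (A-2, G5, G-1, A4, T6) is taken from the worked example in the legend; everything else is an assumption, because the flowchart itself is not reproduced here. In particular, stopping at the first mismatch, the name `matrix_path`, and the mapping of the nine window characters onto positions -3 to 6 (with no position 0) are hypothetical choices made for illustration.

```python
# Position labels for the 9 nt donor window: three exonic positions (-3..-1)
# followed by six intronic positions (1..6). There is no position 0.
POSITION_LABELS = [-3, -2, -1, 1, 2, 3, 4, 5, 6]

# Order of tests read off the legend's worked example.
RULES = [("A", -2), ("G", 5), ("G", -1), ("A", 4), ("T", 6)]

def base_at(window, position):
    """Return the base at a labeled splice site position."""
    return window[POSITION_LABELS.index(position)]

def matrix_path(window):
    """Concatenate the rule labels matched before the first mismatch.

    Stopping at the first mismatch is an assumed simplification; the real
    flowchart may branch differently on non-consensus bases.
    """
    if len(window) != len(POSITION_LABELS):
        raise ValueError("expected a 9 nt window")
    path = []
    for base, position in RULES:
        if base_at(window, position) != base:
            break
        path.append(f"{base}{position}")
    return "".join(path)

full_match = matrix_path("AAGGTAAGT")     # every rule matches
early_exit = matrix_path("AAGGTAACT")     # position 5 is C, so only A-2 matches
```

For the legend's example sequence AAGGTAAGT, the walk yields the same matrix name given in the text, A-2G5G-1A4T6.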
Figure 2
Analysis of known positive and negative splice sites using the Purpuratus and Vertebrate splice site models. Histograms of the scores given to known positive (solid lines) and negative (dashed lines) splice sites were generated (bin size = 2) for the Purpuratus (A) and Vertebrate (B) splice site models by analyzing the genes used to generate the models (Additional files 2, 3, and 4; [28]). For example, 22% of the known positive sites received scores between 4 and 6. The average of the means (Table 3) is shown by a vertical dotted line. The gray region is bounded on the left and right by N0.95 and P0.05 (Table 3), respectively, which are shown as dashed/dotted lines. The ✳ symbols located on the 0.25% line indicate the means of the positive and negative scores.
Figure 3
Histograms to evaluate the models. Genes isolated from S. purpuratus (circles), vertebrates (diamonds), and protostomes (triangles) were collected and analyzed using the Purpuratus (A) and Vertebrate (B) models. Histograms of the known positive (solid lines) and negative (dashed lines) donor splice sites were generated (bin size = 2). The average of the means (Table 3) is shown by a vertical dotted line. Values corresponding to N0.95 and P0.05 (Table 3) flank the left and right sides of the gray region, respectively, and are shown as dashed/dotted lines. The tables within the graphs indicate the percentages of known positive (Pos.) and negative (Neg.) S. purpuratus (Purp.), vertebrate (Vert.), and protostome (Prot.) sequences that were classified as positive or negative using the average of the means as the threshold.
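The thresholding and binning described in these legends can be sketched briefly. The scores below are hypothetical, not values from Table 3, and treating a score equal to the threshold as positive is an assumption, since the legends do not say how ties are handled.

```python
import math
from statistics import mean

def average_of_means(positive_scores, negative_scores):
    """Threshold halfway between the mean positive and mean negative score."""
    return (mean(positive_scores) + mean(negative_scores)) / 2

def classify(scores, threshold):
    """Label each score; ties go to 'positive' (an assumed convention)."""
    return ["positive" if s >= threshold else "negative" for s in scores]

def histogram(scores, bin_size=2):
    """Percentage of scores per bin, keyed by the lower edge of each bin."""
    counts = {}
    for s in scores:
        start = math.floor(s / bin_size) * bin_size
        counts[start] = counts.get(start, 0) + 1
    return {start: 100 * n / len(scores) for start, n in sorted(counts.items())}

# Hypothetical scores for known positive and negative donor sites.
positives = [4.0, 5.0, 9.0, 10.0]
negatives = [-6.0, -3.0, -1.0, 2.0]

threshold = average_of_means(positives, negatives)
positive_calls = classify(positives, threshold)
negative_calls = classify(negatives, threshold)
positive_hist = histogram(positives)
```

With a bin size of 2, a statement such as "22% of the known positive sites received scores between 4 and 6" corresponds to the value stored under key 4 in the returned dictionary.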

References

    1. Berget SM. Exon recognition in vertebrate splicing. J Biol Chem. 1995;270:2411–2414.
    2. Mathe C, Sagot MF, Schiex T, Rouze P. Current methods of gene prediction, their strengths and weaknesses. Nucleic Acids Res. 2002;30:4103–4117. doi: 10.1093/nar/gkf543.
    3. Zhang MQ. Computational prediction of eukaryotic protein-coding genes. Nat Rev Genet. 2002;3:698–709. doi: 10.1038/nrg890.
    4. Consortium IHGS, Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. doi: 10.1038/35057062.
    5. Burset M, Guigo R. Evaluation of gene structure prediction programs. Genomics. 1996;34:353–367. doi: 10.1006/geno.1996.0298.