Review
, 7, 11

Origin and Evolution of Spliceosomal Introns

Igor B Rogozin et al. Biol Direct.

Abstract

Evolution of exon-intron structure of eukaryotic genes has been a matter of long-standing, intensive debate. The introns-early concept, later rebranded 'introns first', held that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. The introns-late concept held that introns emerged only in eukaryotes and that new introns have been accumulating continuously throughout eukaryotic evolution. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists, suggesting that many ancestral introns have persisted since the last eukaryotic common ancestor (LECA). Reconstructions of intron gain and loss using the growing collection of genomes of diverse eukaryotes and increasingly advanced probabilistic models convincingly show that the LECA and the ancestors of each eukaryotic supergroup had intron-rich genes, with intron densities comparable to those in the most intron-rich modern genomes such as those of vertebrates. The subsequent evolution in most lineages of eukaryotes involved primarily loss of introns, with only a few episodes of substantial intron gain that might have accompanied major evolutionary innovations such as the origin of metazoa. The original invasion of self-splicing Group II introns, presumably originating from the mitochondrial endosymbiont, into the genome of the emerging eukaryote might have been a key factor in eukaryogenesis that, in particular, triggered the origin of endomembranes and the nucleus. Conversely, splicing errors gave rise to alternative splicing, a major contribution to the biological complexity of multicellular eukaryotes. There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns. Thus, the introns-first scenario is not supported by any evidence; rather, the exon-intron structure of protein-coding genes appears to have evolved concomitantly with the eukaryotic cell, and introns were a major factor in evolution throughout the history of eukaryotes.

Figures

Figure 1
Consensus motifs for donor and acceptor splicing signals. The Y axis indicates the strength of splicing signals (base composition bias based on information content). The data is from [19].
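To make the notion of signal strength as information content concrete, here is a minimal Python sketch of the standard per-position measure for aligned DNA sequences (2 bits minus the Shannon entropy of the base frequencies). The function name and the toy donor-site sequences are hypothetical, no small-sample correction is applied, and the exact procedure used in [19] may differ.

```python
import math
from collections import Counter

def information_content(sequences, alphabet="ACGT"):
    """Per-position information content (bits) for aligned, equal-length sequences."""
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must all have the same length")
    max_bits = math.log2(len(alphabet))  # 2 bits for a 4-letter alphabet
    result = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sequences)
        total = sum(counts[b] for b in alphabet)
        if total == 0:
            raise ValueError(f"no alphabet symbols at position {pos}")
        entropy = 0.0
        for b in alphabet:
            p = counts[b] / total
            if p > 0:
                entropy -= p * math.log2(p)
        result.append(max_bits - entropy)
    return result

# Hypothetical toy donor-site sequences, for illustration only.
toy_donor_sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTAAGT"]
ic = information_content(toy_donor_sites)
```

Invariant positions (here the leading GT) reach the 2-bit maximum, while variable positions score lower; summing across positions gives one overall strength value for a signal.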
Figure 2
Intron density and intron length in 100 eukaryotes. The data is from [53].
Figure 3
An example of a recent intron acquisition in a retrotransposon-derived gene: structure of two splice variants of RNF113B. The new intron of RNF113B is not a de novo insertion but rather a derivative of exonic sequences (this intron contains 59 nucleotides from the former coding sequence and 46 nucleotides from the 3’ UTR). A partial alignment of three RNF113B sequences and three RNF113A sequences is shown above the spliced RNF113B isoform. The donor splice site is marked in yellow, the predicted branch point signal is marked in blue, and the acceptor splice site is marked in gray. The data is from [101].
Figure 4
The Xist gene evolved from a protein-coding gene and a set of transposable elements. Blue boxes indicate exons originating from Lnx3; red boxes indicate exons originating from transposable elements; dashed boxes indicate remnants of protein-coding exons. The data is from [105].
Figure 5
Correlation between the strength of the branch point signal and the strength of the acceptor splice site. The linear correlation coefficient is R = -0.52 (P = 0.000025) after exclusion of the obvious outlier Aureococcus anophagefferens [117]. The information content of splicing signals in 61 eukaryotic species is from [117]. Species names: B. taurus, C. familiaris, E. caballus, H. sapiens, M. domestica, M. musculus, O. anatinus, R. norvegicus, S. scrofa, B. floridae, C. intestinalis, C. savignyi, D. rerio, G. gallus, O. latipes, P. marinus, T. guttata, X. tropicalis, A. gambiae, A. mellifera, C. elegans, D. pulex, D. melanogaster, H. magnipapillata, L. gigantea, M. brevicollis, N. vectensis, S. purpuratus, T. castaneum, B. dendrobatidis, C. heterostrophus, C. neoformans, M. grisea, N. haematococca, P. chrysosporium, P. blakesleeanus, P. infestans, P. placenta, S. cerevisiae, S. commune, T. virens, A. anophagefferens, D. discoideum, D. purpureum, N. gruberi, O. lucimarinus, P. tricornutum, T. pseudonana, T. adhaerens, A. thaliana, Chlorella NC64A, C. reinhardtii, M. pusilla, Micromonas RCC299, O. sativa, P. patens, P. trichocarpa, S. moellendorffii, S. bicolor, V. vinifera, V. carteri.
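As a rough illustration of how a linear correlation coefficient of this kind can be computed after dropping a named outlier, the Python sketch below implements the Pearson formula directly. The species labels and numeric values are hypothetical placeholders, not the data from [117], and the helper names are assumptions made for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)

def correlation_without(signals, excluded):
    """Correlate (branch point, acceptor) strengths, skipping excluded species."""
    kept = [v for name, v in signals.items() if name not in excluded]
    branch = [bp for bp, _ in kept]
    acceptor = [acc for _, acc in kept]
    return pearson_r(branch, acceptor)

# Hypothetical values: species label -> (branch point bits, acceptor site bits).
toy_signals = {
    "species_a": (1.0, 8.0),
    "species_b": (2.0, 6.0),
    "species_c": (3.0, 4.0),
    "species_d": (4.0, 2.0),
    "outlier": (9.0, 9.0),
}
r_all = pearson_r([v[0] for v in toy_signals.values()],
                  [v[1] for v in toy_signals.values()])
r_filtered = correlation_without(toy_signals, {"outlier"})
```

In this toy data a single extreme point is enough to flip the sign of the coefficient, which is why the treatment of an outlier is stated explicitly in the caption.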
Figure 6
Fractions of protosplice sites and actual introns in the three phases. Species abbreviations: (At) green plant Arabidopsis thaliana, (Hs) human Homo sapiens. An excess of protosplice sites in phase 0 is noticeable; however, the ‘protosplice site’ hypothesis, which posits that introns are randomly inserted into protosplice sites, is unable to fully explain the observed over-representation of phase 0 introns. The data is from [125,132].
Figure 7
Reconstruction of intron gains and losses in the evolution of eukaryotes and intron density in ancestral eukaryote forms. The data is from [53]. Branch widths are proportional to intron density, which is shown next to terminal taxa and some deep ancestors, in units of intron count per 1 kbp of coding sequence. Human (Hsap) is marked by a blue dot. Horizontal bars show ancestral (top) and current (bottom) intron content; gain and loss (in the lineage from the respective ancestor) are shown in red and green, respectively. The bars are aligned so that the pale yellow part shows the introns retained from the ancestor. Species names and abbreviations: Aureococcus anophagefferens (Aano), Aedes aegypti (Aaeg), Agaricus bisporus (Abis), Anopheles gambiae (Agam), Allomyces macrogynus (Amac), Apis mellifera (Amel), Aspergillus nidulans (Anid), Acyrthosiphon pisum (Apis), Arabidopsis thaliana (Atha), Babesia bovis (Bbov), Batrachochytrium dendrobatidis (Bden), Branchiostoma floridae (Bflo), Botryotinia fuckeliana (Bfuc), Brugia malayi (Bmal), Bombyx mori (Bmor), Coccomyxa sp. C-169 (C169), Chlorella sp. NC64a (C64a), Caenorhabditis briggsae (Cbri), Caenorhabditis elegans (Cele), Coprinopsis cinerea okayama (Ccin), Cochliobolus heterostrophus C5 (Chet), Coccidioides immitis (Cimm), Ciona intestinalis (Cint), Cryptococcus neoformans var. neoformans (Cneo), Chlamydomonas reinhardtii (Crei), Capitella teleta (Ctel), Capsaspora owczarzaki (Cowc), Dictyostelium discoideum (Ddis), Dictyostelium purpureum (Dpur), Drosophila melanogaster (Dmel), Drosophila mojavensis (Dmoj), Daphnia pulex (Dpul), Danio rerio (Drer), Entamoeba dispar (Edis), Entamoeba histolytica (Ehis), Emiliania huxleyi (Ehux), Fragilariopsis cylindrus (Fcyl), Phanerochaete chrysosporium (Fchr), Phaeodactylum tricornutum (Ftri), Gallus gallus (Ggal), Gibberella zeae (Gzea), Hydra magnipapillata (Hmag), Helobdella robusta (Hrob), Homo sapiens (Hsap), Ixodes scapularis (Isca), Laccaria bicolor (Lbic), Lottia gigantea (Lgig), Micromonas sp. RCC299 (M299), Monosiga brevicollis (Mbre), Mucor circinelloides (Mcir), Mycosphaerella fijiensis (Mfij), Mycosphaerella graminicola (Mgra), Magnaporthe grisea (Mgri), Melampsora laricis-populina (Mlar), Micromonas pusilla (Mpus), Neurospora crassa (Ncra), Nematostella vectensis (Nvec), Nasonia vitripennis (Nvit), Ostreococcus sp. RCC809 (O809), Ostreococcus lucimarinus (Oluc), Oryza sativa japonica (Osat), Ostreococcus tauri (Otau), Phytophthora capsici (Pcap), Plasmodium falciparum (Pfal), Puccinia graminis (Pgra), Pediculus humanus (Phum), Phaeosphaeria nodorum (Pnod), Physcomitrella patens subsp. patens (Ppat), Phytophthora ramorum (Pram), Pyrenophora tritici-repentis (Prep), Proterospongia sp. (Prsp), Phytophthora sojae (Psoj), Paramecium tetraurelia (Ptet), Plasmodium vivax (Pviv), Plasmodium yoelii yoelii (Pyoe), Rhizopus oryzae (Rory), Sorghum bicolor (Sbic), Saccharomyces cerevisiae (Scer), Schizosaccharomyces japonicus (Sjap), Schistosoma mansoni (Sman), Selaginella moellendorffii (Smoe), Schizosaccharomyces pombe (Spom), Spizellomyces punctatus (Spun), Strongylocentrotus purpuratus (Spur), Sporobolomyces roseus (Sros), Sclerotinia sclerotiorum (Sscl), Trichoplax adhaerens (Tadh), Theileria annulata (Tann), Tribolium castaneum (Tcas), Toxoplasma gondii (Tgon), Taeniopygia guttata (Tgut), Theileria parva (Tpar), Thalassiosira pseudonana (Tpse), Tetrahymena thermophila (Tthe), Ustilago maydis (Umay), Uncinocarpus reesii (Uree), Volvox carteri (Vcar), Vitis vinifera (Vvin).
Figure 8
Conservation of intron positions in ancient and recent eukaryotic paralogs. Conservation of introns was assessed by analysis of multiple alignments of paralogous sequences from 6 species (H. sapiens, C. elegans, D. melanogaster, S. pombe, S. cerevisiae, A. thaliana). An intron position was considered to be conserved if it was shared by any pair of paralogs [148].
Figure 9
A hypothetical scenario of early history of spliceosomal introns. The scheme shows the inferred sequence of events from putative ancestors of eukaryotes to the origin of spliceosomal introns from group II introns invading the host genome upon mitochondrial endosymbiosis [46].
Figure 10
Total intron length as a function of expression level category. Intron length is measured in nucleotides. Expression levels are binned into 30 categories, with higher categories matching higher expression levels, as described previously [167]. Each point is the mean value for all genes in the given expression category, and the error bar indicates the standard deviation of the mean.

References

    1. Gilbert W. Why genes in pieces? Nature. 1978;271(5645):501. doi: 10.1038/271501a0.
    2. Jurica MS, Moore MJ. Pre-mRNA splicing: awash in a sea of proteins. Mol Cell. 2003;12(1):5–14. doi: 10.1016/S1097-2765(03)00270-3.
    3. Nilsen TW. The spliceosome: the most complex macromolecular machine in the cell? Bioessays. 2003;25(12):1147–1149. doi: 10.1002/bies.10394.
    4. Nixon JE, Wang A, Morrison HG, McArthur AG, Sogin ML, Loftus BJ, Samuelson J. A spliceosomal intron in Giardia lamblia. Proc Natl Acad Sci U S A. 2002;99(6):3701–3705. doi: 10.1073/pnas.042700299.
    5. Simpson AG, MacQuarrie EK, Roger AJ. Eukaryotic evolution: early origin of canonical introns. Nature. 2002;419(6904):270. doi: 10.1038/419270a.
