An evolutionary model for protein-coding regions with conserved RNA structure

Mol Biol Evol. 2004 Oct;21(10):1913-22. doi: 10.1093/molbev/msh199. Epub 2004 Jun 30.


Here we present a model of nucleotide substitution in protein-coding regions that also encode the formation of conserved RNA structures. In such regions, apparent evolutionary context dependencies exist, both between nucleotides occupying the same codon and between nucleotides forming a base pair in the RNA structure. The overlap of these fundamental dependencies is sufficient to cause "contagious" context dependencies which cascade across many nucleotide sites. Such large-scale dependencies challenge the use of traditional phylogenetic models in evolutionary inference because they explicitly assume evolutionary independence between short nucleotide tuples. In our model we address this by replacing context dependencies within codons by annotation-specific heterogeneity in the substitution process. Through a general procedure, we fragment the alignment into sets of short nucleotide tuples based on both the protein coding and the structural annotation. These individual tuples are assumed to evolve independently, and the different tuple sets are assigned different annotation-specific substitution models shared between their members. This allows us to build a composite model of the substitution process from components of traditional phylogenetic models. We applied this to a data set of full-genome sequences from the hepatitis C virus where five RNA structures are mapped within the coding region. This allowed us to partition the effects of selection on different structural elements and to test various hypotheses concerning the relation of these effects. Of particular interest, we found evidence of a functional role of loop and bulge regions, as these were shown to evolve according to a different and more constrained selective regime than the nonpairing regions outside the RNA structures. Other potential applications of the model include comparative RNA structure prediction in coding regions and RNA virus phylogenetics.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Base Sequence
  • Conserved Sequence*
  • Evolution, Molecular*
  • Models, Genetic
  • Nucleic Acid Conformation
  • Phylogeny
  • Proteins / genetics*
  • RNA / genetics*
  • Sequence Alignment
  • Sequence Analysis, RNA*


  • Proteins
  • RNA