An evolutionary model for protein-coding regions with conserved RNA structure

Jakob Skou Pedersen; Roald Forsberg; Irmtraud Margret Meyer; Jotun Hein

doi:10.1093/molbev/msh199

Mol Biol Evol. 2004 Oct;21(10):1913-22. doi: 10.1093/molbev/msh199. Epub 2004 Jun 30.

Authors

Jakob Skou Pedersen¹, Roald Forsberg, Irmtraud Margret Meyer, Jotun Hein

Affiliation

¹ Bioinformatics Research Center, University of Aarhus, Aarhus, Denmark.

PMID: 15229291

Abstract

Here we present a model of nucleotide substitution in protein-coding regions that also encode the formation of conserved RNA structures. In such regions, apparent evolutionary context dependencies exist, both between nucleotides occupying the same codon and between nucleotides forming a base pair in the RNA structure. The overlap of these fundamental dependencies is sufficient to cause "contagious" context dependencies which cascade across many nucleotide sites. Such large-scale dependencies challenge the use of traditional phylogenetic models in evolutionary inference because these models explicitly assume evolutionary independence between short nucleotide tuples. In our model we address this by replacing context dependencies within codons with annotation-specific heterogeneity in the substitution process. Through a general procedure, we fragment the alignment into sets of short nucleotide tuples based on both the protein-coding and the structural annotation. These individual tuples are assumed to evolve independently, and the different tuple sets are assigned different annotation-specific substitution models shared between their members. This allows us to build a composite model of the substitution process from components of traditional phylogenetic models. We applied this model to a data set of full-genome sequences from the hepatitis C virus, in which five RNA structures are mapped within the coding region. This allowed us to partition the effects of selection on different structural elements and to test various hypotheses concerning the relation of these effects. Of particular interest, we found evidence of a functional role of loop and bulge regions, as these were shown to evolve according to a different and more constrained selective regime than the nonpairing regions outside the RNA structures. Other potential applications of the model include comparative RNA structure prediction in coding regions and RNA virus phylogenetics.
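
The fragmentation step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several details are assumptions made only for illustration: the structural annotation is given as a dot-bracket string with one character per alignment column, the coding annotation is reduced to a single reading-frame offset over gap-free columns, and the function names (`pair_table`, `fragment`) and the labels (`"pair"`, `"loop"`, `"unpaired"`) are hypothetical. Base-paired columns become doublets labelled by the codon positions of both partners, and unpaired columns become singlets labelled by codon position and by whether they lie inside a structure (loops and bulges) or outside it.

```python
from collections import defaultdict


def pair_table(structure):
    """Map each paired column to its partner, given a dot-bracket string."""
    partner, stack = {}, []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at column {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError(f"unbalanced '(' at column {stack[-1]}")
    return partner


def fragment(structure, frame_offset=0):
    """Group alignment columns into annotation-labelled tuple sets.

    Returns a dict mapping a label to a list of column-index tuples:
      ("pair", cp_i, cp_j) -> doublets (i, j) of base-paired columns
      ("loop", cp)         -> unpaired singlets enclosed by a base pair
      ("unpaired", cp)     -> unpaired singlets outside any structure
    where cp is the codon position (1, 2 or 3) of the column.
    Assumption: columns are in frame and contain no alignment gaps.
    """
    partner = pair_table(structure)
    sets = defaultdict(list)
    depth = 0  # number of base pairs enclosing the current column
    for i, ch in enumerate(structure):
        cp = (i - frame_offset) % 3 + 1
        if ch == "(":
            depth += 1
            j = partner[i]
            cp_j = (j - frame_offset) % 3 + 1
            # Record each base pair once, at its opening column.
            sets[("pair", cp, cp_j)].append((i, j))
        elif ch == ")":
            depth -= 1
        elif depth > 0:
            sets[("loop", cp)].append((i,))
        else:
            sets[("unpaired", cp)].append((i,))
    return dict(sets)
```

For example, `fragment("..((...))..")` assigns the two base pairs to the sets `("pair", 3, 3)` and `("pair", 1, 2)`, the three hairpin columns to `"loop"` sets, and the four flanking columns to `"unpaired"` sets. In a composite model of the kind the abstract describes, each such set would then share one annotation-specific substitution model among its members.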

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Base Sequence
Conserved Sequence*
Evolution, Molecular*
Models, Genetic
Nucleic Acid Conformation
Phylogeny
Proteins / genetics*
RNA / genetics*
Sequence Alignment
Sequence Analysis, RNA*

Substances

Proteins
RNA

Grants and funding

1-R01-GM60729-01/GM/NIGMS NIH HHS/United States