RevTrans: Multiple alignment of coding DNA from aligned amino acid sequences

Rasmus Wernersson; Anders Gorm Pedersen

doi:10.1093/nar/gkg609

Nucleic Acids Res. 2003 Jul 1;31(13):3537-9. doi: 10.1093/nar/gkg609.

Authors

Rasmus Wernersson¹, Anders Gorm Pedersen

Affiliation

¹ Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, Building 208, DK-2800, Lyngby, Denmark.

Abstract

The simple fact that proteins are built from 20 amino acids while DNA contains only four different bases means that the 'signal-to-noise ratio' in protein sequence alignments is much better than in alignments of DNA. Besides this information-theoretical advantage, protein alignments also benefit from the information that is implicit in empirical substitution matrices such as BLOSUM-62. Together with the fact that synonymous mutations generally occur at a higher rate than non-synonymous ones, this means that the phylogenetic signal disappears much more rapidly from DNA sequences than from the encoded proteins. It is therefore preferable to align coding DNA at the amino acid level, and it is for this purpose that we have constructed the program RevTrans. RevTrans constructs a multiple DNA alignment by: (i) translating the DNA; (ii) aligning the resulting peptide sequences; and (iii) building a multiple DNA alignment by 'reverse translation' of the aligned protein sequences. In the resulting DNA alignment, gaps occur in groups of three corresponding to entire codons, and analogous codon positions are therefore always lined up. These features are useful when constructing multiple DNA alignments for phylogenetic analysis. RevTrans also accepts user-provided protein alignments for greater control of the alignment process. The RevTrans web server is freely available at http://www.cbs.dtu.dk/services/RevTrans/.
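
Steps (i) and (iii) of the abstract can be illustrated with a minimal Python sketch. This is not the RevTrans source code: the function names `translate` and `thread_codons` are hypothetical, the standard genetic code is assumed, and step (ii) is assumed to have been done by an external protein aligner (or supplied by the user), so the aligned peptides are simply given as strings. The sketch also assumes each DNA sequence is in frame, has a length divisible by three, and has no trailing stop codon.

```python
from itertools import product

# Standard genetic code (assumption), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Step (i): translate an in-frame coding sequence to a peptide."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("DNA length must be a multiple of three")
    # Codons with ambiguous bases are translated as 'X'.
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna), 3)
    )


def thread_codons(dna, aligned_protein, gap="-"):
    """Step (iii): map an aligned peptide back onto its original codons.

    Each residue is replaced by the codon that encoded it, and each
    gap character by three gap characters, so gaps span whole codons.
    """
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    n_residues = sum(1 for residue in aligned_protein if residue != gap)
    if n_residues != len(codons):
        raise ValueError("aligned peptide does not match the DNA length")
    out = []
    pos = 0
    for residue in aligned_protein:
        if residue == gap:
            out.append(gap * 3)
        else:
            out.append(codons[pos])
            pos += 1
    return "".join(out)


# Two toy coding sequences (hypothetical data).
dna_seqs = {"seq1": "ATGGCTAAA", "seq2": "ATGAAA"}
peptides = {name: translate(dna) for name, dna in dna_seqs.items()}

# Step (ii) is assumed to be done elsewhere; this alignment is hand-made.
aligned_peptides = {"seq1": "MAK", "seq2": "M-K"}

dna_alignment = {
    name: thread_codons(dna_seqs[name], aligned_peptides[name])
    for name in dna_seqs
}
for name, row in dna_alignment.items():
    print(name, row)
```

Because the original codons are copied back rather than chosen from a codon table, the DNA alignment preserves the input sequences exactly, and every gap is a multiple of three bases wide.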

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Substitution
Base Sequence
Codon
Internet
Mutation
Sequence Alignment / methods*
Sequence Analysis, DNA / methods*
Sequence Analysis, Protein / methods*
Software*

Substances

Codon