Virus Evol. 2015 Oct 2;1(1):vev012.
doi: 10.1093/ve/vev012. eCollection 2015.

CodonShuffle: a tool for generating and analyzing synonymously mutated sequences


Daniel Macedo de Melo Jorge et al. Virus Evol. 2015.

Abstract

Because synonymous mutations do not change the amino acid sequence of a protein, they are generally considered to be selectively neutral. Empiric data suggest, however, that a significant fraction of viral mutational fitness effects may be attributable to synonymous mutation. Bias in synonymous codon usage in viruses may result from selection for translational efficiency, mutational bias, base pairing requirements in RNA structures, or even selection against specific dinucleotides by innate immune effectors. Experimental analyses of codon usage and genome evolution have been facilitated by advances in synthetic biology, which now make it feasible to generate viral genomes that contain large numbers of synonymous mutations. The generally pleiotropic effects of synonymous mutation on viral fitness have, at times, made it difficult to define the mechanistic basis for the observed attenuation of these heavily mutated viruses. We have addressed this problem by developing a bioinformatic tool for the generation and analysis of viral sequences with large-scale synonymous mutation. A variety of permutation strategies are applied to shuffle codons within an open reading frame. After measuring the dinucleotide frequency, codon usage, codon pair bias, and free energy of RNA folding for each permuted genome, we used z-score normalization and a least squares regression model to quantify their overall distance from the starting sequence. Using this approach, the user can easily identify a large number of synonymously mutated sequences with varying similarity to a wild-type genome across a range of nucleic-acid-based determinants of viral fitness. We believe that this tool will be useful in designing genomes for subsequent experimental studies of the fitness impacts of synonymous mutation.
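The permutation idea described in the abstract can be illustrated with a minimal sketch. It is not one of the CodonShuffle scripts: it assumes the simplest possible strategy, in which codons are exchanged only among positions that encode the same amino acid, so both the protein sequence and the overall codon usage are preserved. The function names `translate` and `shuffle_synonymous` are hypothetical, and the input is assumed to be an uppercase DNA-alphabet open reading frame whose length is a multiple of three.

```python
import random
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def shuffle_synonymous(seq, rng):
    """Shuffle codons among positions that encode the same amino acid.

    Illustrative assumption only: this is not the n3, dN23, dN31 or dN231
    strategy from the paper, just a permutation that keeps the protein
    sequence and the codon counts unchanged.
    """
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]

    # Group codon positions by the amino acid they encode.
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)

    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)  # permute codons within one amino acid class
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


if __name__ == "__main__":
    toy_orf = "ATGGCTGCAGCGTTATTGCTGTAA"  # hypothetical toy sequence
    permuted = shuffle_synonymous(toy_orf, random.Random(1))
    print(permuted, translate(permuted) == translate(toy_orf))
```

Passing an explicit `random.Random` instance keeps each permutation reproducible, which is convenient when many permuted sequences are generated and compared.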

Keywords: RNA virus; bioinformatics; codon; synonymous mutation; synthetic.


Figures

Figure 1.
(A) Diagram of permutation approaches, adapted from Belalov and Lukashev (2013). (B) Subtype representation of eighty-nine enteroviral (EV) full-length capsid sequences used as reference set for analyses in Figs 2–6 below.
Figure 2.
Dinucleotide bias of permuted sequences. Distribution of least squares values (x axis, see text) for 1,000 permuted sequences generated from each of the four permutation scripts, indicated in top left of each panel. Values for permuted sequences are shown in green and values for the reference set of eighty-nine enteroviruses are shown in orange. Purple dashed line is the value for the wild-type poliovirus capsid (Type 1, Mahoney). Of the four permutation approaches, dN31 and dN231 had little to no effect on the dinucleotide bias. This is consistent with a previously observed bias in GC content in the third codon and first codon positions of poliovirus (Belalov and Lukashev 2013).
Figure 3.
Codon bias of permuted sequences. Distribution of ENC values (x axis) for 1,000 permuted sequences generated from each of the four permutation scripts, indicated in top left of each panel. Values for permuted sequences are shown in green and values for the reference set of eighty-nine enteroviruses are shown in orange. Purple dashed line is the value for the wild-type poliovirus capsid (Type 1, Mahoney).
Figure 4.
Codon bias of permuted sequences. Distribution of CAI values (x axis) for 1,000 permuted sequences generated from each of the four permutation scripts, indicated in top left of each panel. Values for permuted sequences are shown in green and values for the reference set of eighty-nine enteroviruses are shown in orange. Purple dashed line is the value for the wild-type poliovirus capsid (Type 1, Mahoney).
Figure 5.
Codon pair bias of permuted sequences. Distribution of CPB values (x axis) for 1,000 permuted sequences generated from each of the four permutation scripts, indicated in top left of each panel. Values for permuted sequences are shown in green and values for the reference set of eighty-nine enteroviruses are shown in orange. Purple dashed line is the value for the wild-type poliovirus capsid (Type 1, Mahoney).
Figure 6.
RNA structure in permuted sequences. Distribution of minimum free energy values (x axis) of 1,000 permuted sequences generated from each of the four permutation scripts, indicated in top left of each panel. Values for permuted sequences are shown in green and values for the reference set of eighty-nine enteroviruses are shown in orange. Purple dashed line is the value for the wild-type poliovirus capsid (Type 1, Mahoney).
Figure 7.
Minimum free energy of permuted FMDV sequences. (A) RNA structure in permuted sequences. Distribution of minimum free energy values (x axis) of 1,000 permuted sequences generated from each of the four permutation scripts, indicated in top left of each panel. Values for permuted sequences are shown in green. Purple dashed line is the value for the wild-type FMDV capsid sequence (GenBank KF152935.1). (B) Sliding window analysis (100-nucleotide windows with an 80-nucleotide overlap) of local RNA structure in the FMDV capsid sequence for the wild type (left) and one of the permuted sequences (right).
Figure 8.
Assessment of overall similarity using a least squares model. (A) Calculation of z-score for each sequence in each distribution (blue dotted line) and the delta z relative to wild type (red dotted line), Figs 2–6. (B) Plots of Hamming distance versus Least Squares Distance, D, for 1,000 permuted sequences generated from each of the four permutation scripts, indicated in top left of each panel. The Hamming distance is in nucleotides across the 2,643 base capsid sequence of poliovirus.
Figure 9.
(A) Species (haplotype) accumulation curve with 10,000 sequences sampled (x axis) and number of unique haplotypes (y axis). Shown is the curve for sequences generated with the dN231 script. Curves for sequences generated with the three other scripts were identical. (B) Plots of Hamming distance versus least squares distance, D, for 10,000 permuted sequences generated from the dN231 script. Each sample of 1,000 sequences is shown in a different color. The Hamming distance is in nucleotides across the 2,643 base capsid sequence of poliovirus. (C) Neighbor-joining tree of ninety-eight capsid sequences generated by dN231 with a D value of 0 (Figs 8 and 9B). Scale (no. nucleotides) is shown and the wild-type poliovirus sequence is indicated (red).
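The two distances plotted in Figs 8 and 9 can be sketched as follows. The Hamming distance is the standard count of differing nucleotide positions. The least squares distance shown here is an assumption: each metric is z-score normalized across all sequences, and D is taken as the square root of the summed squared z-score differences from the wild type. The exact formula used in the paper may differ. The function names and the toy feature values are hypothetical.

```python
import math
import statistics


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def z_scores(values):
    """Z-score normalize one metric across all sequences."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # metric is constant, so no spread
    return [(v - mean) / sd for v in values]


def least_squares_distance(features, wild_type_index=0):
    """Distance of every sequence from the wild type in z-score space.

    `features` holds one row per sequence and one column per metric
    (for example dinucleotide bias, ENC, CAI, CPB, minimum free energy).
    Assumed form: D = sqrt(sum over metrics of (z - z_wild_type)^2).
    """
    columns = list(zip(*features))
    z_columns = [z_scores(col) for col in columns]
    z_rows = list(zip(*z_columns))
    wt = z_rows[wild_type_index]
    return [
        math.sqrt(sum((z - w) ** 2 for z, w in zip(row, wt)))
        for row in z_rows
    ]


if __name__ == "__main__":
    # Hypothetical metric values: row 0 is the wild type.
    toy_features = [[1.0, 2.0], [1.0, 2.0], [3.0, 6.0]]
    print(least_squares_distance(toy_features))
    print(hamming("ACGT", "ACGA"))
```

A sequence whose metrics all match the wild type gets D = 0 under this definition even when its Hamming distance is large, which is the kind of sequence shown in Fig. 9C.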


References

    1. Belalov I. S., Lukashev A. N. (2013) ‘Causes and Implications of Codon Usage Bias in RNA Viruses’, PLoS One, 8: e56642.
    2. Bull J. J., Molineux I. J., Wilke C. O. (2012) ‘Slow Fitness Recovery in a Codon-Modified Viral Genome’, Molecular Biology and Evolution, 29: 2997–3004.
    3. Burns C. C., et al. (2006) ‘Modulation of Poliovirus Replicative Fitness in HeLa Cells by Deoptimization of Synonymous Codon Usage in the Capsid Region’, Journal of Virology, 80: 3259–72.
    4. Burns C. C., et al. (2009) ‘Genetic Inactivation of Poliovirus Infectivity by Increasing the Frequencies of CpG and UpA Dinucleotides within and across Synonymous Capsid Region Codons’, Journal of Virology, 83: 9957–69.
    5. Cello J., Paul A. V., Wimmer E. (2002) ‘Chemical Synthesis of Poliovirus cDNA: Generation of Infectious Virus in the Absence of Natural Template’, Science, 297: 1016–8.
