Significant biases in dinucleotide composition have been reported in recent years for many RNA viruses, including influenza A virus. Previous studies have shown that a codon-usage-altered influenza mutant with elevated CpG usage is attenuated in mammalian in vitro and in vivo models. However, the relationship between dinucleotide preference and codon usage bias is not entirely clear, and changes in the dinucleotide usage of influenza virus during evolution have yet to be investigated at the segment level. In this study, a Monte Carlo-type method was applied to identify under-represented and over-represented dinucleotide motifs, among different segments and different groups, in influenza viral sequences. After excluding the potential biases caused by codon usage and amino acid sequences, CpG and UpA were found to be under-represented in all viral segments from all groups, whereas UpG and CpA were found to be over-represented. We further explored temporal changes in the usage of these dinucleotides. Our analyses revealed a significant decrease in CpG frequency in Segments 1, 3, 4, and 5 of seasonal H1 virus after its re-emergence in humans in 1977. These temporal variations were mainly attributable to dinucleotide changes at codon positions 3-1 and 2-3, where silent mutations played a major role. The depletion of CpG and UpA through silent mutations consequently led to over-representation of UpG and CpA. We also found that dinucleotide preference directly results in significant synonymous codon usage bias. Our study provides details that help in understanding the evolutionary history of influenza virus and the selection pressures that shape the viral genome.
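To illustrate the kind of Monte Carlo test described above, the following is a minimal sketch and not necessarily the procedure used in this study. It assumes one common randomization scheme: codons are shuffled only among positions that encode the same amino acid, so every simulated sequence keeps the original amino acid sequence and codon usage, and the observed dinucleotide count is compared with the simulated distribution. The function names, the number of iterations, and the use of the standard genetic code are all illustrative assumptions.

```python
import random
from collections import defaultdict

# Standard genetic code in TCAG order (assumption: standard code, cDNA letters).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def split_codons(seq):
    """Normalize RNA/DNA letters to uppercase DNA and split into codons."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def count_dinucleotide(seq, dinuc):
    """Count overlapping occurrences of a dinucleotide, including those
    that span codon boundaries (codon position 3-1)."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def shuffle_synonymous(codons, rng):
    """Permute codons among positions encoding the same amino acid.
    The amino acid sequence and overall codon usage are unchanged."""
    positions = defaultdict(list)
    for i, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(i)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return shuffled


def dinucleotide_bias(seq, dinuc, n_iter=1000, seed=0):
    """Return (observed count, mean simulated count, fraction of simulations
    with count <= observed, fraction with count >= observed)."""
    dinuc = dinuc.upper().replace("U", "T")
    codons = split_codons(seq)
    rng = random.Random(seed)
    observed = count_dinucleotide("".join(codons), dinuc)
    simulated = [
        count_dinucleotide("".join(shuffle_synonymous(codons, rng)), dinuc)
        for _ in range(n_iter)
    ]
    mean_simulated = sum(simulated) / n_iter
    frac_le = sum(s <= observed for s in simulated) / n_iter
    frac_ge = sum(s >= observed for s in simulated) / n_iter
    return observed, mean_simulated, frac_le, frac_ge
```

Under these assumptions, a call such as `dinucleotide_bias(segment_cds, "CG")` on a hypothetical coding sequence `segment_cds` would suggest under-representation of CpG when the observed count lies well below the simulated mean and the fraction of simulations at or below the observed count is small. Because the shuffle holds amino acids and codon usage fixed, any remaining deviation comes from how codons are arranged next to each other, which is where dinucleotides spanning codon boundaries are affected.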
Keywords: codon usage; dinucleotide usage; evolution; influenza.
© The Author(s) 2019. Published by Oxford University Press.