Inference with viral quasispecies diversity indices: clonal and NGS approaches

Bioinformatics. 2014 Apr 15;30(8):1104-1111. doi: 10.1093/bioinformatics/btt768. Epub 2014 Jan 2.


Given the inherent dynamics of a viral quasispecies, we are often interested in comparing diversity indices between sequential samples from a patient, or between viral populations in groups of patients in a treated versus control design. It is then important to ensure that the diversity measures from each sample can be compared without bias and within a consistent statistical framework. In the present report, we review some indices often used as measures of viral quasispecies complexity and provide means for statistical inference, applying procedures taken from the field of ecology. In particular, we examine the Shannon entropy and the mutation frequency, and we discuss the appropriateness of the different normalization methods for the Shannon entropy found in the literature. Taking raw amplicon ultra-deep pyrosequencing (UDPS) data as a surrogate for a real hepatitis C virus population, we study through in-silico sampling the statistical properties of these indices under two methods of viral quasispecies sampling: classical cloning followed by Sanger sequencing (CCSS) and next-generation sequencing (NGS) such as UDPS. We propose solutions specific to each of the two sampling methods (CCSS and NGS) to ensure statistically sound conclusions that are as free of bias as possible.
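
As an illustration of the indices named above, the following is a minimal Python sketch, not the authors' implementation. It assumes a quasispecies sample is represented as a mapping from haplotype sequence to read (or clone) count, computes the Shannon entropy with natural logarithms, shows two normalizations (by the log of the number of haplotypes and by the log of the sample size) as examples of the kind of alternatives that may be compared, and uses one common definition of mutation frequency (mutations per sequenced nucleotide relative to a reference). The function names, the toy population, and the reference sequence are all hypothetical.

```python
import math
import random
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a collection of haplotype counts."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts)


def normalized_entropy(counts, by="haplotypes"):
    """Entropy divided by ln(H) (number of haplotypes) or ln(N) (sample size).

    Both normalizations are shown only for comparison; which one is
    appropriate is exactly the kind of question the abstract raises.
    """
    counts = [c for c in counts if c > 0]
    denom = math.log(len(counts)) if by == "haplotypes" else math.log(sum(counts))
    return shannon_entropy(counts) / denom if denom > 0 else 0.0


def mutation_frequency(haplotypes, reference):
    """Mutations per sequenced nucleotide (assumed definition).

    `haplotypes` maps aligned sequences of the same length as
    `reference` to their counts.
    """
    mutations = sum(
        count * sum(a != b for a, b in zip(seq, reference))
        for seq, count in haplotypes.items()
    )
    nucleotides = sum(haplotypes.values()) * len(reference)
    return mutations / nucleotides


def subsample(haplotypes, size, rng):
    """In-silico sampling: draw `size` sequences without replacement."""
    pool = [seq for seq, count in haplotypes.items() for _ in range(count)]
    return Counter(rng.sample(pool, size))


# Hypothetical toy population standing in for deep-sequencing data.
reference = "ACGTACGT"
population = {"ACGTACGT": 900, "ACGTACGA": 60, "ACTTACGT": 30, "GCGTACGA": 10}

rng = random.Random(1)
clone_sized = subsample(population, 30, rng)  # few sequences, as in cloning
print("population entropy:", shannon_entropy(population.values()))
print("30-sequence sample entropy:", shannon_entropy(clone_sized.values()))
print("population mutation frequency:", mutation_frequency(population, reference))
```

Repeating the subsampling step many times at different sample sizes gives an empirical picture of how an index behaves when few sequences are drawn, which is the general idea behind the in-silico sampling described above.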

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Computational Biology
  • Genetic Variation*
  • Hepacivirus / genetics*
  • High-Throughput Nucleotide Sequencing*
  • RNA, Viral / genetics
  • Sequence Analysis, RNA


Substances

  • RNA, Viral