A classification model for G-to-A hypermutation in hepatitis B virus ultra-deep pyrosequencing reads

Bioinformatics. 2010 Dec 1;26(23):2929-32. doi: 10.1093/bioinformatics/btq570. Epub 2010 Oct 11.


Motivation: G → A hypermutation is an innate antiviral defense mechanism, mediated by host enzymes, which leads to the mutational impairment of viruses. Sensitive and specific identification of host-mediated G → A hypermutation is a novel sequence analysis challenge, particularly for viral deep sequencing studies. For example, two of the most common hepatitis B virus (HBV) reverse transcriptase (RT) drug-resistance mutations, A181T and M204I, arise from G → A changes and are routinely detected as low-abundance variants in nearly all HBV deep sequencing samples.
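
The claim that both resistance mutations can arise from a single G → A change follows from the standard genetic code. Every alanine codon has the form GCN, so G → A at the first position gives ACN (threonine), which is how A181T can arise. The only methionine codon is ATG, and G → A at the third position gives ATA (isoleucine), which is how M204I can arise. The short Python sketch below checks this. The function name `g_to_a_changes` and the example codons are illustrative assumptions; the actual HBV codons at positions 181 and 204 are not given here. Applying the same operation to the tryptophan codon TGG yields the stop codons TAG and TGA, which is one way G → A editing can be lethal to a virus.

```python
# Standard genetic code, built in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def g_to_a_changes(codon):
    """List every single G->A change in a codon as
    (1-based position, mutant codon, original amino acid, mutant amino acid)."""
    codon = codon.upper()
    changes = []
    for pos, base in enumerate(codon):
        if base == "G":
            mutant = codon[:pos] + "A" + codon[pos + 1:]
            changes.append((pos + 1, mutant, CODON_TABLE[codon], CODON_TABLE[mutant]))
    return changes


if __name__ == "__main__":
    print(g_to_a_changes("GCT"))  # alanine -> threonine at position 1
    print(g_to_a_changes("ATG"))  # methionine -> isoleucine at position 3
    print(g_to_a_changes("TGG"))  # tryptophan -> stop ('*') at positions 2 and 3
```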

Results: We developed a classification model that combines measures of G → A excess with predicted indicators of lethal mutation, and applied it to 325 920 unique deep sequencing reads from plasma virus samples from 45 drug treatment-naïve HBV-infected individuals. The 2.9% of reads that the model classified as hypermutated included most of the reads carrying A181T and/or M204I, indicating that the model is useful for distinguishing adaptive viral changes from host-mediated viral editing.
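
The abstract names the model's two ingredients, a measure of G → A excess and predicted indicators of lethal mutation, but does not define them or give thresholds. The Python sketch below is a hypothetical illustration of how such features might be computed for a single read, not the authors' model. The feature definitions (G → A rate minus the substitution rate at non-G sites; an in-frame stop codon created only by G → A changes), the names `ReadSummary`, `summarize` and `is_hypermutated`, and the thresholds `min_ga` and `min_excess` are all assumptions. It also assumes an ungapped read aligned to an equal-length reference that begins at the first base of a codon.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class ReadSummary:
    """Hypothetical per-read features comparing a read with its reference."""
    ga_count: int     # reference G observed as A in the read
    g_sites: int      # reference positions that are G
    other_mut: int    # substitutions at reference positions that are not G
    other_sites: int  # reference positions that are not G
    new_stop: bool    # an in-frame stop codon created solely by G->A changes

    @property
    def ga_excess(self) -> float:
        """G->A rate minus the substitution rate at non-G sites (assumed metric)."""
        ga_rate = self.ga_count / self.g_sites if self.g_sites else 0.0
        other_rate = self.other_mut / self.other_sites if self.other_sites else 0.0
        return ga_rate - other_rate


def summarize(ref: str, read: str) -> ReadSummary:
    """Compare an ungapped read with an equal-length, in-frame reference."""
    if len(ref) != len(read):
        raise ValueError("ref and read must be aligned and of equal length")
    ref, read = ref.upper(), read.upper()
    ga = g_sites = other = other_sites = 0
    for r, q in zip(ref, read):
        if q not in "ACGT":  # skip ambiguous read calls such as N
            continue
        if r == "G":
            g_sites += 1
            if q == "A":
                ga += 1
        else:
            other_sites += 1
            if q != r:
                other += 1
    new_stop = False
    for i in range(0, len(ref) - len(ref) % 3, 3):
        rc, qc = ref[i:i + 3], read[i:i + 3]
        # Every difference in this codon must be a G->A change.
        only_ga = all(a == b or (a == "G" and b == "A") for a, b in zip(rc, qc))
        if qc in STOP_CODONS and rc not in STOP_CODONS and only_ga:
            new_stop = True
            break
    return ReadSummary(ga, g_sites, other, other_sites, new_stop)


def is_hypermutated(s: ReadSummary, min_ga: int = 3, min_excess: float = 0.1) -> bool:
    """Hypothetical rule: enough G->A changes, plus either a G->A excess
    or a G->A-created stop codon. Thresholds are placeholders."""
    return s.ga_count >= min_ga and (s.ga_excess >= min_excess or s.new_stop)


if __name__ == "__main__":
    ref = "ATGTGGGCTGGAGGT"
    print(is_hypermutated(summarize(ref, "ATATAGACTAGAAGT")))  # five G->A changes, TGG->TAG
    print(is_hypermutated(summarize(ref, ref)))                # identical to reference
```

Contrasting the G → A rate with the mismatch rate at non-G sites is one simple way to separate a G → A-specific signal from errors spread across all bases; a practical classifier would also need calibrated thresholds and proper handling of insertions and deletions.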

Availability: Source code and sequence data are available at http://hivdb.stanford.edu/pages/resources.html.

Contact: ereuman@stanfordalumni.org

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adenine / analysis
  • Algorithms
  • Classification / methods
  • DNA Mutational Analysis / methods*
  • Drug Resistance, Viral / genetics
  • Guanine / analysis
  • Hepatitis B / virology
  • Hepatitis B virus / genetics*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Models, Statistical
  • Mutation


Substances

  • Guanine
  • Adenine