Selection analyses of paired HIV-1 gag and gp41 sequences obtained before and after antiretroviral therapy

Sci Data. 2018 Jul 24;5:180147. doi: 10.1038/sdata.2018.147.


Most HIV-1-infected individuals with virological failure on a pharmacologically boosted protease inhibitor (PI) regimen do not develop PI-resistance protease mutations. One proposed explanation is that mutations in HIV-1 gag or the gp41 cytoplasmic domain might also reduce PI susceptibility. In a recent study of paired gag and gp41 sequences from individuals with virological failure on a PI regimen, we did not identify PI-selected mutations and concluded that, if such mutations exist, larger numbers of paired sequences from multiple studies would be needed to identify them. In this study, we generated site-specific amino acid profiles from published gag and gp41 sequences from 5,338 and 4,242 ART-naïve individuals, respectively, to help researchers identify unusual mutations arising during therapy. We also provide scripts for performing established and novel maximum likelihood estimates of dN/dS substitution rates in paired sequences. The pipelines used to generate the curated sequences, amino acid profiles, and dN/dS analyses will facilitate the application of consistent methods to paired gag and gp41 sequence datasets and expedite the identification of sites potentially under PI selection pressure.
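As a rough illustration of how a site-specific amino acid profile could be used to flag unusual mutations in a paired sequence, the sketch below computes per-position amino acid prevalence from a set of aligned sequences and reports changes between a baseline and follow-up sequence whose new residue is rare in that profile. It is not the authors' pipeline: the function names, the 1% prevalence threshold, and the toy five-residue sequences are all assumptions made for illustration only.

```python
from collections import Counter

def site_profiles(aligned_seqs):
    """Return one dict per alignment column mapping amino acid -> prevalence."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    profiles = []
    for pos in range(length):
        # Ignore gaps ('-') and unknown residues ('X') when counting.
        counts = Counter(s[pos] for s in aligned_seqs if s[pos] not in "-X")
        total = sum(counts.values())
        profiles.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profiles

def unusual_mutations(baseline, follow_up, profiles, threshold=0.01):
    """List (1-based position, baseline_aa, follow_up_aa) for changes whose
    follow-up residue has prevalence below `threshold` in the profile.
    The 1% default is a hypothetical cutoff, not a value from the source."""
    hits = []
    for pos, (before, after) in enumerate(zip(baseline, follow_up)):
        if before == after or after in "-X":
            continue
        if profiles[pos].get(after, 0.0) < threshold:
            hits.append((pos + 1, before, after))
    return hits

# Toy stand-ins for aligned sequences from ART-naive individuals (not real data).
naive = ["MGARA", "MGARA", "MGSRA", "MGARA"]
profiles = site_profiles(naive)

# Toy paired baseline and follow-up sequences from one hypothetical individual.
print(unusual_mutations("MGARA", "MGSKA", profiles))  # [(4, 'R', 'K')]
```

In this toy case the A-to-S change at position 3 is not reported because S occurs in 25% of the reference sequences, whereas the R-to-K change at position 4 is reported because K never occurs there.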

Publication types

  • Dataset
  • Research Support, N.I.H., Extramural

MeSH terms

  • Anti-HIV Agents / therapeutic use*
  • HIV Envelope Protein gp41 / genetics*
  • HIV Infections / therapy*
  • HIV Infections / virology*
  • HIV-1 / drug effects
  • HIV-1 / genetics*
  • Humans
  • Mutation
  • Sequence Analysis, Protein
  • Sequence Analysis, RNA
  • gag Gene Products, Human Immunodeficiency Virus / genetics*

Substances

  • Anti-HIV Agents
  • HIV Envelope Protein gp41
  • gag Gene Products, Human Immunodeficiency Virus
  • gp41 protein, Human immunodeficiency virus 1

Associated data

  • Dryad/10.5061/dryad.71b5t