Maintaining the integrity of human immunodeficiency virus sequence databases

J Virol. 1996 Aug;70(8):5720-30. doi: 10.1128/JVI.70.8.5720-5730.1996.


Human immunodeficiency virus type 1 (HIV-1) sequences are accumulating in the literature at a rapid pace. For this ever-expanding resource to be maximally useful, researchers must maintain a high level of quality assurance in experimental design, conduct, and analysis. Here we present detailed analyses of problematic sets of HIV-1 sequences in the database that include sequence anomalies suggestive of mislabeling or sample contamination. These data are examined in the context of currently available HIV-1 sequence information to provide an example of how to identify potentially flawed data. Indicators of potential problems are (i) nearly identical sequences that are reported as derived from unlinked individuals and that are markedly distinct from other sequences from the putative source, or (ii) sequences that are nearly identical to those of laboratory strains. We provide an outline of methods that researchers can use to perform preliminary laboratory and computational analyses that could help identify problematic data and thus help ensure the integrity of sequence databases.
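
The two indicators named in the abstract lend themselves to a simple preliminary screen. The following is a minimal sketch, not the authors' method: it assumes the sequences are already aligned to equal length, uses an uncorrected pairwise distance (fraction of mismatched sites, gaps skipped), and applies a hypothetical 1% threshold for "nearly identical". The function names, the input layout, and the threshold are all illustrative assumptions. The sketch also omits the second half of indicator (i), the check that a flagged sequence is markedly distinct from other sequences from its putative source.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected distance: fraction of mismatched sites, skipping gap positions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # ignore alignment gaps in either sequence
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def flag_suspect_sequences(samples, lab_strains, threshold=0.01):
    """Flag sequences matching either indicator.

    samples:     dict of name -> (patient_id, aligned_sequence)   (hypothetical layout)
    lab_strains: dict of name -> aligned_sequence                 (hypothetical layout)
    threshold:   assumed cutoff for "nearly identical"; choose per gene region
    """
    flags = []
    # Indicator (i), first half: near-identical sequences from different individuals.
    for (n1, (p1, s1)), (n2, (p2, s2)) in combinations(sorted(samples.items()), 2):
        if p1 != p2 and p_distance(s1, s2) <= threshold:
            flags.append(("unlinked_near_identical", n1, n2))
    # Indicator (ii): sequences near-identical to a laboratory strain.
    for name, (_, seq) in sorted(samples.items()):
        for lab_name, lab_seq in sorted(lab_strains.items()):
            if p_distance(seq, lab_seq) <= threshold:
                flags.append(("near_lab_strain", name, lab_name))
    return flags
```

A flag from a screen like this is only a prompt for follow-up. Whether a given distance is suspicious depends on the gene region and on known epidemiological links, so the threshold should be set with the expected within-patient and between-patient diversity in mind.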

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Base Sequence
  • Databases, Factual*
  • Gene Library*
  • HIV-1 / genetics*
  • Humans
  • Molecular Sequence Data
  • Phylogeny
  • Sequence Alignment