Predicting HIV coreceptor usage on the basis of genetic and clinical covariates

Antivir Ther. 2007;12(7):1097-106.


Background: We compared several statistical learning methods for the prediction of HIV coreceptor use from clonal HIV third hypervariable (V3) loop sequences, and evaluated and improved their effectiveness on clinical samples.

Methods: Support vector machines (SVM), artificial neural networks, position-specific scoring matrices (PSSM) and mixtures of localized rules were estimated and tested using ten repetitions of ten-fold cross-validation on a clonal dataset consisting of 1,100 matched clonal genotype-phenotype pairs from 332 patients. Different SVMs were also trained and tested on a clinically derived dataset, representing 920 patient samples from British Columbia, Canada. Methods were evaluated using receiver operating characteristic (ROC) curves.
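Because the methods are compared by sensitivity at a matched specificity, reading one operating point off a ROC curve can be sketched as below. This is an illustrative sketch only, not the authors' evaluation code: the function name, the score convention (higher score means more likely CXCR4-using) and the toy data are assumptions.

```python
def sensitivity_at_specificity(scores, labels, target_specificity):
    """Best sensitivity among thresholds whose specificity meets the target.

    scores: classifier outputs; higher is assumed to mean "more likely positive".
    labels: 1 for the positive class (assumed CXCR4-using), 0 for the negative class.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    best = 0.0
    # Candidate thresholds: each observed score, plus +inf (nothing called positive).
    for threshold in sorted(set(scores)) + [float("inf")]:
        # A sample is called positive when its score is >= threshold.
        true_negatives = sum(1 for s in negatives if s < threshold)
        true_positives = sum(1 for s in positives if s >= threshold)
        specificity = true_negatives / len(negatives)
        sensitivity = true_positives / len(positives)
        if specificity >= target_specificity:
            best = max(best, sensitivity)
    return best


# Toy data (hypothetical, not from the study).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
toy_result = sensitivity_at_specificity(toy_scores, toy_labels, 0.75)
```

Fixing the specificity first and then reporting sensitivity allows methods with different score scales to be compared at the same false-positive rate.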

Results: In the clonal analysis, the sensitivity of the 11/25 rule at 92.5% specificity was 59.5%. PSSMs and SVMs increased sensitivity to 71.9% and 76.4%, respectively, at the same specificity (P << 0.05). In clinical samples, the sensitivity of the 11/25 rule and SVM decreased to 25.9% (specificity 93.9%) and 39.8% (specificity 93.5%), respectively. However, the integration of clinical data resulted in a further 2.4-fold increase in sensitivity over the 11/25 rule (63%). Univariate analyses identified 41 V3 mutations significantly associated with coreceptor usage.
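The 11/25 rule used as the baseline is commonly stated as: a V3 sequence is predicted to be CXCR4-using if it carries a basic amino acid at position 11 or 25. A minimal sketch follows, assuming an ungapped 35-residue V3 amino-acid sequence and taking arginine (R) and lysine (K) as the basic residues. The function name and the example sequence are illustrative and are not taken from the study.

```python
BASIC_RESIDUES = {"R", "K"}  # assumption: arginine and lysine count as basic


def predicts_cxcr4_by_11_25_rule(v3_sequence):
    """Apply the 11/25 rule to an ungapped V3 amino-acid sequence.

    Positions are 1-based within the V3 loop, so residues 11 and 25
    correspond to Python indices 10 and 24.
    """
    sequence = v3_sequence.upper()
    if len(sequence) < 25:
        raise ValueError("sequence too short to contain position 25")
    return sequence[10] in BASIC_RESIDUES or sequence[24] in BASIC_RESIDUES


# Illustrative 35-residue V3 sequence with S at position 11 and E at position 25.
example_v3 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
# Hypothetical variant carrying arginine at position 11.
variant_v3 = example_v3[:10] + "R" + example_v3[11:]

example_call = predicts_cxcr4_by_11_25_rule(example_v3)
variant_call = predicts_cxcr4_by_11_25_rule(variant_v3)
```

A rule of this kind looks at only two positions, which is one reason methods that use the whole sequence (PSSMs, SVMs) can reach higher sensitivity at the same specificity.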

Conclusion: For all methods tested, a substantial sensitivity decrease is observed on clinical data, probably owing to the heterogeneity of the viral population in vivo. In response to these complications, we present an SVM-based approach that integrates sequence information with clinical and host data, resulting in improved performance and sensitivity compared with purely sequence-based approaches.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • CD4 Lymphocyte Count
  • Genotype
  • HIV / genetics*
  • HIV / metabolism*
  • HIV Envelope Protein gp120 / genetics
  • HIV Infections / virology*
  • Humans
  • Models, Statistical*
  • Neural Networks, Computer
  • Peptide Fragments / genetics
  • Phenotype
  • Receptors, CCR5 / metabolism*
  • Receptors, CXCR4 / metabolism*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment
  • Viral Load

Substances

  • HIV Envelope Protein gp120
  • HIV envelope protein gp120 (305-321)
  • Peptide Fragments
  • Receptors, CCR5
  • Receptors, CXCR4