Computational analysis of HIV-1 protease protein binding pockets

Gene M Ko; A Srinivas Reddy; Sunil Kumar; Barbara A Bailey; Rajni Garg

doi:10.1021/ci100200u

J Chem Inf Model. 2010 Oct 25;50(10):1759-71. doi: 10.1021/ci100200u.

Authors

Gene M Ko¹, A Srinivas Reddy, Sunil Kumar, Barbara A Bailey, Rajni Garg

Affiliation

¹ Computational Science Research Center, San Diego State University, San Diego, California, USA.

Abstract

Mutations that arise in HIV-1 protease after exposure to protease inhibitors are a persistent obstacle in the treatment of HIV. Mutations in the binding pocket of the protease can prevent an inhibitor from binding to the protein effectively. In the present study, 68 crystal structures from the Protein Data Bank (PDB) of HIV-1 protease complexed with one of the nine FDA-approved protease inhibitors were analyzed by (a) identifying the mutational changes with the aid of a mutation map developed for this purpose and (b) correlating the structure of the binding pockets with the complexed inhibitors. The mutations in each crystal structure were identified by comparing its amino acid sequence against that of the HIV-1 wild-type strain HXB2. These mutations were presented visually as a mutation map to analyze the mutation patterns corresponding to each protease inhibitor. The mutation patterns in the crystal structures for each inhibitor (in vitro) were compared against the mutation patterns observed in in vivo data. The in vitro mutation patterns were found to be representative of most of the major in vivo mutations. We then performed a data mining analysis of the binding pockets from each crystal structure in terms of their chemical descriptors to identify important structural features of the HIV-1 protease with respect to the binding conformation of the inhibitors. The data mining analysis was performed using several classification techniques: Random Forest (RF), linear discriminant analysis (LDA), and logistic regression (LR). We developed two hybrid models, RF-LDA and RF-LR, in which Random Forest was used as a feature selection proxy, reducing the descriptor space to a few of the most relevant descriptors determined by the classifier. These descriptors were then used to develop the subsequent LDA, LR, and hierarchical classification models. Clustering analysis of the binding pockets, using the descriptors selected for the optimal classification models, revealed conformational similarities of the ligands in each cluster. This study provides important information for understanding structural features of HIV-1 protease that cannot be studied with existing in vivo genomic data sets.
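
The mutation-identification step described above (comparing each structure's sequence against a wild-type reference and collecting the differences per structure) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `call_mutations` and `mutation_map` are hypothetical, the sequences are arbitrary toy strings rather than HXB2 or PDB data, and the sketch assumes the sequences are already aligned to the same length.

```python
from typing import Dict, List


def call_mutations(reference: str, sample: str) -> List[str]:
    """List substitutions as '<ref residue><1-based position><sample residue>'.

    Assumes both sequences are pre-aligned and of equal length; insertions
    and deletions are not handled in this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [
        f"{ref}{pos}{obs}"
        for pos, (ref, obs) in enumerate(zip(reference, sample), start=1)
        if ref != obs
    ]


def mutation_map(reference: str, samples: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each structure label to its substitutions relative to the reference."""
    return {name: call_mutations(reference, seq) for name, seq in samples.items()}


# Toy 10-residue sequences for illustration only (NOT real HXB2 or PDB data).
TOY_REFERENCE = "ACDEFGHIKL"
TOY_SAMPLES = {
    "structure_A": "ACDEFGHIKM",  # differs from the reference at position 10
    "structure_B": "ACNEFGRIKL",  # differs at positions 3 and 7
}
TOY_MAP = mutation_map(TOY_REFERENCE, TOY_SAMPLES)

if __name__ == "__main__":
    for name, muts in TOY_MAP.items():
        print(name, ", ".join(muts) if muts else "(no substitutions)")
```

Grouping the per-structure lists by the complexed inhibitor would then give the rows of a mutation map of the kind the abstract describes; how the authors aligned sequences and rendered the map is not stated here.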

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Amino Acid Sequence
Binding Sites
Computer Simulation
Crystallography, X-Ray
Data Mining
HIV Infections / drug therapy
HIV Infections / enzymology
HIV Infections / genetics
HIV Protease / chemistry*
HIV Protease / genetics*
HIV Protease / metabolism
HIV Protease Inhibitors / chemistry
HIV Protease Inhibitors / pharmacology*
HIV-1 / chemistry
HIV-1 / enzymology*
HIV-1 / genetics
Humans
Models, Molecular
Molecular Sequence Data
Mutation*
Protein Binding
Protein Conformation

Substances

HIV Protease Inhibitors
HIV Protease
p16 protease, Human immunodeficiency virus 1
