Learning the drug target-likeness of a protein

Proteomics. 2007 Dec;7(23):4255-63. doi: 10.1002/pmic.200700062.


Current drug discovery and development approaches rely extensively on the identification and validation of appropriate targets, for example, those with marketable and robust therapeutics. Wide-ranging efforts have been directed at this problem, and various approaches have been developed to identify disease-associated genes as candidate targets. In this work, we show with statistical significance that successful drug targets, in addition to their linkage to disease, share common characteristics that are disease-independent. For example, marked differences in functional category, tissue specificity, and sequence variability are observed between known targets and average proteins. These results suggest a hypothesis: potentially good drug targets share certain desirable properties, which we refer to as "drug target-likeness", that go beyond their disease associations. Because comprehensive protein characteristics data are of limited availability, we attempted to learn the drug target-likeness property at the sequence level. Results show that a support vector machine model is able to accurately distinguish targets from nontargets using sequence features alone. It is our hope that these encouraging results will invite future systematic proteomic-scale experiments to gather the protein characteristics data necessary for an accurate and predictive definition of "drug target-likeness", providing a new perspective toward understanding and pursuing effective therapeutics.
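The abstract does not say which sequence features or kernel the support vector machine used. As a minimal sketch of the general idea only, assuming amino acid composition as the sequence feature set and a linear SVM trained with the Pegasos stochastic sub-gradient method (both swapped in for illustration, not taken from the paper), a sequence-level target/nontarget classifier could look like the following. The helper names `composition`, `train_linear_svm`, and `predict` are hypothetical, and the sequences are invented toy data, not real proteins.

```python
import random

# Sketch only: the features (amino acid composition) and the training method
# (Pegasos linear SVM) are illustrative assumptions, not the paper's method.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Fraction of each standard residue in `seq`, plus a constant 1.0
    so the SVM bias is learned as one extra weight."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts] + [1.0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the primal linear SVM
    objective lam/2 * ||w||^2 + mean hinge loss. Labels must be +1 / -1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            violated = y[i] * dot(w, X[i]) < 1  # inside margin or misclassified
            w = [(1 - eta * lam) * wj for wj in w]  # regularization shrink
            if violated:  # hinge-loss sub-gradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, seq):
    """+1 = predicted target, -1 = predicted nontarget."""
    return 1 if dot(w, composition(seq)) >= 0 else -1


# Invented toy sequences (NOT real proteins): the two classes use disjoint
# residue sets, so they are trivially separable.
targets = ["LLIVFAMLIV", "IVLLFFMAIL", "FLIVAMVLLI"]
nontargets = ["KKEEDDRRKE", "DEKRKEDRKK", "RKEDEKRDEK"]

X = [composition(s) for s in targets + nontargets]
y = [1] * len(targets) + [-1] * len(nontargets)
w = train_linear_svm(X, y)
print(predict(w, "LLLLIIIVVV"), predict(w, "KKKKEEEDDD"))
```

Composition features discard residue order, so richer sequence descriptors or a non-linear kernel would be natural substitutions; any real evaluation would also need held-out proteins and a measure such as the ROC curve rather than toy examples.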

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Artificial Intelligence*
  • Computational Biology / methods*
  • Databases, Protein
  • Gene Expression
  • Genetic Variation
  • Humans
  • Models, Statistical*
  • Pharmaceutical Preparations / chemistry*
  • Pharmaceutical Preparations / metabolism
  • Protein Binding
  • Proteins / chemistry*
  • Proteins / genetics
  • Proteins / metabolism
  • ROC Curve
  • Reproducibility of Results

Substances

  • Pharmaceutical Preparations
  • Proteins