Prediction of candidate primary immunodeficiency disease genes using a support vector machine learning approach

DNA Res. 2009 Dec;16(6):345-51. doi: 10.1093/dnares/dsp019. Epub 2009 Oct 3.

Abstract

Screening and early identification of primary immunodeficiency disease (PID) genes are a major challenge for physicians. Many resources have catalogued molecular alterations in known PID genes along with their associated clinical and immunological phenotypes. However, these resources do not assist in identifying candidate PID genes. We have recently developed a platform designated Resource of Asian PIDs, which hosts information pertaining to molecular alterations, protein-protein interaction networks, mouse studies and microarray gene expression profiling of all known PID genes. Using this resource as a discovery tool, we describe the development of an algorithm for prediction of candidate PID genes. Using a support vector machine learning approach trained on 69 binary features of 148 known PID genes and 3162 non-PID genes, we have predicted 1442 candidate PID genes. The power of this approach is illustrated by the fact that six of the predicted genes have recently been experimentally confirmed to be PID genes. The remaining genes in this predicted data set represent attractive candidates for testing in patients whose etiology cannot be ascribed to any of the known PID genes.
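
The general idea of training a support vector machine on binary gene features and then scoring unlabelled genes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the kernel, software or feature definitions, so the sketch assumes a linear SVM fitted by plain subgradient descent on the regularised hinge loss. The three feature meanings, the training rows, the candidate names and the helper functions `train_linear_svm` and `decision_score` are all hypothetical and invented for the example.

```python
# Toy illustration only: features, genes and labels below are invented.

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lr=0.1, lam=0.01, n_iter=500):
    """Fit a linear SVM by full-batch subgradient descent.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w.x_i + b))).
    Labels in y must be +1 or -1. Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w = [lam * wj for wj in w]   # gradient of the L2 penalty
        grad_b = 0.0                      # the bias is left unregularised
        for xi, yi in zip(X, y):
            # Only examples inside the margin contribute a hinge subgradient.
            if yi * (dot(w, xi) + b) < 1.0:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision_score(w, b, x):
    """Signed distance-like score; positive means 'predicted PID-like'."""
    return dot(w, x) + b


# Hypothetical binary features per gene (assumed, not taken from the paper):
# [interacts with a known PID protein, immune phenotype in mouse,
#  expressed in immune cells]
X_train = [
    [1, 1, 0], [1, 0, 0], [1, 1, 1],             # labelled known PID (+1)
    [0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 1],  # labelled non-PID (-1)
]
y_train = [1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(X_train, y_train)

# Unlabelled genes to be scored (hypothetical names and feature vectors).
candidates = {"candidate_A": [1, 0, 1], "candidate_B": [0, 1, 1]}
scores = {name: decision_score(w, b, x) for name, x in candidates.items()}
predicted = [name for name, s in scores.items() if s > 0]
print(predicted)
```

In a setting like the one described, with 148 positive and 3162 negative training genes, the classes are heavily imbalanced, so a real analysis would likely need cross-validation and some handling of that imbalance. The sketch omits both for brevity.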

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Asia
  • Computational Biology / methods*
  • Databases, Genetic
  • Genetic Predisposition to Disease
  • Humans
  • Immunologic Deficiency Syndromes / genetics*
  • Predictive Value of Tests
  • Proteins / genetics*
  • Proteins / metabolism
  • Sensitivity and Specificity

Substances

  • Proteins