A data science approach to candidate gene selection of pain regarded as a process of learning and neural plasticity

Pain. 2016 Dec;157(12):2747-2757. doi: 10.1097/j.pain.0000000000000694.


The increasing availability of "big data" enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 535 genes identified empirically as relevant to pain with the knowledge about the functions of thousands of genes. Starting from an accepted description of chronic pain as displaying systemic features described by the terms "learning" and "neuronal plasticity," a functional genomics analysis proposed that among the functions of the 535 "pain genes," the biological processes "learning or memory" (P = 8.6 × 10) and "nervous system development" (P = 2.4 × 10) are statistically significantly overrepresented as compared with the annotations to these processes expected by chance. After establishing that the hypothesized biological processes were among the important functional genomics features of pain, a subset of n = 34 pain genes was found to be annotated with both Gene Ontology terms. Published empirical evidence supporting their involvement in chronic pain was identified for almost all of these genes, including 1 gene identified in March 2016 as being involved in pain. By contrast, such evidence was virtually absent in a randomly selected set of 34 other human genes. Hence, the present computational functional genomics-based method can be used for candidate gene selection, providing an alternative to established methods.
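The two computational steps described above (testing whether a Gene Ontology term is overrepresented in a gene set, then keeping the genes annotated with both terms) can be illustrated with a minimal sketch. The abstract does not name the statistical test used, so the upper-tail hypergeometric test below is an assumption, chosen because it is a common choice for overrepresentation analysis. The function names, the placeholder gene identifiers, and all counts are hypothetical and are not taken from the study.

```python
from math import comb


def overrepresentation_p(population, annotated, sample, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: number of genes in the reference set (assumed background)
    annotated:  genes in the reference set carrying the GO term
    sample:     size of the gene set under study (e.g. the pain genes)
    hits:       genes in the studied set carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


def genes_with_both_terms(gene_set, term_a_genes, term_b_genes):
    """Return the genes in gene_set that are annotated with both terms."""
    return sorted(set(gene_set) & set(term_a_genes) & set(term_b_genes))


if __name__ == "__main__":
    # Toy numbers only: 10 background genes, 4 annotated, 5 studied, 3 hits.
    p = overrepresentation_p(population=10, annotated=4, sample=5, hits=3)
    print(f"P(X >= 3) = {p:.4f}")

    # Placeholder identifiers, not real gene annotations.
    pain_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
    learning_or_memory = ["GENE_A", "GENE_C", "GENE_X"]
    nervous_system_development = ["GENE_C", "GENE_D", "GENE_A"]
    print(genes_with_both_terms(pain_genes, learning_or_memory,
                                nervous_system_development))
```

In a real analysis, the P values would typically also be corrected for multiple testing across all Gene Ontology terms examined; that step is omitted from this sketch.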

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Factual / statistics & numerical data*
  • Genetic Predisposition to Disease / genetics*
  • Genomics
  • Humans
  • Learning / physiology*
  • Machine Learning
  • Neuronal Plasticity / genetics*
  • Pain / genetics*