Prediction of disease-associated functional variants in noncoding regions through a comprehensive analysis by integrating datasets and features

Hum Mutat. 2021 Jun;42(6):667-684. doi: 10.1002/humu.24203. Epub 2021 Apr 23.

Abstract

One of the greatest challenges in human genetics is deciphering the link between functional variants in noncoding sequences and the pathophysiology of complex diseases. To address this issue, many methods have been developed to distinguish functional single-nucleotide variants (SNVs) from neutral SNVs in noncoding regions. In this study, we integrated well-established features and commonly used datasets, merged them into large-scale datasets, and trained a random forest model on them; the model yielded promising performance and outperformed some cutting-edge approaches. Our analyses of feature importance and data coverage also provide clues for future research on improving the prediction of functional noncoding SNVs.
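
The abstract does not list the features or datasets used, so the following is only a minimal sketch of the general technique it names: a random forest classifier with an impurity-based feature importance ranking. The feature names, the synthetic data, and the helper functions (`gini`, `grow`, `train_forest`, `predict`) are all hypothetical and are not taken from the study.

```python
import random

# Hypothetical annotation features; the abstract does not name the real ones.
FEATURES = ["conservation", "dnase_signal", "gc_content", "tss_distance"]

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def grow(X, y, idx, depth, rng, n_try, importance):
    """Grow one tree on sample indices idx, adding impurity decreases to importance."""
    labels = [y[i] for i in idx]
    parent = gini(labels)
    leaf = sum(labels) / len(labels)  # fraction of functional SNVs at this node
    if depth == 0 or parent == 0.0:
        return leaf
    best = None
    for f in rng.sample(range(len(X[0])), n_try):  # random feature subset
        # Dropping the largest value keeps both sides of every split non-empty.
        for t in sorted({X[i][f] for i in idx})[:-1]:
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            child = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(idx)
            if best is None or child < best[0]:
                best = (child, f, t, left, right)
    if best is None or best[0] >= parent:
        return leaf
    child, f, t, left, right = best
    importance[f] += len(idx) * (parent - child)  # weighted impurity decrease
    return (f, t,
            grow(X, y, left, depth - 1, rng, n_try, importance),
            grow(X, y, right, depth - 1, rng, n_try, importance))

def train_forest(X, y, n_trees=20, depth=3, seed=0):
    """Train bootstrapped trees and return (trees, normalized importances)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    n_try = max(1, int(n_features ** 0.5))
    importance = [0.0] * n_features
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        trees.append(grow(X, y, idx, depth, rng, n_try, importance))
    total = sum(importance) or 1.0
    return trees, [v / total for v in importance]

def predict(trees, row):
    """Average the leaf probabilities of all trees for one variant."""
    def walk(node):
        while isinstance(node, tuple):
            f, t, left, right = node
            node = left if row[f] <= t else right
        return node
    return sum(walk(tree) for tree in trees) / len(trees)

# Synthetic data (an assumption): only the first feature carries signal.
rng = random.Random(42)
X = [[rng.random() for _ in FEATURES] for _ in range(100)]
y = [1 if row[0] + rng.gauss(0, 0.1) > 0.5 else 0 for row in X]

trees, importances = train_forest(X, y)
accuracy = sum((predict(trees, r) > 0.5) == bool(l) for r, l in zip(X, y)) / len(X)

for name, score in sorted(zip(FEATURES, importances), key=lambda p: -p[1]):
    print(f"{name:14s} {score:.3f}")
print(f"training accuracy: {accuracy:.2f}")
```

Because the synthetic labels depend only on the first feature, that feature should rank highest. On real variant annotations the ranking would depend on the features and datasets actually integrated, and performance would have to be measured on held-out data rather than on the training set.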

Keywords: complex diseases; feature importance; functional variants; noncoding regions; random forest.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • Computer Simulation
  • Databases, Genetic
  • Datasets as Topic
  • Disease / genetics*
  • Genetic Predisposition to Disease / genetics
  • Genetic Testing / methods
  • Humans
  • Polymorphism, Single Nucleotide
  • RNA, Untranslated / genetics*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Software Design

Substances

  • RNA, Untranslated