Data Mining and Pattern Recognition Models for Identifying Inherited Diseases: Challenges and Implications

Front Genet. 2016 Aug 10;7:136. doi: 10.3389/fgene.2016.00136. eCollection 2016.


Data mining and pattern recognition methods have yielded notable findings in genetic studies, particularly on how genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical applications, accurately prioritizing the single nucleotide polymorphisms (SNPs) associated with a given disease remains a challenge. In this commentary, we review state-of-the-art data mining and pattern recognition models for identifying inherited diseases and discuss the need for binary classification- and scoring-based prioritization methods in determining causal variants. While we weigh the pros and cons of these methods, we argue that gene prioritization methods and protein-protein interaction (PPI) methods, in conjunction with the K-nearest neighbors method, could be used to accurately categorize the genetic factors in disease causation.
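
As a rough illustration of how a K-nearest neighbors step might combine a gene prioritization score with PPI evidence, the sketch below ranks candidate genes by the fraction of their k nearest labeled neighbors that are disease-associated. It is a minimal sketch and not the method of this commentary: the gene names, the two features (a prioritization score and the fraction of PPI partners that are known disease genes), the feature values, and the helper names `knn_score` and `rank_candidates` are all hypothetical assumptions made for illustration.

```python
import math

# Hypothetical labeled genes: (features, label).
# features = (gene prioritization score,
#             fraction of PPI partners that are known disease genes)
# label    = 1 if disease-associated, 0 otherwise. All values are invented.
LABELED = [
    ((0.9, 0.8), 1),
    ((0.8, 0.7), 1),
    ((0.7, 0.9), 1),
    ((0.2, 0.1), 0),
    ((0.1, 0.2), 0),
    ((0.3, 0.2), 0),
]

# Hypothetical candidate genes to be prioritized.
CANDIDATES = {
    "GENE_A": (0.85, 0.75),
    "GENE_B": (0.15, 0.15),
    "GENE_C": (0.50, 0.50),
}


def knn_score(query, labeled, k=3):
    """Return the fraction of the k nearest labeled genes that are disease-associated."""
    nearest = sorted(labeled, key=lambda item: math.dist(query, item[0]))[:k]
    return sum(label for _, label in nearest) / k


def rank_candidates(candidates, labeled, k=3):
    """Score every candidate with knn_score and sort from highest to lowest score."""
    scores = {gene: knn_score(feats, labeled, k) for gene, feats in candidates.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


ranking = rank_candidates(CANDIDATES, LABELED)
for gene, score in ranking:
    # A score of at least 0.5 is read as the binary call "disease-associated".
    print(gene, round(score, 2), score >= 0.5)
```

The same score serves both uses mentioned above: thresholding it gives a binary classification, while sorting by it gives a scoring-based prioritization.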

Keywords: data mining; inherited diseases; machine learning; protein-protein interaction; single nucleotide polymorphism.

Publication types

  • Review