Comparing enrichment analysis and machine learning for identifying gene properties that discriminate between gene classes
- PMID: 30895300
- DOI: 10.1093/bib/bbz028
Abstract
Biologists very often use enrichment methods based on statistical hypothesis tests to identify gene properties that are significantly over-represented in a given set of genes of interest, by comparison with a 'background' set of genes. These enrichment methods, although based on rigorous statistical foundations, are not always the best single option to identify patterns in biological data. In many cases, one can also use classification algorithms from the machine-learning field. Unlike enrichment methods, classification algorithms are designed to maximize measures of predictive performance and are capable of analysing combinations of gene properties, instead of one property at a time. In practice, however, the majority of studies use either enrichment or classification methods (rather than both), and there is a lack of literature discussing the pros and cons of both types of method. The goal of this paper is to compare and contrast enrichment and classification methods, offering two contributions. First, we discuss the (to some extent complementary) advantages and disadvantages of both types of methods for identifying gene properties that discriminate between gene classes. Second, we provide a set of high-level recommendations for using enrichment and classification methods. Overall, by highlighting the strengths and the weaknesses of both types of methods we argue that both should be used in bioinformatics analyses.
Keywords: classification; enrichment analysis; machine learning; statistical hypothesis testing.
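The contrast drawn in the abstract, between testing one gene property at a time and analysing combinations of properties, can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-sided hypergeometric test is one common choice of enrichment test, the toy data set is synthetic, the function names are hypothetical, and the lookup-table "classifier" is a deliberately trivial stand-in for a real classification algorithm. It is also scored on its own training data, which a real analysis would avoid.

```python
from collections import Counter, defaultdict
from math import comb


def enrichment_p_value(n_background, n_with_property, n_interest, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    X is the number of genes carrying the property in a random draw of
    n_interest genes from a background of n_background genes, of which
    n_with_property carry the property.
    """
    upper = min(n_with_property, n_interest)
    total = comb(n_background, n_interest)
    tail = sum(
        comb(n_with_property, k)
        * comb(n_background - n_with_property, n_interest - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def fit_lookup_classifier(rows, labels):
    """Map each combination of property values to its majority class."""
    votes = defaultdict(Counter)
    for row, label in zip(rows, labels):
        votes[row][label] += 1
    return {row: counts.most_common(1)[0][0] for row, counts in votes.items()}


# Synthetic toy data (assumption): 40 genes, two binary properties (a, b).
# A gene is "of interest" when exactly one of the two properties is present.
genes = [(a, b) for a in (0, 1) for b in (0, 1) for _ in range(10)]
labels = [a != b for a, b in genes]

# Enrichment view: test property a on its own against the background.
n_background = len(genes)
n_with_a = sum(a for a, _ in genes)
n_interest = sum(labels)
n_overlap = sum(a for (a, _), label in zip(genes, labels) if label)
p_a = enrichment_p_value(n_background, n_with_a, n_interest, n_overlap)

# Classification view: use both properties jointly.
model = fit_lookup_classifier(genes, labels)
accuracy = sum(model[row] == label for row, label in zip(genes, labels)) / len(genes)

print(f"p-value for property a alone: {p_a:.3f}")
print(f"training accuracy using (a, b) jointly: {accuracy:.2f}")
```

In this constructed case, property a on its own shows no over-representation in the set of interest, while the two properties taken together separate the classes. The example is built to make that point and says nothing about how often such patterns occur in real data.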
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
