Statistical learning approaches in the genetic epidemiology of complex diseases

Anne-Laure Boulesteix; Marvin N Wright; Sabine Hoffmann; Inke R König

doi:10.1007/s00439-019-01996-9

Hum Genet. 2020 Jan;139(1):73-84. doi: 10.1007/s00439-019-01996-9. Epub 2019 May 2.

Authors

Anne-Laure Boulesteix¹, Marvin N Wright²,³, Sabine Hoffmann⁴, Inke R König⁵

Affiliations

¹ Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians-University, Munich, Germany. boulesteix@ibe.med.uni-muenchen.de.
² Leibniz Institute for Prevention Research and Epidemiology-BIPS, Bremen, Germany.
³ Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark.
⁴ Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians-University, Munich, Germany.
⁵ Institute of Medical Biometry and Statistics, University of Lübeck, Lübeck, Germany.

PMID: 31049651
DOI: 10.1007/s00439-019-01996-9

Abstract

In this paper, we give an overview of methodological issues related to the use of statistical learning approaches when analyzing high-dimensional genetic data. The focus is set on regression models and machine learning algorithms taking genetic variables as input and returning a classification or a prediction for the target variable of interest; for example, the present or future disease status, or the future course of a disease. After briefly explaining the basic motivation and principle of these methods, we review different procedures that can be used to evaluate the accuracy of the obtained models and discuss common flaws that may lead to over-optimistic conclusions with respect to their prediction performance and usefulness.
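The abstract warns that flawed evaluation can make a model's prediction performance look better than it is. A classic example of this flaw can be sketched with a small simulation. This is a minimal illustration, not the authors' code: it assumes pure-noise data (500 features unrelated to a balanced binary outcome), a simple nearest-centroid classifier and a mean-difference filter for feature selection, all chosen only for illustration. The helper names (`simulate_noise_data`, `select_features`, `cross_validated_accuracy`) are hypothetical. The sketch compares 5-fold cross-validation in which features are selected once on the full data set with cross-validation in which selection is repeated inside each training fold.

```python
import random
import statistics


def simulate_noise_data(n_samples, n_features, seed):
    """Hypothetical pure-noise data: every feature is independent of the label."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced binary labels
    rng.shuffle(y)
    return X, y


def select_features(X, y, idx, k):
    """Keep the k features with the largest absolute class-mean difference,
    computed only on the samples listed in idx."""
    scores = []
    for j in range(len(X[0])):
        g0 = [X[i][j] for i in idx if y[i] == 0]
        g1 = [X[i][j] for i in idx if y[i] == 1]
        scores.append((abs(statistics.fmean(g1) - statistics.fmean(g0)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def nearest_centroid_predict(X, y, train_idx, test_idx, features):
    """Assign each test sample to the class whose training centroid is closest."""
    centroids = {}
    for c in (0, 1):
        rows = [i for i in train_idx if y[i] == c]
        centroids[c] = [statistics.fmean(X[i][j] for i in rows) for j in features]
    preds = []
    for i in test_idx:
        x = [X[i][j] for j in features]
        dist = {c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])) for c in (0, 1)}
        preds.append(min(dist, key=dist.get))
    return preds


def cross_validated_accuracy(X, y, k, n_folds, select_inside, seed):
    """K-fold CV accuracy; select_inside=False reproduces the flawed design."""
    n = len(y)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    global_features = None
    if not select_inside:
        # FLAWED: features are chosen using all samples, including the test folds.
        global_features = select_features(X, y, range(n), k)
    n_correct = 0
    for f in range(n_folds):
        test_idx = folds[f]
        train_idx = [i for g in range(n_folds) if g != f for i in folds[g]]
        # CORRECT: when select_inside is True, selection sees only the training fold.
        features = select_features(X, y, train_idx, k) if select_inside else global_features
        preds = nearest_centroid_predict(X, y, train_idx, test_idx, features)
        n_correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return n_correct / n


if __name__ == "__main__":
    X, y = simulate_noise_data(n_samples=40, n_features=500, seed=1)
    flawed = cross_validated_accuracy(X, y, k=10, n_folds=5, select_inside=False, seed=2)
    proper = cross_validated_accuracy(X, y, k=10, n_folds=5, select_inside=True, seed=2)
    print(f"selection outside CV: {flawed:.2f}")
    print(f"selection inside CV:  {proper:.2f}")
```

Because the outcome is independent of every feature, any accuracy clearly above 0.5 reflects optimism rather than real predictive ability. Selecting features on all samples before cross-validation lets the test folds influence the model and typically inflates the estimate, whereas repeating the selection within each training fold should give estimates close to chance.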

Keywords: Cross-validation; High-dimensional data; Omics data; Prognostic model; Regression; Validation.

Publication types

Review

MeSH terms

Algorithms*
Artificial Intelligence
Disease / genetics*
Humans
Machine Learning*
Models, Statistical*
Molecular Epidemiology*
