Prediction of Adulthood Obesity Using Genetic and Childhood Clinical Risk Factors in the Cardiovascular Risk in Young Finns Study

Circ Cardiovasc Genet. 2017 Jun;10(3):e001554. doi: 10.1161/CIRCGENETICS.116.001554.


Background: Obesity is a known risk factor for cardiovascular disease. Early prediction of obesity is essential for prevention. The aim of this study is to assess the use of childhood clinical factors and genetic risk factors in predicting adulthood obesity using machine learning methods.

Methods and results: A total of 2262 participants from the Cardiovascular Risk in Young Finns Study (YFS) were followed up for 31 years, from childhood (age 3-18 years) to adulthood. The data were divided into training (n=1625) and validation (n=637) sets. The effect of known genetic risk factors (97 single-nucleotide polymorphisms) was investigated as a weighted genetic risk score of all 97 single-nucleotide polymorphisms (WGRS97) or of a subset of the 19 most significant single-nucleotide polymorphisms (WGRS19) using a boosting machine learning technique. WGRS97 and WGRS19 were validated using external data (n=369) from the Bogalusa Heart Study (BHS). WGRS19 improved the accuracy of predicting adulthood obesity in training (area under the curve [AUC]=0.787 versus AUC=0.744, P<0.0001) and validation data (AUC=0.769 versus AUC=0.747, P=0.026). WGRS97 improved the accuracy in training (AUC=0.782 versus AUC=0.744, P<0.0001) but not in validation data (AUC=0.749 versus AUC=0.747, P=0.785). Higher WGRS19 was associated with higher body mass index at 9 years, and higher WGRS97 with higher body mass index at 6 years. Replication in BHS confirmed our findings that WGRS19 and WGRS97 are associated with body mass index.
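
The two quantities this paragraph relies on, a weighted genetic risk score and the area under the curve, can be illustrated with a minimal Python sketch. It assumes the common convention that a weighted score is the sum of risk-allele counts (0, 1, or 2 per single-nucleotide polymorphism) multiplied by per-polymorphism weights; the abstract does not state how the weights were derived. The function names, weights, and toy data are hypothetical, and the sketch does not reproduce the boosting model or the significance tests reported above.

```python
def weighted_genetic_risk_score(allele_counts, weights):
    """Return the sum of risk-allele counts times per-SNP weights.

    allele_counts: risk-allele counts (0, 1, or 2), one per SNP.
    weights: hypothetical per-SNP weights (for example, published effect sizes).
    """
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have the same length")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


def auc(scores, labels):
    """Return the area under the ROC curve using the rank (Mann-Whitney) form.

    This is the probability that a randomly chosen case (label 1) has a higher
    score than a randomly chosen control (label 0); ties count as one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data (hypothetical): three SNPs and four participants.
toy_weights = [0.1, 0.2, 0.3]
toy_genotypes = [[0, 0, 0], [1, 1, 0], [0, 0, 2], [2, 2, 2]]
toy_obese = [0, 1, 0, 1]

toy_scores = [weighted_genetic_risk_score(g, toy_weights) for g in toy_genotypes]
toy_auc = auc(toy_scores, toy_obese)
```

In practice a score like this would be one predictor alongside the childhood clinical factors, and the AUC would be compared between models fitted with and without it.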

Conclusions: WGRS19 improves prediction of adulthood obesity. Predictive accuracy is highest among young children (3-6 years), whereas among older children (9-18 years) the risk can be identified using childhood clinical factors. The model is helpful in screening for children at high risk of developing obesity.

Keywords: genetics; machine learning; obesity; risk factor; single-nucleotide polymorphism genetics; statistics.

MeSH terms

  • Adolescent
  • Adult
  • Area Under Curve
  • Body Mass Index
  • C-Reactive Protein / analysis
  • Carrier Proteins / genetics
  • Child
  • Child, Preschool
  • Female
  • Finland
  • Follow-Up Studies
  • Humans
  • Logistic Models
  • MAP Kinase Kinase 5 / genetics
  • Machine Learning
  • Male
  • Obesity / etiology*
  • Obesity / genetics
  • Odds Ratio
  • Polymorphism, Single Nucleotide
  • ROC Curve
  • Risk Factors
  • Transcription Factor AP-2 / genetics


Substances

  • Carrier Proteins
  • POC5 protein, human
  • TFAP2B protein, human
  • Transcription Factor AP-2
  • C-Reactive Protein
  • MAP Kinase Kinase 5
  • MAP2K5 protein, human