DNA methylation-based age prediction using massively parallel sequencing data and multiple machine learning models

Forensic Sci Int Genet. 2018 Nov:37:215-226. doi: 10.1016/j.fsigen.2018.09.003. Epub 2018 Sep 8.

Abstract

The field of DNA intelligence focuses on retrieving information from DNA evidence that can help narrow down large groups of suspects or define target groups of interest. Following recent breakthroughs in the estimation of geographical ancestry and physical appearance, the estimation of chronological age completes this set of information. Recent studies have identified methylation sites in the human genome that correlate strongly with age and can be used to develop age-estimation algorithms. In this study, 110 whole blood samples from individuals aged 11-93 years were analysed using a DNA methylation quantification assay based on bisulphite conversion and massively parallel sequencing (Illumina MiSeq) of 12 CpG sites. Using these data, 17 different statistical modelling approaches were compared based on root mean square error (RMSE), and a support vector machine with a polynomial kernel function (SVMp) was selected for further testing. For the selected model (RMSE = 4.9 years), the mean absolute error (MAE) of the blind test (n = 33) was 4.1 years, with 52% of the samples predicted with less than 4 years of error and 86% with less than 7 years. Furthermore, the sensitivity of the method was assessed in terms of both methylation quantification accuracy and prediction accuracy, in the first validation of this kind. The described method retained its accuracy down to 10 ng of initial DNA input or ∼2 ng of bisulphite PCR input. Finally, 34 saliva samples were analysed and, following basic normalisation, the chronological age of the donors was predicted with less than 4 years of error for 50% of the samples and with less than 7 years of error for 70%.
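
The accuracy figures quoted in the abstract (RMSE, MAE, and the share of samples predicted within a given number of years) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, and the ages below are synthetic values chosen for illustration, not data from the study.

```python
import math


def rmse(true_ages, predicted_ages):
    """Root mean square error between chronological and predicted ages."""
    squared = [(p - t) ** 2 for t, p in zip(true_ages, predicted_ages)]
    return math.sqrt(sum(squared) / len(squared))


def mae(true_ages, predicted_ages):
    """Mean absolute error between chronological and predicted ages."""
    absolute = [abs(p - t) for t, p in zip(true_ages, predicted_ages)]
    return sum(absolute) / len(absolute)


def share_within(true_ages, predicted_ages, max_error_years):
    """Fraction of samples whose absolute error is below max_error_years."""
    hits = sum(
        1 for t, p in zip(true_ages, predicted_ages)
        if abs(p - t) < max_error_years
    )
    return hits / len(true_ages)


# Synthetic illustrative values (assumption), NOT measurements from the study.
true_ages = [20, 35, 50, 65, 80]
predicted_ages = [23, 33, 58, 64, 74]

print("RMSE:", round(rmse(true_ages, predicted_ages), 2))
print("MAE:", mae(true_ages, predicted_ages))
print("Share with < 4 years error:", share_within(true_ages, predicted_ages, 4))
print("Share with < 7 years error:", share_within(true_ages, predicted_ages, 7))
```

Because RMSE squares each error before averaging, a few large misses raise it more than they raise MAE, which is consistent with the abstract reporting an RMSE above the MAE, although the two figures there come from different evaluation steps.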

Keywords: Age prediction; Artificial neural networks; DNA methylation; Machine learning; Saliva; Sperm; Whole blood.

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Aged, 80 and over
  • Aging / genetics*
  • Blood Chemical Analysis
  • Child
  • CpG Islands / genetics
  • DNA Methylation*
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Male
  • Middle Aged
  • Models, Statistical
  • Multiplex Polymerase Chain Reaction
  • Neural Networks, Computer
  • Reproducibility of Results
  • Saliva / chemistry
  • Semen / chemistry
  • Sequence Analysis, DNA
  • Sulfites
  • Support Vector Machine
  • Young Adult

Substances

  • Sulfites
  • sodium bisulfite