Predicting age by mining electronic medical records with deep learning characterizes differences between chronological and physiological age

J Biomed Inform. 2017 Dec:76:59-68. doi: 10.1016/j.jbi.2017.11.003. Epub 2017 Nov 4.


Determining the discrepancy between the chronological and physiological age of patients is central to preventative and personalized care. Electronic medical records (EMRs) provide rich information about a patient's physiological state, but it is unclear whether such information can be predictive of chronological age. Here we present a deep learning model that uses vital signs and lab tests contained within the EMRs of the Mount Sinai Health System (MSHS) to predict chronological age. The model is trained on 377,686 EMRs from patients aged 18-85 years. The discrepancy between the predicted and actual chronological age is then used as a proxy for physiological age. Overall, the model predicts the chronological age of patients with a standard deviation of the error of ∼7 years. The ages of the youngest and oldest patients were predicted most accurately, while the ages of patients between 40 and 60 years were predicted least accurately. Patients with the largest discrepancy between their physiological and chronological age were inspected further. Patients predicted to be significantly older than their chronological age have higher systolic blood pressure, higher cholesterol, liver damage, and anemia. In contrast, patients predicted to be younger than their chronological age have lower blood pressure and shorter stature, among other indicators; both groups display lower weight than the population average. Using information from ∼10,000 patients in the cohort who have also been profiled with SNP arrays, a genome-wide association study (GWAS) uncovers several novel genetic variants associated with aging. In particular, significant variants were mapped to genes known to be associated with inflammation, hypertension, lipid metabolism, height, and increased lifespan in mice. Several genes with missense mutations were identified as novel candidate aging genes.
In conclusion, we demonstrate how EMR data can be used to assess overall health via a scale that is based on deviation from the patient's predicted chronological age.
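
The scale described above, the deviation of predicted from actual chronological age, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `age_gaps` and `flag_outliers`, the sign convention (predicted minus chronological), and the two-standard-deviation cutoff for a "large" discrepancy are assumptions made here for illustration, since the abstract does not state how the extreme groups were selected.

```python
from statistics import mean, pstdev


def age_gaps(chronological, predicted):
    """Return predicted minus chronological age for each patient.

    A positive gap means the model predicts the patient to be older
    than they are (assumed sign convention, not taken from the paper).
    """
    if len(chronological) != len(predicted):
        raise ValueError("inputs must have the same length")
    return [p - c for c, p in zip(chronological, predicted)]


def flag_outliers(gaps, n_sd=2.0):
    """Split patients into 'predicted older' and 'predicted younger' groups.

    A patient is flagged when their gap lies more than n_sd population
    standard deviations from the mean gap. The threshold is hypothetical.
    Returns two lists of indices into `gaps`.
    """
    mu = mean(gaps)
    sd = pstdev(gaps)
    older = [i for i, g in enumerate(gaps) if g - mu > n_sd * sd]
    younger = [i for i, g in enumerate(gaps) if mu - g > n_sd * sd]
    return older, younger


if __name__ == "__main__":
    # Made-up ages for demonstration only; not data from the study.
    actual = [30, 42, 55, 61, 48, 37, 70, 25, 50, 50]
    model = [30, 42, 55, 61, 48, 37, 70, 25, 60, 40]
    gaps = age_gaps(actual, model)
    older, younger = flag_outliers(gaps)
    print("error SD:", round(pstdev(gaps), 2))
    print("predicted older:", older, "predicted younger:", younger)
```

In this sketch the standard deviation of the gaps plays the role of the reported prediction error, and the flagged indices correspond to the patients who would be inspected further.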

Keywords: Age prediction; Aging; Deep learning; Machine learning; Medical records.

MeSH terms

  • Adolescent
  • Adult
  • Age Factors*
  • Aged
  • Aged, 80 and over
  • Cohort Studies
  • Data Mining*
  • Electronic Health Records*
  • Female
  • Genetic Predisposition to Disease
  • Genome-Wide Association Study
  • Humans
  • Learning*
  • Male
  • Middle Aged
  • Polymorphism, Single Nucleotide
  • Young Adult