DNA Methylation Age of Human Tissues and Cell Types

Steve Horvath. Genome Biol. 14 (10), R115.

Erratum in

Abstract

Background: It is not yet known whether DNA methylation levels can be used to accurately predict age across a broad spectrum of human tissues and cell types, nor whether the resulting age prediction is a biologically meaningful measure.

Results: I developed a multi-tissue predictor of age that allows one to estimate the DNA methylation age of most tissues and cell types. The predictor, which is freely available, was developed using 8,000 samples from 82 Illumina DNA methylation array datasets, encompassing 51 healthy tissues and cell types. I found that DNA methylation age has the following properties: first, it is close to zero for embryonic and induced pluripotent stem cells; second, it correlates with cell passage number; third, it gives rise to a highly heritable measure of age acceleration; and, fourth, it is applicable to chimpanzee tissues. Analysis of 6,000 cancer samples from 32 datasets showed that all of the considered 20 cancer types exhibit significant age acceleration, with an average of 36 years. Low age-acceleration of cancer tissue is associated with a high number of somatic mutations and TP53 mutations, while mutations in steroid receptors greatly accelerate DNA methylation age in breast cancer. Finally, I characterize the 353 CpG sites that together form an aging clock in terms of chromatin states and tissue variance.

Conclusions: I propose that DNA methylation age measures the cumulative effect of an epigenetic maintenance system. This novel epigenetic clock can be used to address a host of questions in developmental biology, cancer and aging research.

Figures

Figure 1
Chronological age (y-axis) versus DNAm age (x-axis) in the training data. Each point corresponds to a DNA methylation sample (human subject). Points are colored and labeled according to the underlying data set as described in Additional file 1. (A) Across all training data, the correlation between DNAm age (x-axis) and chronological age (y-axis) is 0.97 and the error (median absolute difference) is 2.9 years. Results for (B) peripheral blood mononuclear cells (cor = 0.97, error <1 year), (C) whole blood (cor = 0.98, error = 2.7 years), (D) cerebellum (cor = 0.92, error = 4.5), (E) pons (cor = 0.96, error = 3.3), (F) pre-frontal cortex (cor = 0.98, error = 1.4), (G) temporal cortex (cor = 0.99, error = 2.2), (H) brain samples, composed of 58 glial cell, 58 neuron cell, 20 bulk, and 9 mixed samples (cor = 0.94, error = 3.1), (I) normal breast tissue (cor = 0.73, error = 8.9), (J) buccal cells (cor = 0.95, error <1 year), (K) cartilage (cor = 0.79, error = 4), (L) colon (cor = 0.98, error = 3.7), (M) dermal fibroblasts (cor = 0.92, error = 12), (N) epidermis (cor = 0.96, error = 3.1), (O) gastric tissue (cor = 0.83, error = 5.3), (P) normal adjacent tissue from head/neck cancers (cor = 0.73, error = 5.8), (Q) heart (cor = 0.82, error = 9.2), (R) kidney (cor = 0.88, error = 3.8), (S) liver (cor = 0.90, error = 4.5), (T) lung (cor = 0.80, error = 3.1), (U) mesenchymal stromal cells (cor = 0.95, error = 5.2), (V) prostate (cor = 0.55, error = 4.2), (W) saliva (cor = 0.89, error = 2.9), (X) stomach (cor = 0.84, error = 3.7), (Y) thyroid (cor = 0.96, error = 4.1).
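The two accuracy measures quoted in this caption, the correlation ("cor") and the median absolute difference ("error") between DNAm age and chronological age, can be illustrated with a minimal Python sketch. It assumes that "cor" is a Pearson correlation; the function names and the toy ages are hypothetical and are not taken from the paper's data or software.

```python
import math
import statistics


def pearson_correlation(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def median_absolute_difference(predicted, actual):
    """Median of |predicted - actual|, the 'error' reported in the caption."""
    return statistics.median([abs(p - a) for p, a in zip(predicted, actual)])


# Hypothetical toy values (years), for illustration only.
dnam_age = [2.0, 18.5, 33.0, 51.0, 68.5]
chronological_age = [1.0, 20.0, 30.0, 50.0, 70.0]

cor = pearson_correlation(dnam_age, chronological_age)
error = median_absolute_difference(dnam_age, chronological_age)
```

Because the error is a median rather than a mean, a few badly predicted samples have little influence on it.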
Figure 2
Chronological age (y-axis) versus DNAm age (x-axis) in the test data. (A) Across all test data, the age correlation is 0.96 and the error is 3.6 years. Results for (B) CD4 T cells measured at birth (age zero) and at age 1 (cor = 0.78, error = 0.27 years), (C) CD4 T cells and CD14 monocytes (cor = 0.90, error = 3.7), (D) peripheral blood mononuclear cells (cor = 0.96, error = 1.9), (E) whole blood (cor = 0.95, error = 3.7), (F) cerebellar samples (cor = 0.92, error = 5.9), (G) occipital cortex (cor = 0.98, error = 1.5), (H) normal adjacent breast tissue (cor = 0.87, error = 13), (I) buccal epithelium (cor = 0.83, error = 0.37), (J) colon (cor = 0.85, error = 5.6), (K) fat adipose (cor = 0.65, error = 2.7), (L) heart (cor = 0.77, error = 12), (M) kidney (cor = 0.86, error = 4.6), (N) liver (cor = 0.89, error = 6.7), (O) lung (cor = 0.87, error = 5.2), (P) muscle (cor = 0.70, error = 18), (Q) saliva (cor = 0.83, error = 2.7), (R) uterine cervix (cor = 0.75, error = 6.2), (S) uterine endometrium (cor = 0.55, error = 11), (T) various blood samples composed of 10 Epstein-Barr virus transformed B cell, three naive B cell, and three peripheral blood mononuclear cell samples (cor = 0.46, error = 4.4). Samples are colored by disease status: brown for Werner progeroid syndrome, blue for Hutchinson-Gilford progeria, and turquoise for healthy control subjects.
Figure 3
Factors affecting the relation between age and DNAm age. (A-C) Factors influencing prediction accuracy in the training and test sets. (A) The standard deviation of age (x-axis) has a strong relationship (cor = 0.49, P = 4E-5) with age correlation (y-axis). To arrive at an unbiased measure of prediction accuracy, I estimated the age correlation using a leave-one-data-set-out cross validation (LOOCV) analysis. Each point is labeled and colored according to the underlying data set (Additional file 1). (B) Sample size (x-axis) is not significantly correlated with the age correlation (y-axis). (C) Mean DNAm age per tissue (x-axis) versus mean chronological age (y-axis). Points correspond to the human tissue data mentioned in Additional file 1. Breast tissue shows signs of accelerated aging. (D,E) The effect of tissue type on the age prediction in test data set 71 even for tissues that were not part of the training data (for example, esophagus, jejunum, penis). (E) The horizontal bars report the DNAm age (x-axis) of a single tissue from a single donor (H12817). Only one sample per tissue (grey axis numbers) was available. DNAm age has a low coefficient of variation (0.12). The red vertical line corresponds to the true chronological age. (F-H) DNAm age for various tissues from data set 77, for which chronological age was not available. (F,G) A multi-tissue analysis of somatic adult tissue data from an adult male and an adult female, respectively. (H) Neonatal tissues tend to have low DNAm age. (I,J) The DNAm age of sperm is significantly lower than the chronological age of the respective sperm donors in data sets 74 and 75, respectively. Error bars represent one standard error.
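The leave-one-data-set-out idea mentioned in panel (A) can be sketched as follows: every data set is held out in turn, a model is fitted on the remaining data sets, and predictions are made only for the held-out one. This is a minimal illustration, not the paper's procedure. The single-feature least-squares line stands in for the actual multi-CpG predictor, and all names and values are hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_dataset_out(samples):
    """samples: iterable of (dataset_id, feature, age).

    Returns {dataset_id: [(age, predicted_age), ...]}, where each data set
    is predicted by a model that never saw any of its samples.
    """
    by_dataset = defaultdict(list)
    for dataset_id, feature, age in samples:
        by_dataset[dataset_id].append((feature, age))

    predictions = {}
    for held_out in by_dataset:
        # Train on every data set except the held-out one.
        train = [pair for ds, pairs in by_dataset.items()
                 if ds != held_out for pair in pairs]
        slope, intercept = fit_line([f for f, _ in train],
                                    [a for _, a in train])
        predictions[held_out] = [(age, slope * feature + intercept)
                                 for feature, age in by_dataset[held_out]]
    return predictions


# Hypothetical toy data: (data set label, methylation-like feature, age).
toy_samples = [
    ("ds1", 0.1, 10), ("ds1", 0.2, 20),
    ("ds2", 0.3, 30), ("ds2", 0.5, 50),
    ("ds3", 0.4, 40), ("ds3", 0.7, 70),
]
loocv_predictions = leave_one_dataset_out(toy_samples)
```

Holding out whole data sets, rather than single samples, keeps batch-specific effects from leaking between the training and evaluation data.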
Figure 4
Studying the conservation of DNAm age in tissues from great apes. Analysis of two independent data sets involving tissues from great apes. (A,B) Results for data set 72 [27]. A high age correlation (cor = 0.84, error = 10 years) can be observed when studying both chimpanzee heart (colored grey) and human heart tissue (colored turquoise) samples. To facilitate a comparison, I also added the heart tissue data from data set 25 (blue circles). (B) DNAm age is closely related to chronological age (cor = 0.75, error = 3.7) across kidney and liver samples from humans (turquoise) and chimpanzees (grey). (C-F) Results for ape blood samples from data set 73. (C) Highly accurate results (cor = 0.9, error = 1.4) can be observed for blood samples from common chimpanzees (Pan troglodytes; labeled C, colored blue) and bonobos (Pan paniscus; labeled B, colored turquoise). (D) Results for common chimpanzees only. (E) Results for bonobos only. (F) Results for gorillas.
Figure 5
Induced pluripotent stem cells, embryonic stem cells and cell passaging. (A-C) Induced pluripotent stem (iPS) cells have a lower DNAm age than corresponding primary cells in (A) data set 77 (Kruskal Wallis P-value 1E-14), (B) data set 78 (P = 8E-10), and (C) data set 79 (P = 0.0062). (A,B) There is no significant difference in DNAm age between ES cells and iPS cells (both restricted to cell passage numbers less than 15) in data sets 77 and 78, respectively. (D,E) DNAm age of human ES cell lines and adult tissues in data sets 80 and 81, respectively. (F-J) Cell passage number (y-axis) is significantly correlated with DNAm age (x-axis). (F) Cell passage number (y-axis) versus DNAm age in data set 77. Points are colored by cell type (black for ES cells, red for iPS cells, blue for somatic cells). (G,H) Analogous results for iPS cells (cor = 0.33, P = 0.025) and embryonic stem cells (cor = 0.28, P = 0.0023) from data set 77. (I,J) Validation of these findings in two independent data sets, 78 and 79, respectively. Panel (J) involves only stem cells. Panels (A-C) involve cells that had undergone fewer than 15 cell passages. Panels (C,J) are restricted to cells that were not irradiated. The bar plots show the mean value ±1 standard error.
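The bar-plot summary described here, a group mean with error bars of ±1 standard error, can be computed as in the following sketch. It assumes the standard error is the sample standard deviation divided by the square root of the group size; the function name and the values are hypothetical.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for one group of samples."""
    mean = statistics.mean(values)
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, standard_error


# Hypothetical DNAm ages (years) for one group of cell lines.
group_mean, group_se = mean_and_standard_error([1.0, 2.0, 3.0, 4.0, 5.0])
bar_low, bar_high = group_mean - group_se, group_mean + group_se
```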
Figure 6
Heat map of DNA methylation levels of the 353 CpGs across all samples. (A) The heat map color-codes DNAm levels: blue and red for beta values close to zero and one, respectively. Note that DNA methylation levels only change very gradually with age. The 353 clock CpGs (rows) are sorted according to their age correlation. The first row color band, denoted 'corAge', color-codes whether a CpG has a negative (blue) or positive (red) correlation with age. 'CpG' indicates whether a CpG is located in a CpG island (turquoise), shore (brown), or outside of CpG islands. 'PolyGr': blue for CpGs near a Polycomb group target gene. 'Chr' color-codes chromosomes. The DNA methylation samples (columns) for which chronological age was available are sorted according to age, tissue, and data set. The column color bands visualize properties of the samples. 'Age': white for age zero and dark brown for the maximum observed age of 101 years. 'Training': black for training set samples. 'Tissue' color-codes tissue type. 'Platform': black for Illumina 450K. Note that few data sets have a pronounced effect on the clock CpGs. The largest vertical band corresponds to the buccal epithelium samples from 15-year-old subjects (data set 14, color-coded midnight blue in the column band 'Data'). (B) The weighted average of the 353 clock CpGs versus chronological age in the training data sets. The rate of change of the red curve can be interpreted as tick rate. Points are colored and labeled by data set. (C) Analogous results for the test data sets.
Figure 7
Age acceleration versus number of somatic mutations in the TCGA data. Mutation data from TCGA were used to count the number of mutations per cancer sample. (A) Age acceleration versus (log transformed) mutation count per sample across all cancers. Note that this analysis is confounded by cancer/tissue type. (B-P) A significant negative relationship between age acceleration and number of somatic mutations can be observed in the following seven affected tissues/cancers: (C) bone marrow (AML), (D) breast carcinoma (BRCA), (G) kidney (KIRC), (H) kidney (KIRP), (K) ovarian cancer (OVAR), (L) prostate (PRAD), and (O) thyroid (THCA). No significant relationship could be found in the following six cancer types: (F) colon carcinoma (COAD), (I) lung adenocarcinoma (LUAD), (J) lung squamous cell carcinoma (LUSC), (P) uterine endometrioid, (M) rectal cancer (READ), (N) skin. Due to the low sample size, the results are inconclusive for (B) bladder cancer and (E) cervical cancer. Each point corresponds to a DNA methylation sample (cancer sample from a human subject) analogous to Additional file 12. The x-axis reports the log transformed (base 10) number of mutations observed per sample. The figure titles report the biweight midcorrelation, which is a robust measure of correlation.
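The biweight midcorrelation named in this caption is a median-based alternative to the Pearson correlation that downweights outlying observations. The sketch below follows the commonly used definition, with Tukey biweight weights based on the median and the unscaled median absolute deviation (MAD). It is an illustration only, not the software used for the figure, and the function names and data are hypothetical.

```python
import math
import statistics


def _biweight_terms(values):
    """Median-centered values multiplied by their Tukey biweight weights."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; bicor is undefined for this input")
    terms = []
    for v in values:
        u = (v - med) / (9.0 * mad)
        # Observations with |u| >= 1 are treated as outliers (weight zero).
        weight = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        terms.append((v - med) * weight)
    return terms


def bicor(xs, ys):
    """Biweight midcorrelation of two equally long sequences."""
    a = _biweight_terms(xs)
    b = _biweight_terms(ys)
    numerator = sum(p * q for p, q in zip(a, b))
    denominator = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return numerator / denominator


# Hypothetical data: a clean increasing trend plus one gross outlier in y.
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 3, 4, 5, 6, -100]
robust_cor = bicor(x, y)
```

In this toy example the single outlier receives weight zero, so the result still reflects the increasing trend of the remaining points, whereas the Pearson correlation of the same data is negative.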
Figure 8
Age acceleration in breast cancer. Panels in the first column (A,E,I,M) show that estrogen receptor (ER)-positive breast cancer samples have increased age acceleration in four independent data sets. Panels in the second column (B,F,J) show the same result for progesterone receptor (PR)-positive cancers. Panels in the third column (C,G,K) show that HER2/neu amplification is not associated with age acceleration. Panels in the fourth column (D,H,L) show how combinations of these genomic aberrations affect age acceleration. (N) Age acceleration across the following breast cancer types: Basal-like, HER2-type, luminal A, luminal B, and healthy (normal) breast tissue. (O) Ki-67 expression versus age acceleration. (P) Tumor grade is not significantly related to age acceleration, reflecting results from Additional file 14. Vertical grey numbers on the x-axis report sample sizes. The figure titles report the data source (GSE identifier from Gene Expression Omnibus or TCGA), and the Kruskal Wallis test P-value (except for panels (O,P), which report correlation test P-values). Error bars represent 1 standard error.
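The Kruskal Wallis test named in the figure titles compares groups through the ranks of the pooled observations. The sketch below computes only the H statistic, with the usual correction for ties. The P-value, which is typically obtained by comparing H with a chi-square distribution on (number of groups minus 1) degrees of freedom, is left out. The function name and the groups are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of groups (lists of numbers)."""
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        ties = j - i
        tie_term += ties ** 3 - ties
        i = j

    rank_sum_term = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    h = 12.0 / (n_total * (n_total + 1)) * rank_sum_term - 3.0 * (n_total + 1)

    # Tie correction; equals 1 when all observations are distinct.
    # It is zero (H undefined) if every observation has the same value.
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    return h / correction


# Hypothetical age-acceleration values for three groups, illustration only.
h_statistic = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```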
Figure 9
Age acceleration in colorectal cancer, glioblastoma multiforme and acute myeloid leukemia. (A-F) Results for colorectal cancer. Mean age acceleration (y-axis) in colorectal cancer versus mutation status (denoted by a plus sign) in (A) BRAF, (B) TP53, (C) K-RAS. (D) Promoter hypermethylation of the mismatch repair gene MLH1 (denoted by a plus sign) is significantly (P = 5.7E-5) associated with age acceleration. (E) Mean age acceleration across different patient groups defined by combinations of BRAF, TP53, K-RAS, MLH1 status. The first bar reports the age acceleration in normal adjacent colorectal tissue from cancer patients but the sample size of 4 is rather low. (F) CpG island methylator phenotype is associated with age acceleration (P = 3.5E-5). (G-R) Results for various genomic abnormalities in glioblastoma multiforme. (J) A highly significant (P = 3.3E-7) relationship can be found between H3F3A mutations and age acceleration. Samples with a G34R mutation have the highest age acceleration. (S-W) Results for various genomic aberrations in acute myeloid leukemia. (X) Thyroid cancer age acceleration versus RAS family mutation status is inconclusive since mutation status was largely unknown. Error bars represent 1 standard error.
