Nature. 2018 Jul;559(7714):400-404.
doi: 10.1038/s41586-018-0317-6. Epub 2018 Jul 9.

Prediction of Acute Myeloid Leukaemia Risk in Healthy Individuals

Sagi Abelson, Grace Collord, Stanley W K Ng, Omer Weissbrod, Netta Mendelson Cohen, Elisabeth Niemeyer, Noam Barda, Philip C Zuzarte, Lawrence Heisler, Yogi Sundaravadanam, Robert Luben, Shabina Hayat, Ting Ting Wang, Zhen Zhao, Iulia Cirlan, Trevor J Pugh, David Soave, Karen Ng, Calli Latimer, Claire Hardy, Keiran Raine, David Jones, Diana Hoult, Abigail Britten, John D McPherson, Mattias Johansson, Faridah Mbabaali, Jenna Eagles, Jessica K Miller, Danielle Pasternack, Lee Timms, Paul Krzyzanowski, Philip Awadalla, Rui Costa, Eran Segal, Scott V Bratman, Philip Beer, Sam Behjati, Inigo Martincorena, Jean C Y Wang, Kristian M Bowles, J Ramón Quirós, Anna Karakatsani, Carlo La Vecchia, Antonia Trichopoulou, Elena Salamanca-Fernández, José M Huerta, Aurelio Barricarte, Ruth C Travis, Rosario Tumino, Giovanna Masala, Heiner Boeing, Salvatore Panico, Rudolf Kaaks, Alwin Krämer, Sabina Sieri, Elio Riboli, Paolo Vineis, Matthieu Foll, James McKay, Silvia Polidoro, Núria Sala, Kay-Tee Khaw, Roel Vermeulen, Peter J Campbell, Elli Papaemmanuil, Mark D Minden, Amos Tanay, Ran D Balicer, Nicholas J Wareham, Moritz Gerstung, John E Dick, Paul Brennan, George S Vassiliou, Liran I Shlush
Free PMC article

Sagi Abelson et al. Nature. 2018.

The incidence of acute myeloid leukaemia (AML) increases with age and mortality exceeds 90% when diagnosed after age 65. Most cases arise without any detectable early symptoms and patients usually present with the acute complications of bone marrow failure [1]. The onset of such de novo AML cases is typically preceded by the accumulation of somatic mutations in preleukaemic haematopoietic stem and progenitor cells (HSPCs) that undergo clonal expansion [2,3]. However, recurrent AML mutations also accumulate in HSPCs during ageing of healthy individuals who do not develop AML, a phenomenon referred to as age-related clonal haematopoiesis (ARCH) [4-8]. Here we use deep sequencing to analyse genes that are recurrently mutated in AML to distinguish between individuals who have a high risk of developing AML and those with benign ARCH. We analysed peripheral blood cells from 95 individuals that were obtained on average 6.3 years before AML diagnosis (pre-AML group), together with 414 unselected age- and gender-matched individuals (control group). Pre-AML cases were distinct from controls and had more mutations per sample, higher variant allele frequencies, indicating greater clonal expansion, and showed enrichment of mutations in specific genes. Genetic parameters were used to derive a model that accurately predicted AML-free survival; this model was validated in an independent cohort of 29 pre-AML cases and 262 controls. Because AML is rare, we also developed an AML predictive model using a large electronic health record database that identified individuals at greater risk. Collectively our findings provide proof-of-concept that it is possible to discriminate ARCH from pre-AML many years before malignant transformation. This could in future enable earlier detection and monitoring, and may help to inform intervention.

Conflict of interest statement

Competing interests

The authors declare no competing financial interests.


Extended Data Figure 1. Prevalence of ARCH-PD mutations with VAF ≥ 10% according to age.
Red and blue lines represent the proportion of pre-AMLs and controls, respectively, harbouring ARCH-PD mutations with VAF ≥10%.
Extended Data Figure 2. Serial collected sampling supports a long-lived HSPC as the cell of origin for most ARCH-PD clones
a,b, VAF trajectory of persistent clones carrying putative driver mutations in pre-AML cases (right panel) and controls (left panel). Age is indicated on the x-axis. In the upper panel, VAF is shown on the y-axis and each persistent mutation is shown in a different colour, with circles denoting individual serial samples and solid lines representing the growth trajectory between serial samples. In the lower panel, dashed lines indicate the time interval between the last sampling and the end of follow-up (controls) or AML diagnosis (cases). c, Clonal growth rates (α) are shown for 27 control clones corresponding to 54 time points and 13 pre-AML clones corresponding to 15 time points. Box plots show median and whiskers represent the lower and upper quartiles.
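The caption reports clonal growth rates (α) without defining them here. As a rough illustration only, the sketch below assumes exponential clonal growth, VAF(t) = VAF(t0) · exp(α · (t − t0)), and estimates α from two serial samples. The function name and the sample values are hypothetical, and the paper's Methods may define α differently.

```python
import math


def exponential_growth_rate(vaf_start, vaf_end, age_start, age_end):
    """Per-year growth rate under an ASSUMED exponential model.

    This is an illustrative convention, not necessarily the definition
    of alpha used in the paper.
    """
    if vaf_start <= 0 or vaf_end <= 0:
        raise ValueError("VAFs must be positive")
    if age_end <= age_start:
        raise ValueError("age_end must be later than age_start")
    return math.log(vaf_end / vaf_start) / (age_end - age_start)


# Hypothetical clone: VAF rises from 2% at age 60 to 8% at age 70.
alpha = exponential_growth_rate(0.02, 0.08, 60.0, 70.0)
doubling_time_years = math.log(2) / alpha
print(f"alpha = {alpha:.4f} per year, doubling time = {doubling_time_years:.1f} years")
```

Under this assumption a clone that quadruples in ten years doubles every five years; a negative α would indicate a shrinking clone.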
Extended Data Figure 3. Performance of combined model in predicting AML progression.
a, Receiver operating characteristic (ROC) curve for prediction of AML development using model 1 (see Methods). The red dot indicates the point on the curve with the highest positive predictive value (PPV), with sensitivity of 41.9% and specificity of 95.7%. b,c, Kaplan-Meier estimates of time to AML diagnosis for individuals predicted to develop AML (red) and not develop AML (blue) by model 1 (b; HR = 10.38, P 4.2e-10, Wald test) and model 2 (c; HR = 10.75, P = 1.75e-08, Wald test), from the point of enrolment until the end of follow-up to the EPIC study.
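Because AML is rare, the PPV implied by a given sensitivity and specificity depends strongly on how common the disease is in the screened population. The sketch below applies Bayes' rule to the operating point quoted in the caption (sensitivity 41.9%, specificity 95.7%). The prevalence values are hypothetical and chosen only to show the trend; they are not taken from the paper.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = TP / (TP + FP), expressed per unit of screened population."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive predictions are possible with these inputs")
    return true_pos / (true_pos + false_pos)


SENSITIVITY = 0.419  # from the caption
SPECIFICITY = 0.957  # from the caption

# Hypothetical prevalences, for illustration only.
for prevalence in (0.0001, 0.001, 0.01):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.4f}  PPV={ppv:.4f}")
```

The same operating point yields a much lower PPV in a general population than in a case-enriched cohort, which is why prevalence has to be stated alongside any PPV.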
Extended Data Figure 4. AML predictive models
a-c, Time-dependent receiver operating characteristic curves for Cox proportional hazards models trained on the DC (a), VC (b) and combined cohort (c). d-f, Dynamic AUC for Cox proportional hazards models trained on the DC (d), VC (e) and combined cohort (f). g,h, Red and blue bars indicate the observed and expected VAF (g) and driver frequency (h) for pre-AML cases and controls for each gene indicated on the x-axis. DC, discovery cohort (n = 505 unique individuals); VC, validation cohort (n = 291 individuals); ROC, receiver operating characteristic; AUC, area under curve.
Extended Data Figure 5. AML-free survival according to mutation status and RDW.
a, Kaplan-Meier curves of AML-free survival, defined as the time between sample collection and AML diagnosis, death or last follow-up. Survival curves are stratified according to mutation status in genes mutated in at least 3 samples across the combined validation and discovery cohorts. N=796 unique individuals. b, Kaplan-Meier curve of AML-free survival stratified according to RDW value >14 or ≤14. Plot represents data for N=128 biologically independent individuals with RDW measurements recorded, including all pre-AMLs regardless of ARCH-PD status, and controls with ARCH-PD (controls without detectable mutations omitted). RDW, red cell distribution width.
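For readers unfamiliar with how such curves are built, here is a minimal sketch of the Kaplan-Meier product-limit estimator. It assumes that AML diagnosis is the event and that death or last follow-up without AML is treated as censoring; that reading of the caption is an assumption, and the follow-up data are invented for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times:  follow-up durations (e.g. years since sample collection)
    events: True if the event (here assumed: AML diagnosis) occurred,
            False if the individual was censored at that time
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group everyone who leaves the risk set at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Individuals censored at t still count as at risk at t.
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up data for five individuals.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, False, True, True, False])
for t, s in curve:
    print(f"t={t}  S(t)={s:.3f}")
```

Stratified curves such as those in the figure come from running the same estimator separately within each group (for example RDW >14 versus ≤14).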
Extended Data Figure 6. Description of the cohort and the EHR derived measurements
a, Kaplan-Meier curves showing age-stratified survival rates for 875 individuals who developed AML. b, Line plot of the number of cases per 100,000 control individuals in the EHR database. The centre values and error bars indicate the average and s.d., respectively.
Extended Data Figure 7. Laboratory measurements contributing to EHR model
Box plots of normalized lab measurements (upper panels) and their association (lower panel) with higher AML risk. Box plots show median and whiskers represent the lower and upper quartiles.
Extended Data Figure 8. Top 50 EHR model parameters
Bar chart showing the relative contribution of the top 50 features incorporated into the EHR prediction model, ranked according to their predictive value (gain).
Extended Data Figure 9. Distribution of EHR model parameters
Heat-map illustrating absolute values of clinical measurements. Blue, white and red represent low, intermediate and high values, respectively. Light grey represents missing data. FN and TP annotations are indicated on the lower bar in dark grey and yellow, respectively. FN, false negative; TP, true positive; EHR, electronic health record.
Figure 1. Prevalence of ARCH, number of mutations and clone size in individuals who developed AML
a, Prevalence of ARCH-PD among pre-AML cases (red) and controls (blue). b, The number of ARCH-PD mutations detected in cases and controls according to age. Box plot centres, hinges and whiskers represent the median, first and third quartiles and 1.5 x interquartile range, respectively. c, VAF of ARCH-PD mutations. Significant differences are defined as P<0.0005 (two-sided Wilcoxon rank sum test with Bonferroni multiple testing correction) and are indicated by asterisks (*). All panels show data for n=800 biologically independent samples.
Figure 2. Acquisition of specific recurrent AML mutations by healthy individuals at young age is associated with progression to AML
a, Relative frequency of mutations in the indicated genes according to age group for pre-AMLs (red) and controls (blue). b, Proportion of pre-AML cases and controls harbouring ARCH-PD mutations in recurrently mutated genes. Asterisks (*) indicate P<0.05 (Fisher’s exact test with Bonferroni multiple testing correction). c, Plot showing the cumulative frequency of recurrent AML mutations (reported in >5 specimens in COSMIC) in pre-AML cases and controls. ARCH-PD mutations are ranked from left to right along the x-axis from low to high recurrence. d, VAF of recurrent mutations in cases and controls. Low, intermediate and highly recurrent COSMIC mutations are defined as those reported in 5-19 samples, 20-300 samples and >300 samples, respectively. Box plots indicate median, first and third quartiles and 1.5 x interquartile range. P-values were calculated by two-sided Wilcoxon rank sum test with Bonferroni multiple testing correction. All panels show data for n=800 unique individuals.
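The per-gene comparison in panel b combines Fisher's exact test with a Bonferroni correction. A minimal standard-library sketch of both steps follows; the 2x2 counts and the list of raw P values are hypothetical, and the authors' actual software is not stated in this caption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)  # tolerance for float ties
    )
    return min(1.0, p_value)


def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts: rows = cases / controls, columns = mutated / not mutated.
p_gene = fisher_exact_two_sided(12, 83, 5, 409)
adjusted = bonferroni([p_gene, 0.5, 0.04])
print(p_gene, adjusted)
```

A gene would be flagged with an asterisk only if its adjusted P value stays below 0.05 after correction for the number of genes tested.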
Figure 3. Model of future AML risk
a, Forest plot of the risk of AML. Purple, orange and green circles indicate hazard ratios and horizontal lines denote 95% confidence intervals for the combined cohort. For each gene, the indicated hazard ratio applies to the AML risk conferred by each 5% increase in mutation VAF over a 10 year period. The green vertical line indicates the mean HR across all genes. The HR for RUNX1 must be interpreted with caution due to the relatively high prevalence of deleterious germline variants in this gene, which may not be readily distinguishable from somatic mutations in unmatched sequencing assays (see Methods). The proportion of individuals with mutations in each gene and the average VAF are indicated to the right of the forest plot; red and blue circles represent pre-AMLs and controls, respectively, with circle sizes scaled to reflect mutation frequency and VAF. b-d, Kaplan-Meier curves of AML-free survival, defined as the time between sample collection and AML diagnosis, death or last follow-up. Survival curves are stratified according to mutation status in selected genes (b), number of driver mutations per individual and largest clone detected (c) and red cell distribution width (RDW) (d). Panels a-c represent data for n=796 unique individuals and panel d includes n=299 individuals for whom RDW measurements were available.
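A hazard ratio quoted per 5% increase in VAF can be rescaled to other VAF differences if the Cox model is log-linear in VAF, so that HR(Δ) = exp(β · Δ). The sketch below shows that arithmetic. Log-linearity is an assumption here, and the example HR of 1.5 is hypothetical rather than a value read from the forest plot.

```python
def hazard_ratio_for_increase(hr_per_step, vaf_increase_pct, step_pct=5.0):
    """Rescale a per-step hazard ratio to an arbitrary VAF increase.

    Assumes a log-linear (Cox-type) effect: HR(delta) = HR_step ** (delta / step).
    """
    if hr_per_step <= 0:
        raise ValueError("hazard ratio must be positive")
    if step_pct <= 0:
        raise ValueError("step_pct must be positive")
    return hr_per_step ** (vaf_increase_pct / step_pct)


# Hypothetical gene with HR = 1.5 per 5% VAF increase.
for delta in (5.0, 10.0, 20.0):
    hr = hazard_ratio_for_increase(1.5, delta)
    print(f"VAF +{delta:.0f}%  HR={hr:.3f}")
```

Under this assumption effects multiply rather than add: two 5% steps give 1.5 × 1.5 = 2.25, not 3.0.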
Figure 4. Increased risk for AML development is inferred from electronic health records.
a, Box plots of normalised lab measurements. Increased RDW and reduced monocyte, platelet, red blood cell and white blood cell counts were strongly associated (lower panel) with higher AML risk and differed at least a year before AML diagnosis. b, Model performance stratified by age and gender. c, Absolute lab values for true positive (TP) and false negative (FN) predictions. WBC, white blood cell count; MONO.abs, absolute monocyte count; PLT, platelet count; NEUT.abs, absolute neutrophil count; RBC, red blood cell count; RDW, red cell distribution width. Box plots indicate median, first and third quartiles and 1.5 x interquartile range.

Comment in

  • Predicting progression to AML.
    Sellar RS, Jaiswal S, Ebert BL. Nat Med. 2018 Jul;24(7):904-906. doi: 10.1038/s41591-018-0114-7. PMID: 29988142
  • Roots of AML Detectable Long before Symptoms.
    Cancer Discov. 2018 Sep;8(9):1056. doi: 10.1158/2159-8290.CD-NB2018-099. Epub 2018 Jul 23. PMID: 30037845
  • Early prediction of AML risk.
    Romero D. Nat Rev Clin Oncol. 2018 Oct;15(10):590. doi: 10.1038/s41571-018-0078-z. PMID: 30050093
  • How to predict the future.
    Dart A. Nat Rev Genet. 2018 Sep;19(9):531. doi: 10.1038/s41576-018-0041-y. PMID: 30054567
  • AML: Predicting the Unpredictable.
    Takahashi K. Cell Stem Cell. 2018 Aug 2;23(2):162-163. doi: 10.1016/j.stem.2018.07.005. PMID: 30075128


References

    1. Deschler B, Lubbert M. Acute myeloid leukemia: epidemiology and etiology. Cancer. 2006;107:2099–2107. doi: 10.1002/cncr.22233.
    2. Corces-Zimmerman MR, Hong WJ, Weissman IL, Medeiros BC, Majeti R. Preleukemic mutations in human acute myeloid leukemia affect epigenetic regulators and persist in remission. Proc Natl Acad Sci U S A. 2014;111:2548–2553. doi: 10.1073/pnas.1324297111.
    3. Shlush LI, et al. Identification of pre-leukaemic haematopoietic stem cells in acute leukaemia. Nature. 2014;506:328–333. doi: 10.1038/nature13038.
    4. Genovese G, et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N Engl J Med. 2014;371:2477–2487. doi: 10.1056/NEJMoa1409405.
    5. Jaiswal S, et al. Age-related clonal hematopoiesis associated with adverse outcomes. N Engl J Med. 2014;371:2488–2498. doi: 10.1056/NEJMoa1408617.

Publication types