Nat Genet. 2017 Mar;49(3):332-340.
doi: 10.1038/ng.3756. Epub 2017 Jan 16.

Precision Oncology for Acute Myeloid Leukemia Using a Knowledge Bank Approach

Free PMC article

Moritz Gerstung et al. Nat Genet.


Underpinning the vision of precision medicine is the concept that causative mutations in a patient's cancer drive its biology and, by extension, its clinical features and treatment response. However, considerable between-patient heterogeneity in driver mutations complicates evidence-based personalization of cancer care. Here, by reanalyzing data from 1,540 patients with acute myeloid leukemia (AML), we explore how large knowledge banks of matched genomic-clinical data can support clinical decision-making. Inclusive, multistage statistical models accurately predicted likelihoods of remission, relapse and mortality, which were validated using data from independent patients in The Cancer Genome Atlas. Comparison of long-term survival probabilities under different treatments enables therapeutic decision support, which is available in exploratory form online. Personally tailored management decisions could reduce the number of hematopoietic cell transplants in patients with AML by 20-25% while maintaining overall survival rates. Power calculations show that databases require information from thousands of patients for accurate decision support. Knowledge banks facilitate personally tailored therapeutic decisions but require sustainable updating, inclusive cohorts and large sample sizes.


Figure 1. Systematic model comparison
a. Top panel: Concordance C of different model predictions for overall survival. For cross-validation analyses (grey), we generated 100 training and test sets by randomly splitting the full dataset. The distribution of concordance values across the 100 random sets is shown as a box-and-whisker plot. Also shown are point estimates with error bars for predictions evaluated on pre-specified splits of the dataset, where the training set represented 2 of the 3 trials in the study and the test set was the third trial (red, blue, green) or where the training set was the full AMLSG dataset with the test set being the TCGA cohort (purple). Predictions for the multistage model are evaluated 3 years after diagnosis. Lower panel: Using the 100 random cross-validation splits, each of the 10 classes of predictive model was built on the training set and evaluated on the test set. The 10 models were ranked based on their relative performance on the test set and the ranks across the 100 cross-validation splits aggregated, indicating how often each model scored best (1st) to worst (10th). Time-dependent models include allogeneic hematopoietic stem cell transplantation, which is treated as a time-dependent covariate to avoid bias. b. Coefficient of determination R² for leave-one-out predictions using time-dependent random effects and multistage predictions of the AMLSG cohort, evaluated at each time (x-axis). c. Same as b, evaluated on TCGA data.
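To illustrate the concordance C reported in panel a, the following minimal sketch computes Harrell's concordance index for right-censored survival data. It is not the authors' code: the function name `concordance_index`, its arguments and the toy data are hypothetical, and pairs with tied follow-up times are simply skipped for brevity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored data (illustrative sketch).

    times       -- observed follow-up time per patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time ends in an observed
        # death; tied times are skipped in this simplified version.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier death had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.1, 0.4, 0.7]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
```

In this toy cohort five pairs are comparable and three of them are ranked correctly, so `toy_c` is 0.6. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.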
Figure 2. Multistage modeling of patient fate
a. Multistage model of patient trajectories. The six colored boxes indicate different stages during treatment, with five possible transitions indicated by solid arrows. Numbers in each box indicate the total number of patients that have entered a given stage during follow-up. b. Sediment plot showing the fraction of patients in a given stage at a given time after diagnosis. The thick black line denotes overall survival, which is the sum of the deaths without complete remission (red), non-relapse mortality (blue) and mortality after relapse (green). c. Schematic overview of multistage regression. The model estimates the log-additive effect of each of 231 prognostic variables on the transition rates for all 5 possible time-dependent transitions shown in (a). Rate changes are modelled by Cox proportional hazards models with random effects. d. Concordance, C, indicates the extent to which survival at 3 years after diagnosis was correctly ranked by the model. Similarly, at three years after diagnosis only 28% of patients were incorrectly predicted to be alive or dead. e. Mosaic plot of predicted 3-year survival across ELN categories. The height of each bar denotes the fraction of patients in each quarter of survival for each ELN group, and the width of each bar is proportional to the percentage of patients in each ELN group. f. Relative importance of risk factors for different transitions. The concordance, C, is shown as percentages across the top of the bar chart.
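The log-additive structure described in panel c can be written out as a generic Cox proportional hazards model for each transition. This is only a sketch: the symbols (λ for the transition rate, λ₀ for the baseline rate, β for the effects, x for the prognostic variables) and the Gaussian form of the random effects are notational assumptions, not taken from the caption.

```latex
\begin{align}
\lambda_k(t \mid x) &= \lambda_{0k}(t)\,
    \exp\!\Big(\sum_{j=1}^{231} \beta_{jk}\, x_j\Big),
    \qquad k = 1,\dots,5 \\
\beta_{jk} &\sim \mathcal{N}(\mu, \sigma^2)
    \quad \text{(assumed form of the random effects)}
\end{align}
```

Under this form, each variable multiplies the rate of transition k by a factor exp(β_jk x_j), which is what "log-additive" means here.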
Figure 3. Multistage outcome predictions for 1024 patients
Cross-validated risk predictions and observed statuses for 1024 patients, arranged along a Hilbert curve. This has the property that patients with similar AML subtype and risk constellation are grouped together in the 2-dimensional space (compare Supplementary Figure 1 for constellations of risk factors). For each individual patient, the survival curves predicted by the multistage model are shown, with the competing outcomes colored as in the legend and Figure 2b. What actually happened to the patient is shown as a line across the base of the graph, with a filled circle indicating the patient died, its color indicating the mode of death. Note that there are many patients for whom one color dominates the diagram, indicating that the probability that a particular event occurs is very high. Reassuringly, for such patients the observed outcomes are highly concordant with the cross-validated predictions and occur at frequencies matching the predicted probabilities.
Figure 4. Individualized risk exemplified for 2 patients
a. Sediment plot showing predicted multistage probability after remission for patient PD11104a under a management strategy of standard chemotherapy in CR1 with intended allograft after relapse. Predictions shown are based on models where the given patients were excluded from training; the bar at the bottom denotes the observed outcome (as for Figure 3). The patient was alive at the last follow-up 3.5 years after achieving first complete remission. Numbers at the bottom indicate the probabilities of non-relapse death (NRD), post-relapse death (PRD) and being alive after relapse (AAR) at years 1 to 5 from achieving complete remission. b. Multistage probability for PD11104a in the scenario of an allograft in first complete remission. c. Same as a for patient PD8314a. The patient relapsed after 1.2 years and died soon after. d. Same as b for patient PD8314a. Details of these calculations are presented in Supplementary Note, section; additional patients are shown in Supplementary Figure S2.
Figure 5. Benefit of allograft in CR1 vs after relapse
a. Predicted three-year absolute mortality reduction by allografts in CR1 over standard chemotherapy in CR1 and allograft after relapse (y-axis). Calculations are based on patients <60 years in CR1 (n=995), who would be eligible for allogeneic transplants. The black curve represents the population average, with 95% confidence intervals in grey. Points denote individual patients in the cohort, colored by ELN risk category. b. Mosaic plot of absolute survival benefit at 3 years by an allograft in CR1 over standard chemotherapy after CR1 versus ELN risk category. The predicted benefit was discretized into four groups, indicated by colors, with intervals of 5%. c. Kaplan-Meier curves for patients with high (>10%, blue) and low (<10%, grey) predicted benefit of early allograft (cross-validated), each with and without allograft in CR1. Patients with favorable ELN risk were excluded. d. Predicted overall survival at 3 years as a function of total number of allografts (in CR1 + after relapse). Patients are first ranked from those most likely to benefit from transplant to those least likely to benefit, as judged by current guidelines (solid blue line) or our current knowledge bank (solid red line). The curves show expected survival if allografts in CR1 increased from 0% to 100%, starting with the patient with the greatest predicted benefit and ending with the patient with the lowest. The x-axis starts at ~0.25, since about half of patients will relapse without an allograft in CR1, and about half of those go on to undergo post-relapse transplantation.
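The ranking procedure in panel d, and the reason the x-axis starts near 0.25, can be sketched as below. This is a simplified illustration rather than the authors' calculation: the function `allocation_curve`, the per-patient survival inputs and the use of a single population-wide relapse probability and post-relapse transplant probability (both 0.5, from the rounded figures in the caption) are assumptions.

```python
def allocation_curve(surv_chemo, surv_allo,
                     p_relapse=0.5, p_post_relapse_allo=0.5):
    """Expected survival as allografts in CR1 go from 0% to 100% of patients.

    surv_chemo -- predicted 3-year survival per patient with chemotherapy in
                  CR1 and an intended allograft after relapse (hypothetical)
    surv_allo  -- predicted 3-year survival per patient with an allograft
                  in CR1 (hypothetical input)
    Returns a list of (total allograft fraction, mean predicted survival).
    """
    n = len(surv_chemo)
    # Rank patients from greatest to lowest predicted benefit of early allograft.
    order = sorted(range(n),
                   key=lambda i: surv_allo[i] - surv_chemo[i],
                   reverse=True)
    total_survival = sum(surv_chemo)  # nobody transplanted in CR1 yet
    curve = []
    for k in range(n + 1):
        if k > 0:
            i = order[k - 1]
            # Switch the k-th ranked patient to an allograft in CR1.
            total_survival += surv_allo[i] - surv_chemo[i]
        frac_cr1 = k / n
        # Patients not transplanted in CR1 may still receive an allograft
        # after relapse (population-average approximation).
        frac_total = frac_cr1 + (1 - frac_cr1) * p_relapse * p_post_relapse_allo
        curve.append((frac_total, total_survival / n))
    return curve


# Hypothetical predictions for four patients.
toy_curve = allocation_curve([0.5, 0.6, 0.7, 0.8], [0.7, 0.65, 0.7, 0.75])
```

With no allografts in CR1 the total allograft fraction is 0.5 × 0.5 = 0.25, which is the left end of the curve. In the toy data the last-ranked patient has a negative predicted benefit, so mean survival peaks before everyone is transplanted.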
Figure 6. Extrapolations and power calculations
a. Subsampling the number of patients reveals a steady but saturating increase in prognostic concordance C for a random effects model for overall survival. Error bars show the 95% confidence intervals for the concordance obtained from multiple independent subsamples of the dataset. b. Graph relating the effect size (hazard ratio) of a prognostic variable to the absolute number of patients with the given factor required to reach significance in a random effects model for overall survival (solid line: P < 0.05; dotted line: P < 0.001). c. Average prediction error between simulated and estimated survival for a random effects model for overall survival as a function of survival time (x-axis) and training cohort size (y-axis).
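For intuition about the relationship in panel b, the sketch below uses Schoenfeld's classical approximation for an ordinary Cox model with one binary covariate. This is a swapped-in textbook formula, not the random effects calculation behind the figure, and it counts required deaths (events) rather than patients. The function name `schoenfeld_events` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hazard_ratio, carrier_fraction, alpha=0.05, power=0.8):
    """Approximate number of events needed to detect a hazard ratio.

    hazard_ratio     -- effect size of the binary prognostic variable
    carrier_fraction -- fraction of patients carrying the variable
    alpha            -- two-sided significance level
    power            -- desired probability of detecting the effect
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    log_hr = math.log(hazard_ratio)
    events = (z_alpha + z_power) ** 2 / (
        carrier_fraction * (1 - carrier_fraction) * log_hr ** 2
    )
    return math.ceil(events)


# A hazard ratio of 2 with half the cohort carrying the variable.
events_hr2 = schoenfeld_events(2.0, 0.5)
```

Under this approximation the required number of events grows as the hazard ratio approaches 1, as the variable becomes rarer and as the significance threshold becomes stricter, which matches the qualitative shape described in the caption.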
