Machine learning integrates genomic signatures for subclassification beyond primary and secondary acute myeloid leukemia

Blood. 2021 Nov 11;138(19):1885-1895. doi: 10.1182/blood.2020010603.

Abstract

Although genomic alterations drive the pathogenesis of acute myeloid leukemia (AML), traditional classifications are largely based on morphology, and prototypic genetic founder lesions define only a small proportion of AML patients. The historical subdivision into primary/de novo AML and secondary AML has been shown to correlate variably with genetic patterns. The combinatorial complexity and heterogeneity of AML genomic architecture may have thus far precluded genomic-based subclassification to identify distinct molecularly defined subtypes more reflective of shared pathogenesis. We integrated cytogenetic and gene sequencing data from a multicenter cohort of 6788 AML patients and analyzed them using standard and machine learning methods to generate a novel AML molecular subclassification with biologic correlates corresponding to underlying pathogenesis. Standard supervised analyses resulted in modest cross-validation accuracy when attempting to use molecular patterns to predict traditional pathomorphologic AML classifications. We performed unsupervised analysis by applying the Bayesian latent class method, which identified 4 unique genomic clusters with distinct prognoses. Invariant genomic features driving each cluster were extracted and resulted in 97% cross-validation accuracy when used for genomic subclassification. Subclasses of AML defined by molecular signatures overlapped current pathomorphologic and clinically defined AML subtypes. We internally and externally validated our results and share an open-access molecular classification scheme for AML patients. Although the heterogeneity inherent in the genomic changes across nearly 7000 AML patients was too vast for traditional prediction methods, machine learning methods allowed for the definition of novel genomic AML subclasses, indicating that traditional pathomorphologic definitions may be less reflective of overlapping pathogenesis.
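
The unsupervised step described in the abstract groups patients by shared patterns of binary genomic features (a lesion is present or absent). As a rough illustration only, the sketch below fits a latent class model to a small synthetic binary matrix. It is not the authors' pipeline: the abstract names a Bayesian latent class method, whereas this sketch substitutes a plain maximum-likelihood Bernoulli mixture fitted by expectation-maximization. The function names `simulate` and `fit_latent_class`, the six synthetic features, and all parameter values are hypothetical.

```python
import numpy as np


def simulate(n_per_class=200, seed=1):
    """Draw synthetic binary 'lesion present/absent' data from two
    hypothetical classes. These are NOT real AML data."""
    rng = np.random.default_rng(seed)
    # Per-class probability that each of six made-up features is present.
    profiles = np.array([
        [0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
        [0.1, 0.1, 0.1, 0.9, 0.9, 0.9],
    ])
    truth = np.repeat([0, 1], n_per_class)
    X = (rng.random((truth.size, profiles.shape[1])) < profiles[truth]).astype(float)
    return X, truth


def fit_latent_class(X, n_classes, n_iter=200, seed=0, eps=1e-6):
    """Fit a Bernoulli mixture (latent class model) by EM.

    Assumption: maximum-likelihood EM stands in for the Bayesian
    estimation named in the abstract.
    Returns class weights, per-class feature probabilities, and the
    most probable class for each row of X.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    weights = np.full(n_classes, 1.0 / n_classes)
    probs = rng.uniform(0.25, 0.75, size=(n_classes, d))  # random start breaks symmetry

    for _ in range(n_iter):
        # E-step: posterior class membership for each patient, in log space.
        log_post = (
            X @ np.log(probs).T
            + (1.0 - X) @ np.log(1.0 - probs).T
            + np.log(weights)
        )
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate class weights and feature probabilities.
        nk = resp.sum(axis=0)
        weights = np.clip(nk / n, eps, None)
        weights /= weights.sum()
        probs = (resp.T @ X + eps) / (nk[:, None] + 2.0 * eps)
        probs = np.clip(probs, eps, 1.0 - eps)  # keep logs finite

    return weights, probs, resp.argmax(axis=1)


if __name__ == "__main__":
    X, truth = simulate()
    weights, probs, labels = fit_latent_class(X, n_classes=2)
    agreement = (labels == truth).mean()
    # Cluster labels are arbitrary, so score up to a label swap.
    print("agreement with simulated classes:", max(agreement, 1.0 - agreement))
    print("estimated feature probabilities per class:")
    print(np.round(probs, 2))
```

In a workflow like the one the abstract outlines, the fitted per-class feature probabilities would indicate which features characterize each cluster, and a separate supervised classifier could then be cross-validated on those features. The number of classes is fixed by hand here; how the authors selected 4 clusters is not stated in the abstract.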

Publication types

  • Multicenter Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Cytogenetics
  • Gene Expression Regulation, Leukemic
  • Genomics
  • Humans
  • Leukemia, Myeloid, Acute / classification
  • Leukemia, Myeloid, Acute / diagnosis
  • Leukemia, Myeloid, Acute / genetics*
  • Machine Learning*
  • Mutation
  • Neoplasms, Second Primary / classification
  • Neoplasms, Second Primary / diagnosis
  • Neoplasms, Second Primary / genetics
  • Translocation, Genetic