HetFHMM: A Novel Approach to Infer Tumor Heterogeneity Using Factorial Hidden Markov Models

J Comput Biol. 2018 Feb;25(2):182-193. doi: 10.1089/cmb.2017.0101. Epub 2017 Oct 16.

Abstract

Cancer arises from successive rounds of mutation, producing subpopulations of tumor cells, known as clones, that carry different somatic mutations. Drug response and the choice of therapy depend on accurately detecting the clones in a tumor sample. Recent research infers the clonal composition of a tumor sample with computational models applied to short-read data generated by next-generation sequencing (NGS). Short reads (DNA fragments sequenced from different tumor cells) are noisy, so inferring the clones and their mutations from these data is a difficult problem. We develop a new model called HetFHMM, based on factorial hidden Markov models, to infer clones and their proportions from noisy NGS data. In our model, each hidden chain represents the genomic signature of a clone, and a mixture of the chains gives rise to the observed data. We use Gibbs sampling to infer the hidden variables and an exponentiated gradient algorithm to infer the mixing proportions. We compare our model with two strong models from previous work, PyClone and PhyloSub, on both synthetic data and real acute myeloid leukemia data. Empirical results confirm that HetFHMM infers the clonal composition of a tumor sample more accurately than these earlier methods.
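The idea of a mixture of hidden chains with mixing proportions learned by exponentiated gradient can be illustrated with a toy sketch. This is not the authors' implementation: the observation model (a weighted sum of integer chain states plus Gaussian noise), the squared-error loss, the sticky transition rule, and all function names and parameter values are assumptions made for illustration. The hidden chains are also treated as known here, whereas HetFHMM infers them by Gibbs sampling.

```python
import math
import random


def simulate_chain(length, n_states, stay_prob, rng):
    """Sample one hidden chain; the state persists with probability stay_prob."""
    state = rng.randrange(n_states)
    chain = [state]
    for _ in range(length - 1):
        if rng.random() > stay_prob:
            state = rng.randrange(n_states)
        chain.append(state)
    return chain


def mix(chains, weights):
    """Noise-free signal: weighted sum of the chain states at each position."""
    length = len(chains[0])
    return [sum(w * c[t] for w, c in zip(weights, chains)) for t in range(length)]


def exponentiated_gradient(chains, observed, steps=2000, lr=0.05):
    """Estimate mixing weights on the probability simplex.

    Assumed loss: mean squared error between the mixed chains and the data.
    Each step multiplies every weight by exp(-lr * gradient) and renormalises,
    so the weights stay positive and sum to one.
    """
    k, n = len(chains), len(observed)
    weights = [1.0 / k] * k
    for _ in range(steps):
        pred = mix(chains, weights)
        grad = [
            2.0 / n * sum((pred[t] - observed[t]) * chains[j][t] for t in range(n))
            for j in range(k)
        ]
        weights = [w * math.exp(-lr * g) for w, g in zip(weights, grad)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights


# Toy data: three hypothetical clones with integer states 0..4 along 300 positions.
rng = random.Random(0)
true_weights = [0.6, 0.3, 0.1]
chains = [simulate_chain(300, 5, 0.8, rng) for _ in true_weights]
observed = [x + rng.gauss(0.0, 0.05) for x in mix(chains, true_weights)]

estimated = exponentiated_gradient(chains, observed)
print("true     :", true_weights)
print("estimated:", [round(w, 3) for w in estimated])
```

The multiplicative update is what makes exponentiated gradient a natural fit for proportions: no projection step is needed to keep the estimate a valid mixture. In a full factorial HMM the chain states would be resampled between weight updates rather than held fixed.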

Keywords: AML; clone; heterogeneity; tumor.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Clonal Evolution
  • Computational Biology / methods*
  • Computational Biology / standards
  • Genetic Heterogeneity*
  • Humans
  • Leukemia, Myeloid, Acute / genetics*
  • Markov Chains
  • Mutation Accumulation
  • Sequence Analysis, DNA / methods*
  • Sequence Analysis, DNA / standards