Genome Biol. 2015 Feb 13;16(1):36.
doi: 10.1186/s13059-015-0592-6.

BitPhylogeny: a probabilistic framework for reconstructing intra-tumor phylogenies

Ke Yuan et al. Genome Biol. 2015.

Abstract

Cancer has long been understood as a somatic evolutionary process, but many details of tumor progression remain elusive. Here, we present BitPhylogeny, a probabilistic framework to reconstruct intra-tumor evolutionary pathways. Using a full Bayesian approach, we jointly estimate the number and composition of clones in the sample as well as the most likely tree connecting them. We validate our approach in the controlled setting of a simulation study and compare it against several competing methods. In two case studies, we demonstrate how BitPhylogeny reconstructs tumor phylogenies from methylation patterns in colon cancer and from single-cell exomes in myeloproliferative neoplasm.

Figures

Figure 1
The intra-tumor phylogeny problem. (A) Molecular profiles obtained from a bulk sequenced heterogeneous tumor are shown. They consist, in this example, of three clones (red squares, blue triangles, and green discs) and normal cells (small grey discs). The intra-tumor phylogeny problem is to infer the population structure of the tumor, i.e., to identify the different clones and to elucidate how they relate to each other. (B) Classical phylogenetic trees and hierarchical clustering methods place the observed molecular profiles at the leaf nodes of a tree, while the inner nodes represent unobserved common ancestors. Here, leaf nodes are defined as the nodes without any child nodes and inner nodes as the nodes that have at least one child node. (C) Unlike classical phylogenetic tree models, BitPhylogeny clusters molecular profiles to identify subclones and places them as both inner (blue triangle) and leaf nodes (red square, green disc) of a tree describing the hierarchy of the tumor cell population.
Figure 2
BitPhylogeny as a graphical model. Each of a total of $N$ observed marker patterns is denoted by $x_n$ (shaded node). The clone membership of each observation is denoted by $\varepsilon_n$ and generated by a tree-structured stick-breaking process with variables $\nu_\varepsilon$ (clone size) and $\varphi_\varepsilon$ (branching probability), and parameters $\lambda$, $\alpha_0$ and $\gamma$. For each clone, $t_\varepsilon$ and $\theta_\varepsilon$ are the branch length and clone parameter, respectively, which determine the local probability distribution of observing a marker pattern from this clone. The function $\mathrm{pa}(\cdot)$ denotes the parent of each clone in the tree, except for the root clone. The transition probabilities $p(\theta_\varepsilon \mid \theta_{\mathrm{pa}(\varepsilon)})$ have hyperparameters $\beta_m$, $\beta_u$, $\Lambda$ and $\mu$.
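To make the tree-structured stick-breaking process named in the Figure 2 caption more concrete, the following is a minimal sketch of how one clone membership could be drawn from such a prior. It is not the BitPhylogeny implementation. It assumes the standard construction in which the stopping probability at a node of depth j is Beta(1, λ^j α0) and the child-selection sticks are Beta(1, γ); the names `Node`, `sample_path`, and the `max_depth` guard are hypothetical.

```python
import random
from collections import Counter


class Node:
    """One clone in a lazily grown tree (hypothetical helper class)."""

    def __init__(self, depth, alpha0, lam, rng):
        self.depth = depth
        # nu: probability that an observation reaching this node stops here.
        # Assumed depth-dependent prior: Beta(1, lam**depth * alpha0).
        self.nu = rng.betavariate(1.0, alpha0 * lam ** depth)
        self.phi = []       # stick-breaking proportions over the children
        self.children = []  # child nodes, created on demand


def sample_path(root, alpha0, lam, gamma, rng, max_depth=20):
    """Draw one clone membership as a tuple of child indices from the root."""
    node, path = root, ()
    while node.depth < max_depth:  # guard against very deep descents
        if rng.random() < node.nu:
            return path            # the observation stays at this clone
        i = 0
        while True:
            if i == len(node.phi):  # grow a new child stick lazily
                node.phi.append(rng.betavariate(1.0, gamma))
                node.children.append(Node(node.depth + 1, alpha0, lam, rng))
            if rng.random() < node.phi[i]:
                break               # child i is selected
            i += 1
        node = node.children[i]
        path += (i,)
    return path


rng = random.Random(0)
root = Node(0, alpha0=1.0, lam=0.5, rng=rng)
paths = [sample_path(root, 1.0, 0.5, 1.0, rng) for _ in range(200)]
counts = Counter(paths)  # empirical clone sizes; () is the root clone
```

Each path identifies a clone by its route from the root, so both inner nodes and leaves can hold observations, which matches the placement of clones described in the Figure 1 caption.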
Figure 3
Simulation study with five trees (A-E). (First column) Sankey plots of the trees used for simulations. For each node, the width of the in-edge is proportional to the clone frequency. The colors denote different layers of the tree (tree depths). Plots were produced with the R package riverplot. (Second column) Performance of clustering methods for the simulation studies with four different noise levels. Performance measures are based on 10,000 MCMC samples (the box plots in the second column). The MPEAR-summarized predictions (marked as BitPhylogeny) outperform the baseline competitors in all data sets with noise. (Third column) Comparison in terms of the summary statistics maximum tree depth and number of clones. For hierarchical clustering and k-centroids, the trees are constructed as minimum spanning trees from estimated clonal methyltypes.
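The baseline tree construction mentioned in the Figure 3 caption, a minimum spanning tree over estimated clonal methyltypes, can be sketched as follows. This is an illustration only: it assumes binary methylation profiles and Hamming distance as the edge weight, neither of which is specified in the caption, and it uses Prim's algorithm with hypothetical function names and toy data.

```python
def hamming(u, v):
    """Number of positions at which two equal-length profiles differ."""
    return sum(a != b for a, b in zip(u, v))


def minimum_spanning_tree(profiles):
    """Prim's algorithm over a dict mapping clone name to binary profile.

    Returns a list of (parent, child, weight) edges, grown from the first
    clone in the dict. Assumes at least one clone is given.
    """
    names = list(profiles)
    in_tree = [names[0]]
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge connecting the current tree to a clone outside it.
        weight, u, v = min(
            (hamming(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, weight))
        in_tree.append(v)
    return edges


# Toy methyltypes (hypothetical data, 1 = methylated, 0 = unmethylated).
clones = {
    "a": (0, 0, 0, 0),
    "b": (1, 0, 0, 0),
    "c": (1, 1, 0, 0),
    "d": (0, 0, 1, 1),
}
tree = minimum_spanning_tree(clones)
# tree == [("a", "b", 1), ("b", "c", 1), ("a", "d", 2)]
```

Summary statistics such as the number of clones and the maximum tree depth, as compared in the third column of Figure 3, can then be read off the resulting edge list.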
Figure 4
Consensus node-based shortest path distances for all simulated trees. Each box plot summarizes the distance measures across the four noise levels (0%, 1%, 2% and 5%). The suffixes L, M and H for the polyclonal tree type refer to the polyclonal-low, -medium and -high trees in Figure 3. BitPhylogeny consistently outperforms the two baseline methods.
Figure 5
Analysis of CT samples. (A) Level-wise mass distribution of CT samples. For each tree, the bars show the level sums of the mixture model masses for all eight samples of the CT tumor. The red bars correspond to the posterior means of the root masses. The blue, green and pink bars correspond to the means of the sums of the second, third and fourth tree levels, respectively. (B) Maximum depth of trees of the individual samples. Turquoise densities are from the right side of the tumor and the pink ones are from the left side. Trees from the left side of the tumor have peaked posterior densities at a depth of either 3 or 4, while the posterior densities from the right side are less peaked. (C) Total branch length of trees of individual samples. The trees from the left side, which peak at depth 3 in (B), have shorter total branch lengths than the tree that peaks at depth 4 or the trees from the right side of the tumor.
Figure 6
Analysis of CX samples and joint analysis of all samples. Turquoise densities are for samples from the right side of the tumor and the pink ones are for the left side. (A) Maximum depth of trees. Trees from the right side have posterior maximum depth between 2 and 3, while trees from the left side have posterior maximum depth between 3 and 4. (B) Total branch length of trees. Trees from the right side have slightly shorter total branch lengths than the trees from the left side. (C) Number of clones in a tree. Trees from the right side contain fewer clones than trees from the left side. (D) Mean number of clones versus mean maximum depth of trees. With these two summary statistics of trees, samples from the left and right can be separated very clearly.
Figure 7
Reconstructed tree and mutation profiles from single-cell exome sequencing data. (A) Reconstructed phylogeny. Non-empty clones are labeled a through i followed by the number of cells they contain. The vertical distance represents the evolutionary distance between clones. (B) Estimated probabilities of six SNVs in key genes across all cells. The error bars summarize 50,000 MCMC samples and are color-coded according to clone membership.
