Comparing algorithms that reconstruct cell lineage trees utilizing information on microsatellite mutations

PLoS Comput Biol. 2013;9(11):e1003297. doi: 10.1371/journal.pcbi.1003297. Epub 2013 Nov 14.

Abstract

The cells of an organism proliferate and die to build, maintain, renew and repair it. The cellular history of an organism up to any point in time can be captured by a cell lineage tree in which vertices represent all of the organism's cells, past and present, and directed edges represent progeny relations among them. The root represents the fertilized egg, and the leaves represent extant and dead cells. Somatic mutations accumulated during cell division endow each cell with a genomic signature that is unique with very high probability. Distances between such genomic signatures can be used to reconstruct an organism's cell lineage tree. Cell populations possess features that are absent or rare in organism populations (e.g., the presence of stem cells and a small number of generations since the zygote) and do not undergo sexual reproduction; hence the reconstruction of cell lineage trees calls for careful examination and adaptation of the standard tools of population genetics. Our lab developed a method for reconstructing cell lineage trees by examining only mutations in highly variable microsatellite loci (MS, also called short tandem repeats, STR). In this study we use experimental data on somatic mutations in MS of individual cells in humans and mice to validate and quantify the utility of known lineage tree reconstruction algorithms in this context. We employed extensive measurements of somatic mutations in individual cells isolated from healthy and diseased tissues of mice and humans. The validation was done by analyzing the ability of each algorithm to infer known and clear biological scenarios. In general, we found that if the biological scenario is simple, almost all algorithms tested can infer it. Another, somewhat surprising, conclusion is that the best algorithm among those tested is Neighbor Joining with normalized absolute distance as the distance measure.
We include our full dataset in Tables S1, S2, S3, S4, S5 to enable further analysis of this data by others.
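
As a rough illustration of the distance-based approach named above, the following Python sketch computes a pairwise distance between two cells from their microsatellite repeat counts and then selects the first pair that Neighbor Joining would merge. It is not the authors' implementation. The abstract does not define "normalized absolute distance"; the sketch assumes it means the mean absolute difference in repeat number over the loci measured in both cells. The genotype representation, the locus names, and the function names are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a cell maps each locus name to its repeat
# count, or to None when the locus could not be measured in that cell.
Genotype = Dict[str, Optional[int]]


def normalized_absolute_distance(a: Genotype, b: Genotype) -> float:
    """Mean absolute repeat-count difference over loci measured in both cells.

    Assumed definition; the abstract does not spell out the formula.
    """
    shared = [k for k in a if a[k] is not None and b.get(k) is not None]
    if not shared:
        raise ValueError("cells share no measured loci")
    return sum(abs(a[k] - b[k]) for k in shared) / len(shared)


def distance_matrix(cells: List[Genotype]) -> List[List[float]]:
    """Symmetric matrix of pairwise distances between cells."""
    n = len(cells)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = normalized_absolute_distance(cells[i], cells[j])
    return d


def first_nj_pair(d: List[List[float]]) -> Tuple[int, int]:
    """Pair minimizing the standard Neighbor Joining criterion
    Q(i, j) = (n - 2) * d[i][j] - sum(d[i]) - sum(d[j]).
    Ties are broken in favor of the first pair encountered.
    """
    n = len(d)
    totals = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    return best[1], best[2]


# Toy genotypes (made up for illustration): cells 0 and 1 are near-identical,
# as are cells 2 and 3.
cells = [
    {"locus_a": 10, "locus_b": 10},
    {"locus_a": 10, "locus_b": 11},
    {"locus_a": 14, "locus_b": 15},
    {"locus_a": 14, "locus_b": 16},
]
d = distance_matrix(cells)
pair = first_nj_pair(d)
```

A full reconstruction would repeat the pair-selection step, replacing each merged pair with an internal node and updating the distances until the tree is resolved. The sketch stops after the first merge to keep the criterion itself visible.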

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Bone Marrow Cells
  • Cell Lineage / genetics*
  • Cells, Cultured
  • Cluster Analysis
  • Computational Biology / methods
  • Computer Simulation
  • Female
  • Humans
  • Male
  • Mice
  • Mice, Transgenic
  • Microsatellite Repeats / genetics*
  • Models, Genetic
  • Mutation / genetics*
  • Phylogeny*

Grants and funding

This work was supported by The European Union FP7-ERC-AdG and by a research grant from Ms. Sally Appelbaum. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.