Clonal reconstruction from time course genomic sequencing data

BMC Genomics. 2019 Dec 30;20(Suppl 12):1002. doi: 10.1186/s12864-019-6328-3.

Abstract

Background: Over many replication cycles, bacterial cells accumulate spontaneous mutations, which give rise to novel clones. As a result of this clonal expansion, the clonal composition of an evolving bacterial population changes over time, as revealed in long-term evolution experiments (LTEEs). Accurately inferring the haplotypes of novel clones, together with the clonal frequencies and the clonal evolutionary history of a bacterial population, is useful for characterizing the evolutionary pressure on multiple correlated mutations rather than on individual mutations.

Results: In this paper, we study the computational problem of reconstructing the haplotypes of bacterial clones from the variant allele frequencies observed in an evolving bacterial population at multiple time points. We formalize the problem using a maximum likelihood function, which is defined under the assumption that mutations occur spontaneously, so that the probability of a mutation occurring in a specific clone is proportional to the frequency of that clone in the population at the time the mutation occurs. We develop a series of heuristic algorithms to address the maximum likelihood inference, and show through simulation experiments that the algorithms are fast and achieve near-optimal accuracy within what is practically attainable under the maximum likelihood framework. We also validate our method using experimental data obtained from a recent study on the long-term evolution of Escherichia coli.
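
The likelihood assumption described above can be illustrated with a small sketch. This is not the ClonalTREE implementation. It is a minimal example under assumptions that the abstract does not state: each clone is founded by a single mutation, a clone's frequency equals the variant allele frequency (VAF) of its founding mutation minus the VAFs of its direct child clones, and the parent's frequency is read at the time point just before the child clone is first observed. The names clone_frequencies, log_likelihood, and birth_time are hypothetical.

```python
import math


def clone_frequencies(vaf, parent):
    """Convert mutation VAFs into clone frequencies for a candidate tree.

    Assumption (not from the abstract): a clone's frequency is the VAF of
    its founding mutation minus the VAFs of its direct child clones.
    vaf[t][c] is the VAF of clone c's founding mutation at time point t;
    parent[c] is the index of c's parent clone, or None for the ancestor.
    """
    freq = [list(row) for row in vaf]
    for child, p in enumerate(parent):
        if p is None:
            continue
        for t in range(len(vaf)):
            freq[t][p] -= vaf[t][child]
    return freq


def log_likelihood(vaf, parent, birth_time, tol=1e-9):
    """Log-likelihood of a candidate clonal tree.

    Each non-root clone contributes log(share), where share is its parent's
    fraction of the population at the time point before the clone is first
    observed (birth_time[c] - 1, so non-root clones need birth_time >= 1).
    Trees that imply a negative clone frequency are treated as infeasible.
    """
    freq = clone_frequencies(vaf, parent)
    if any(f < -tol for row in freq for f in row):
        return -math.inf
    total = 0.0
    for child, p in enumerate(parent):
        if p is None:
            continue
        t = birth_time[child] - 1
        population = sum(freq[t])
        share = freq[t][p] / population if population > 0 else 0.0
        if share <= 0:
            return -math.inf
        total += math.log(share)
    return total


# Toy data (invented for illustration): rows are time points, columns are
# clones. Clone 0 is the ancestor; clones 1 and 2 first appear at t=1 and t=2.
vaf = [
    [1.0, 0.0, 0.0],
    [1.0, 0.4, 0.0],
    [1.0, 0.6, 0.3],
]
birth_time = [0, 1, 2]

branching = [None, 0, 0]  # both clones descend directly from the ancestor
chain = [None, 0, 1]      # clone 2 descends from clone 1

ll_branching = log_likelihood(vaf, branching, birth_time)
ll_chain = log_likelihood(vaf, chain, birth_time)
print("branching:", ll_branching)
print("chain:", ll_chain)
```

In this toy case the ancestor holds 60% of the population just before clone 2 appears, while clone 1 holds 40%, so the branching tree scores higher than the chain. A search over candidate parent assignments that keeps the highest-scoring feasible tree would be one simple way to use such a score.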

Conclusion: We developed efficient algorithms to reconstruct the clonal evolution history from time course genomic sequencing data. The algorithms can also incorporate clonal sequencing data, when available, to improve the reconstruction results. In evaluations on both simulated and experimental sequencing data, the algorithms achieve satisfactory results on genome sequencing data from long-term evolution experiments.

Availability: The program (ClonalTREE) is available as open-source software on GitHub at https://github.com/COL-IU/ClonalTREE.

Keywords: Clonal reconstruction; Long-term evolution experiment; Maximum likelihood; Time course.

MeSH terms

  • Algorithms*
  • Bacteria / genetics*
  • Base Sequence
  • Clonal Evolution / genetics*
  • Gene Frequency
  • Genome, Bacterial / genetics
  • Genomics / methods*
  • Haplotypes
  • High-Throughput Nucleotide Sequencing
  • Likelihood Functions
  • Mutation
  • Software