J Stat Phys. 2018 Jul;172(1):143-155.
doi: 10.1007/s10955-017-1945-1. Epub 2017 Dec 21.

On Statistical Modeling of Sequencing Noise in High Depth Data to Assess Tumor Evolution

Free PMC article


Raul Rabadan et al. J Stat Phys.

Abstract

One cause of cancer mortality is tumor evolution to therapy-resistant disease. First-line therapy often targets the dominant clone, and drug resistance can emerge from preexisting clones that gain fitness through therapy-induced natural selection. Mutations in such clones may be identified using targeted sequencing assays by analysis of noise in high-depth data. Here, we develop a comprehensive, unbiased model for sequencing error background. We find that noise in sufficiently deep DNA sequencing data can be approximated by aggregating negative binomial distributions. Mutations with frequencies above noise may have prognostic value. We evaluate our model with simulated exponentially expanded populations as well as data from cell line and patient sample dilution experiments, demonstrating its utility in prognosticating tumor progression. Our results may help identify significant mutations that can cause recurrence. These results are relevant in the pre-treatment clinical setting to determine appropriate therapy and prepare for potential recurrence.
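
As a rough illustration of the idea that a mutation is called when its read count rises above the error background, the sketch below computes an upper-tail probability under a single negative binomial distribution. This is a simplification: the abstract describes an aggregate of negative binomials, and the parameters `r` and `p`, the function names, and the example counts here are hypothetical placeholders rather than values fitted in the paper.

```python
import math


def nb_log_pmf(k, r, p):
    """Log P(K = k) for a negative binomial with size r > 0 and probability p.

    Parameterization (an assumption of this sketch):
    P(K = k) = Gamma(k + r) / (k! * Gamma(r)) * p**r * (1 - p)**k
    """
    return (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log1p(-p)
    )


def nb_upper_tail(k_obs, r, p, tol=1e-15):
    """P(K >= k_obs), summed directly so small tail values keep precision."""
    mean = r * (1.0 - p) / p
    total = 0.0
    k = k_obs
    while True:
        term = math.exp(nb_log_pmf(k, r, p))
        total += term
        # Past the mean the terms only shrink, so stop once they are negligible.
        if k > mean and term <= tol * total:
            return min(total, 1.0)
        k += 1


def exceeds_noise(k_obs, r, p, alpha=0.01):
    """True if k_obs error-supporting reads are unlikely under the noise model."""
    return nb_upper_tail(k_obs, r, p) < alpha


# Hypothetical noise parameters with mean error depth r * (1 - p) / p = 8 reads.
r_noise, p_noise = 2.0, 0.2
print(exceeds_noise(10, r_noise, p_noise))   # near the noise mean
print(exceeds_noise(120, r_noise, p_noise))  # far above the noise mean
```

Summing the tail term by term, instead of subtracting a cumulative sum from 1, is a design choice of this sketch: it avoids losing very small p-values to floating-point cancellation.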

Keywords: 02.50.-r; 87.18.Tt; 87.23.Kg.

Figures

Fig. 1
Number of variants with error depth of v from aggregated simulated cycles of PCR amplification at four error rates, for 12 cycles (left), 14 cycles (middle), and 18 cycles (right). Ptheo. and NBtheo. are calculated using equation (3), and Pemp. and NBemp. are calculated using equation (4). The χ² test was used to compare the distributions.
Fig. 2
Error depth distribution in ultra-deep sequencing of a TP53 locus at 10,000× for all variants (left), transitions (middle), and transversions (right).
Fig. 3
Error depth distribution in ultra-deep sequencing of a TP53 locus at 100,000× for all variants (left), transitions (middle), and transversions (right).
Fig. 4
Error depth distribution in ultra-deep sequencing of a TP53 locus at 1,000,000× for all variants (left), transitions (middle), and transversions (right).
Fig. 5
Error depth distribution in ultra-deep sequencing of an SF3B1 locus at mean 620,000× for all variants (left), transitions (middle), and transversions (right).
Fig. 6
Sensitivity of detecting TP53-Y234C mutation dilutions. Assessing the presence of a variant requires correcting for multiple hypotheses based on the number of sequenced genomic positions (Bonferroni correction). Testing the presence of a discovered variant does not require such a correction; here, significance is set at 0.01.
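
The distinction drawn in the Fig. 6 caption, between discovering a variant anywhere in the sequenced region and testing a variant that is already known, can be sketched as a choice of significance threshold. The function name, its parameters, and the example numbers are hypothetical; only the 0.01 significance level and the Bonferroni correction over sequenced positions come from the caption.

```python
def is_significant(p_value, alpha=0.01, n_positions=None):
    """Decide significance for one variant.

    n_positions=None : testing a previously discovered variant, no correction.
    n_positions=N    : de novo discovery across N sequenced positions,
                       Bonferroni-corrected threshold alpha / N.
    """
    if n_positions is None:
        threshold = alpha
    else:
        if n_positions < 1:
            raise ValueError("n_positions must be at least 1")
        threshold = alpha / n_positions
    return p_value < threshold


# Hypothetical p-value for a low-frequency variant in a 500-position amplicon.
p_val = 1e-4
print(is_significant(p_val))                   # known variant: threshold 0.01
print(is_significant(p_val, n_positions=500))  # discovery: threshold 2e-5
```

Under these assumed numbers, the same p-value would confirm a known mutation yet fall short of a de novo call, which is the asymmetry in sensitivity the caption points to.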
