On statistical modeling of sequencing noise in high depth data to assess tumor evolution

J Stat Phys. 2018 Jul;172(1):143-155. doi: 10.1007/s10955-017-1945-1. Epub 2017 Dec 21.


One cause of cancer mortality is tumor evolution to therapy-resistant disease. First-line therapy often targets the dominant clone, and drug resistance can emerge from preexisting clones that gain fitness through therapy-induced natural selection. Mutations in such clones may be identified with targeted sequencing assays by analyzing the noise in high-depth data. Here, we develop a comprehensive, unbiased model of the sequencing error background. We find that noise in sufficiently deep DNA sequencing data can be approximated by aggregating negative binomial distributions. Mutations with frequencies above this noise may have prognostic value. We evaluate our model with simulated exponentially expanded populations as well as data from cell line and patient sample dilution experiments, demonstrating its utility in prognosticating tumor progression. Our approach may help identify significant mutations that can cause recurrence. These results are relevant in the pretreatment clinical setting, both for determining appropriate therapy and for preparing for potential recurrence.
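A minimal sketch of the kind of noise model described above, assuming per-site background error counts (alternate-allele reads) from normal samples sequenced at comparable depth. Each negative binomial component is fitted by the method of moments, "aggregation" is represented here as a weighted mixture of components, and a variant is flagged as above noise when its upper-tail probability falls below a cutoff. The function names, the mixture interpretation, the moment-based fit, and the `alpha = 0.01` cutoff are hypothetical choices for illustration, not the authors' published method.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: sequencing-error counts at one site are modeled as a
# weighted mixture ("aggregate") of negative binomial (NB) components.
Component = Tuple[float, float, float]  # (weight, r, p)


def nb_logpmf(k: int, r: float, p: float) -> float:
    """Log PMF of NB(r, p): k failures before r successes, success prob p.

    Mean = r(1-p)/p and variance = r(1-p)/p**2, so variance > mean.
    """
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def fit_nb_moments(counts: Sequence[int]) -> Tuple[float, float]:
    """Method-of-moments NB fit to background error counts (an assumption)."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two background counts")
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    if mean <= 0 or var <= mean:
        raise ValueError("counts are not overdispersed; NB fit undefined")
    p = mean / var          # from var = mean / p
    r = mean * p / (1 - p)  # from mean = r(1-p)/p
    return r, p


def mixture_sf(k: int, components: List[Component]) -> float:
    """P(X >= k) under a weighted mixture of NB components."""
    if abs(sum(w for w, _, _ in components) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    total = 0.0
    for w, r, p in components:
        cdf_below = sum(math.exp(nb_logpmf(j, r, p)) for j in range(k))
        total += w * (1.0 - cdf_below)
    return max(0.0, total)  # clamp tiny negative rounding error


def is_above_noise(alt_count: int, components: List[Component],
                   alpha: float = 0.01) -> bool:
    """Flag a variant whose count is unlikely under the noise model."""
    return mixture_sf(alt_count, components) < alpha


if __name__ == "__main__":
    # Hypothetical alt-read counts at one site in normal samples of equal depth.
    background = [0, 1, 0, 2, 5, 0, 1, 3, 0, 1]
    r, p = fit_nb_moments(background)
    components = [(1.0, r, p)]  # a single component; add more to aggregate
    for observed in (2, 8, 20):
        print(observed, mixture_sf(observed, components),
              is_above_noise(observed, components))
```

Computing the tail as one minus the CDF keeps the sketch dependency-free but loses precision for very small probabilities; a more careful implementation would sum the tail directly or use a library survival function such as `scipy.stats.nbinom.sf`. Real data would also need the counts normalized for differing sequencing depth, which this sketch ignores.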

Keywords: 02.50.-r; 87.18.Tt; 87.23.Kg.