Motivation: Tumor sequencing has entered an exciting phase with the advent of single-cell techniques that are revolutionizing the assessment of single nucleotide variation (SNV) at the highest cellular resolution. However, state-of-the-art single-cell sequencing technologies produce data with many missing bases (MBs) and incorrect base designations that lead to false-positive (FP) and false-negative (FN) detection of somatic mutations. While computational methods are available to make biological inferences in the presence of these errors, the accuracy of the imputed MBs and corrected FPs and FNs remains unknown.
Results: Using computer simulated datasets, we assessed the robustness performance of four existing methods (OncoNEM, SCG, SCITE and SiFit) and one new method (BEAM). BEAM is a Bayesian evolution-aware method that improves the quality of single-cell sequences by using the intrinsic evolutionary information in the single-cell data in a molecular phylogenetic framework. Overall, BEAM and SCITE performed the best. Most of the methods imputed MBs with high accuracy, but effective detection and correction of FPs and FNs is a challenge, especially for small datasets. Analysis of an empirical dataset shows that computational methods can improve both the quality of tumor single-cell sequences and their utility for biological inference. In conclusion, tumor cells descend from pre-existing cells, which creates evolutionary continuity in single-cell sequencing datasets. This information enables BEAM and other methods to correctly impute missing data and incorrect base assignments, but correction of FPs and FNs remains challenging when the number of SNVs sampled is small relative to the number of cells sequenced.
Availability and implementation: BEAM is available on the web at https://github.com/SayakaMiura/BEAM.