Massive parallel sequencing technologies are promising a highly sensitive detection of low-level mutations, especially in mitochondrial DNA (mtDNA) studies. However, processes from DNA extraction and library construction to bioinformatic analysis include several varying tasks. Further, there is no validated recommendation for the comprehensive procedure. In this study, we examined potential pitfalls on the sequencing results based on two-person mtDNA mixtures. Therefore, we compared three DNA polymerases, six different variant callers in five mixtures between 50% and 0.5% variant allele frequencies generated with two different amplification protocols. In total, 48 samples were sequenced on Illumina MiSeq. Low-level variant calling at the 1% variant level and below was performed by comparing trimming and PCR duplicate removal as well as six different variant callers. The results indicate that sensitivity, specificity, and precision highly depend on the investigated polymerase but also vary based on the analysis tools. Our data highlight the advantage of prior standardization and validation of the individual laboratory setup with a DNA mixture model. Finally, we provide an artificial heteroplasmy benchmark dataset that can help improve somatic variant callers or pipelines, which may be of great interest for research related to cancer and aging.
Keywords: DNA polymerase; heteroplasmy; mitochondrial DNA (mtDNA); next generation sequencing (NGS); variant callers.