PLoS One, 13 (7), e0197333

Automated Size Selection for Short Cell-Free DNA Fragments Enriches for Circulating Tumor DNA and Improves Error Correction During Next Generation Sequencing



Sabine Hellwig et al. PLoS One.


Circulating tumor-derived cell-free DNA (ctDNA) enables non-invasive diagnosis, monitoring, and treatment susceptibility testing in human cancers. However, accurate detection of variant alleles, particularly during untargeted searches, remains a principal obstacle to widespread application of cell-free DNA in clinical oncology. In this study, isolation of short cell-free DNA fragments is shown to enrich for tumor variants and improve correction of PCR- and sequencing-associated errors. Subfractions of the mononucleosome of circulating cell-free DNA (ccfDNA) were isolated from patients with melanoma, pancreatic ductal adenocarcinoma, and colorectal adenocarcinoma using a high-throughput-capable automated gel-extraction platform. Using a 128-gene (128 kb) custom next-generation sequencing panel, variant alleles were on average 2-fold enriched in the short fraction (median insert size: ~142 bp) compared to the original ccfDNA sample, while 0.7-fold reduced in the fraction corresponding to the principal peak of the mononucleosome (median insert size: ~167 bp). Size-selected short fractions compared to the original ccfDNA yielded significantly larger family sizes (i.e., PCR duplicates) during in silico consensus sequence interpretation via unique molecular identifiers. Increments in family size were associated with a progressive reduction of PCR and sequencing errors. Although consensus read depth also decreased at larger family sizes, the variant allele frequency in the short ccfDNA fraction remained consistent, while variant detection in the original ccfDNA was commonly lost at family sizes necessary to minimize errors. These collective findings support the automated extraction of short ccfDNA fragments to enrich for ctDNA while concomitantly reducing false positives through in silico error correction.
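The interaction between family size and variant detection described above can be illustrated with a minimal sketch. It assumes reads have already been reduced to `(umi, position, base)` tuples, which is a hypothetical simplification and not the study's data format or pipeline; the function names and the strict-majority rule are likewise assumptions made for illustration.

```python
from collections import Counter, defaultdict


def consensus_calls(reads, min_family_size):
    """Group reads by (UMI, position) and return one majority-vote base per family.

    `reads` is an iterable of (umi, position, base) tuples (hypothetical layout).
    Families with fewer than `min_family_size` reads are discarded.
    """
    families = defaultdict(list)
    for umi, position, base in reads:
        families[(umi, position)].append(base)

    calls = []
    for (_umi, position), bases in families.items():
        if len(bases) < min_family_size:
            continue  # too few PCR duplicates to correct errors at this threshold
        base, count = Counter(bases).most_common(1)[0]
        # Require a strict majority so that a tie never yields an arbitrary call.
        if count * 2 > len(bases):
            calls.append((position, base))
    return calls


def variant_allele_frequency(calls, position, variant_base):
    """Fraction of consensus reads at `position` carrying `variant_base`."""
    at_site = [base for pos, base in calls if pos == position]
    if not at_site:
        return None  # no consensus depth left: the variant can no longer be assessed
    return at_site.count(variant_base) / len(at_site)
```

Raising `min_family_size` removes small families, which suppresses isolated PCR and sequencing errors but also lowers consensus depth; when no family at a site survives, the function returns `None`, mirroring the loss of variant detection at large family sizes.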

Conflict of interest statement

The authors have declared that no competing interests exist.


Fig 1. Detection of variant alleles in ccfDNA.
Flowchart (A) depicting steps prior to determination of variant allele frequency (VAF). With known variants, VAF can be determined directly from ccfDNA with ddPCR (a), while sequencing requires a multi-step process (b). The addition of truncated adapters followed by extension to full-length in separate steps (b-e) is done to improve resolution during size selection (c, d) of desired subfractions of ccfDNA. There was a strong association (B) between direct measurement of VAF in ccfDNA by ddPCR (A-a) and by the multi-step sequencing process (A-b). This association was present even at VAFs < 1.5% (B-inset). The equations for each colored regression line are shown in a corresponding color. In (C), boxplots of wild type alleles (dark blue) and variant alleles (light blue) by NGS are shown for each cancer patient (C = colorectal adenocarcinoma; M = melanoma; P = pancreatic ductal adenocarcinoma). In (C), data are only shown for insert sizes ≤250 bp to focus results on the mononucleosome, as that length approximates the midpoint between the mononucleosome and dinucleosome lengths associated with ccfDNA. The light gray line identifies the median insert size (167 bp) from all patients. In the majority of patients, the median insert size of the tumor-associated variant allele was shorter than that of the corresponding wild type allele.
Fig 2. Effect of size selection on VAF in spiked ccfDNA libraries using an automated gel-extraction platform.
Distribution by densitometry of the short (purple) and long (orange) fractions isolated from healthy control unselected ccfDNA samples (black; N = 7) is shown in (A). Size includes full-length adapters (~135 bp). Note the evidence of a tail in the short fraction (A, blue arrow) consistent with longer fragments migrating with a shorter target fragment size. Although the overall distributions overlapped, the peak fragment length of the short fraction was significantly less than that of the long fraction (B). No significant difference was measured between the peak fragment lengths of the long fraction and the unselected mononucleosome (B). Gray numbers indicate the mean±SD peak fragment length for each sample (B). VAF determined by ddPCR for the EGFR T790M synthetic spike (130 bp) for the short (purple) and long (orange) fractions and the unselected ccfDNA (black) is graphed in (C). In the short fraction the T790M allele remained detectable even when it was undetectable in unselected ccfDNA (C, inset), while it was virtually absent from the long fraction regardless of dilution. The enrichment factor in the short fraction was associated with extent of dilution, with the greatest amount of enrichment occurring in the most diluted samples (D; data only shown when unselected ccfDNA VAF was above the limit of blank by ddPCR). VAF by ddPCR for the BRAF V600E synthetic spike (165 bp) is shown in (E). Overall, there was a general trend towards enrichment in both the short (E, purple) and long (E, orange) fractions. The variant was present throughout the short samples except at the lowest dilutions (E, inset). Extent of enrichment was relatively consistent regardless of dilution (F). In A-D, error bars indicate standard deviation from independent duplicate experiments. *** P < 0.001; NS = not significant; AFU = arbitrary fluorescent unit.
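The enrichment factor plotted in (D) and (F) can be read as the fold change in VAF between a size-selected fraction and the unselected sample. The sketch below assumes that definition; the function names, the `limit_of_blank` parameter, and the counts are hypothetical and are not taken from the study.

```python
def vaf(variant_count, wild_type_count):
    """Variant allele frequency as a fraction of all counted molecules."""
    total = variant_count + wild_type_count
    if total == 0:
        raise ValueError("no molecules counted")
    return variant_count / total


def enrichment_factor(vaf_fraction, vaf_unselected, limit_of_blank=0.0):
    """Fold change in VAF of a size-selected fraction over the unselected sample.

    Returns None when the unselected VAF is not above `limit_of_blank`
    (a hypothetical threshold parameter), because the ratio is then
    undefined or unreliable.
    """
    if vaf_unselected <= limit_of_blank:
        return None
    return vaf_fraction / vaf_unselected


# Illustrative counts only, not data from the study.
unselected = vaf(variant_count=10, wild_type_count=990)
short = vaf(variant_count=20, wild_type_count=980)
print(enrichment_factor(short, unselected))
```

Returning `None` below the limit of blank follows the caption's note that enrichment is only shown when the unselected VAF was measurable.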
Fig 3. Enrichment of variant alleles in short ccfDNA fractions.
In (A), representative distributions by densitometry are shown of the isolated fractions (short—purple; medium—green; long—orange) from the original ccfDNA (black) of a single cancer patient. The fragment lengths include full-length adapters (~135 bp). The cumulative distribution of insert sizes at variant locations from all patients for each subfraction (B) show a profile consistent with densitometry (A). The peak fragment lengths from each patient by densitometry (C) and the median insert size by sequencing (D) were statistically significantly different between each respective subfraction, while observations for the long fraction were similar to the unselected mononucleosome (black). Enrichment for variant alleles was greatest in the short fraction by both ddPCR (E) and sequencing (F) with intermediate enrichment in the medium fraction (F). In the long fraction analyzed by both modalities there was a tendency for reduction in VAF (E and F). Solid bars represent the mean value. In (C-E), mean±SD values are shown in gray. * P < 0.05; ** P ≤ 0.01; *** P ≤ 0.001; NS = not significant; AFU = arbitrary fluorescent unit.
Fig 4. Generation of large family sizes in short ccfDNA.
Total reads were similar between sheared buffy coat DNA, unselected ccfDNA, and short ccfDNA (A). Consensus read depth (family size ≥1) was greatest in buffy coat DNA, followed by unselected ccfDNA, and then short ccfDNA (B). Average family size was greatest in the short ccfDNA (C). At the specific variant locations for each patient, consensus read depth in buffy coat DNA rapidly decayed, reaching zero by family size ≥20 (D, gray). In contrast, both the unselected ccfDNA (D, black) and the short ccfDNA (D, purple) showed fewer consensus reads at family size ≥1, but maintained a greater read depth at larger family sizes (D, inset). Consensus read depth at family size ≥20 was greatest in short ccfDNA (E). In (A-C) and (E), solid bars represent the mean value. In (A-E), whiskers correspond to the standard deviation. *** P ≤ 0.001; NS = not significant.
Fig 5. Reduction of false positives at larger family sizes.
Corresponding variants present in patient ccfDNA were queried in matched buffy coat DNA (A). False positives were few and incrementally decreased with larger family sizes (A). In (B), the cumulative number of false positives from all healthy control ccfDNA and all five targeted patient variants is shown. Overall, only two false positives were identified. In (C), the mean error rate across the entire capture panel (128 genes, 128 kb) decreased with increasingly larger family sizes. Total consensus aligned counts for non-reference alleles with AF < 0.1% (D), 0.1% ≤ AF ≤ 1.0% (E), and 1.0% < AF ≤ 2.0% (F) are shown (black circles). In (E) and (F), non-reference alleles are sub-categorized as “unique” (blue squares) or “shared” (gray triangles). In (F), “shared” non-reference alleles are not shown as they are similar to the total count. In (F), the “unique” non-reference allele count is plotted on a second y-axis. In (C-F), whiskers correspond to the standard deviation.
Fig 6. Effects of family size on VAF.
Overall, VAF was relatively stable up to a family size ≥10 in unselected ccfDNA (A). However, at larger family sizes VAF became less stable and included complete loss of variants in some samples (B, magnification of area in blue box shown in A). Of note, complete loss of the variant allele occurred in one sample with an initial VAF > 5% (A and B, black arrow). In contrast, VAF remained relatively stable up to family size ≥20 in the short ccfDNA fraction (C; D, magnification of area in blue box shown in C). Note the apparent increase of VAF in the short ccfDNA fraction at lower allele frequencies (D) compared to the unselected ccfDNA (B). The relative percent difference in VAF was similar in unselected and short ccfDNA at family size (FS) ≥5 and FS ≥10 (E). However, the relative percent difference was statistically significantly lower in the short ccfDNA fraction at FS ≥15 and FS ≥20 (E). * P < 0.05; *** P ≤ 0.001; NS = not significant.
