, 8, 29

High-resolution Characterization of Sequence Signatures Due to Non-Random Cleavage of Cell-Free DNA

Dineika Chandrananda et al. BMC Med Genomics.

Abstract

Background: High-throughput sequencing of cell-free DNA fragments found in human plasma has been used to non-invasively detect fetal aneuploidy, monitor organ transplants and investigate tumor DNA. However, many biological properties of this extracellular genetic material remain unknown. Research that further characterizes circulating DNA could substantially increase its diagnostic value by allowing the application of more sophisticated bioinformatics tools that improve the signal-to-noise ratio in the sequencing data.

Methods: In this study, we investigate various features of cell-free DNA in plasma using deep-sequencing data from two pregnant women (>70X, >50X) and compare them with matched cellular DNA. We utilize a descriptive approach to examine how the biological cleavage of cell-free DNA affects different sequence signatures such as fragment lengths, sequence motifs at fragment ends and the distribution of cleavage sites along the genome.

Results: We show that the size distributions of these cell-free DNA molecules depend on their autosomal or mitochondrial origin as well as on their genomic location within chromosomes. DNA mapping to particular microsatellites and alpha repeat elements displays unique size signatures. By correlating the mapping locations of these fragments with ENCODE annotation of chromatin organization, we show that cell-free fragments occur in clusters along the genome, localize to nucleosomal arrays and are preferentially cleaved at linker regions. Our work further demonstrates that cell-free autosomal DNA cleavage is sequence dependent. The region spanning up to 10 positions on either side of the DNA cleavage site shows a consistent pattern of preference for specific nucleotides. This sequence motif is present in cleavage sites localized to nucleosomal cores and linker regions but is absent in nucleosome-free mitochondrial DNA.

Conclusions: These background signals in cell-free DNA sequencing data stem from the non-random biological cleavage of these fragments. This sequence structure can be harnessed to improve bioinformatics algorithms, in particular for CNV and structural variant detection. Descriptive measures for cell-free DNA features developed here could also be used in biomarker analysis to monitor the changes that occur during different pathological conditions.

Figures

Fig. 1
Empirical cumulative distribution functions of per-base read coverage for matched cell-free DNA and cellular samples. The two cfDNA datasets are named I1_M_plasma and G1_M_plasma while the cellular DNA from the matched subjects are named I1_M_cellular and G1_M_cellular
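The empirical cumulative distribution function described in this caption can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code; the function name `coverage_ecdf` and the toy depth values are assumptions made for the example.

```python
from bisect import bisect_right


def coverage_ecdf(per_base_coverage):
    """Return a function F where F(x) is the fraction of bases with coverage <= x."""
    ordered = sorted(per_base_coverage)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one coverage value")

    def ecdf(x):
        # bisect_right gives the number of sorted values that are <= x.
        return bisect_right(ordered, x) / n

    return ecdf


# Hypothetical per-base depths for a tiny region (not data from the study).
toy_coverage = [0, 2, 2, 3, 5, 5, 5, 8]
F = coverage_ecdf(toy_coverage)
```

Evaluating `F` over a grid of depths for each sample would give curves comparable in spirit to those in the figure.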
Fig. 2
Size distributions of cell-free DNA contrasted with cellular DNA for two subjects (I1_M and G1_M). Fragments are divided into autosomal and mitochondrial classes and fragment sizes are calculated using the paired-positioning of sequencing reads
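One way to derive fragment sizes from the paired positioning of reads is sketched below. It assumes 0-based leftmost alignment coordinates and a forward/reverse mate orientation; the tuple layout, function names and toy values are hypothetical and are not taken from the study.

```python
from collections import Counter


def fragment_length(forward_start, reverse_start, reverse_aligned_len):
    """Span from the forward mate's first base to the reverse mate's last aligned base.

    Coordinates are 0-based leftmost alignment positions (an assumption of this sketch).
    """
    length = reverse_start + reverse_aligned_len - forward_start
    if length <= 0:
        raise ValueError("mates are not in the expected forward/reverse orientation")
    return length


def size_distribution(pairs):
    """Map each fragment length to the proportion of read pairs with that length."""
    counts = Counter(fragment_length(*pair) for pair in pairs)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


# Hypothetical (forward_start, reverse_start, reverse_aligned_len) tuples.
toy_pairs = [(1000, 1066, 100), (5000, 5067, 100), (9000, 9066, 100), (200, 444, 100)]
dist = size_distribution(toy_pairs)
```

Splitting the input pairs by chromosome class (autosomal versus mitochondrial) before calling `size_distribution` would yield one profile per class.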
Fig. 3
Estimated proportions from the 3-component Gaussian mixture model of the cell-free fragment lengths separated by chromosome. For both samples I1_M_plasma and G1_M_plasma, these estimates approximate the proportion of mono-, di- and tri-nucleosome lengths in each chromosome. All other mixture model parameters are reported in Additional file 4. The solid lines depict the average value in each component while the dashed lines demarcate +/- 3 standard deviations from the mean
Fig. 4
Autosomal fragment lengths originating at regions annotated for alpha repeat elements and two microsatellite types. The number of fragments used to calculate the size distribution is depicted in the legend beside each repeat category. The repeat-specific profiles are superimposed over the genome-wide profile in sample I1_M_plasma for comparison
Fig. 5
Strand cross-correlation analysis for cell-free DNA. The 3′ strand is shifted with respect to the forward strand in increments of 1 bp and the Pearson’s correlation between the per-position read counts for each strand is calculated to generate this cross-correlation plot
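The shift-and-correlate procedure in this caption can be illustrated with a short sketch. It assumes per-position read counts are already available as two equal-length lists; the function names and the toy counts are hypothetical, and the code is not the authors' implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant signal
    return cov / sqrt(var_x * var_y)


def strand_cross_correlation(forward_counts, reverse_counts, max_shift):
    """Correlate forward[i] with reverse[i + shift] for each shift in 0..max_shift (1 bp steps)."""
    n = len(forward_counts)
    if len(reverse_counts) != n:
        raise ValueError("strand count vectors must have the same length")
    if max_shift > n - 2:
        raise ValueError("max_shift leaves fewer than two overlapping positions")
    return {
        shift: pearson(forward_counts[: n - shift], reverse_counts[shift:])
        for shift in range(max_shift + 1)
    }


# Hypothetical counts: the reverse-strand signal is the forward signal moved 3 bp downstream.
toy_forward = [0, 5, 0, 0, 1, 0, 4, 0, 0, 0, 0, 0]
toy_reverse = [0, 0, 0, 0, 5, 0, 0, 1, 0, 4, 0, 0]
profile = strand_cross_correlation(toy_forward, toy_reverse, max_shift=5)
best_shift = max(profile, key=profile.get)
```

In this toy input the correlation peaks at the shift that realigns the two strands; plotting `profile` against shift gives a cross-correlation curve of the kind the figure shows.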
Fig. 6
Pearson’s correlation of cell-free and cellular DNA read coverage signal with open/closed chromatin enrichment annotation. Pairwise Pearson’s correlation is calculated between fragment start site signal tracks from cell-free and cellular DNA sequencing data along with open chromatin (FAIRE-seq) and nucleosomal position (MNase-seq) signal annotation from ENCODE. The figure provides the pictorial representation of the resulting correlation matrix
Fig. 7
Mononucleotide frequencies for the region of 51 bp (+/−25 bp) around fragment start sites. The y-axis denotes the proportion of each nucleotide at fixed positions relative to the 5′ end of the DNA fragment and the vertical line at 0 denotes the fragment start. Sample I1_M is denoted with lines while circles represent the G1_M values. For both cellular and cell-free data in the two samples, fragments are divided into autosomal and mitochondrial classes displayed in dark and light colors for each base respectively
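A per-position nucleotide tally around fragment start sites might look like the sketch below. It assumes forward-strand fragments only (reverse-strand fragments would first need reverse complementing), a reference held as a plain string, and 0-based start coordinates; the function name and toy inputs are hypothetical.

```python
def start_site_nucleotide_frequencies(reference, start_sites, flank=25):
    """Proportion of A/C/G/T at each offset in [-flank, +flank] around fragment starts.

    Offset 0 is the first base of the fragment (its 5' end).
    """
    bases = "ACGT"
    offsets = range(-flank, flank + 1)
    counts = {off: dict.fromkeys(bases, 0) for off in offsets}
    for start in start_sites:
        # Skip starts whose window would run off either end of the reference.
        if start - flank < 0 or start + flank >= len(reference):
            continue
        for off in offsets:
            base = reference[start + off].upper()
            if base in counts[off]:  # ignore N and other ambiguity codes
                counts[off][base] += 1
    freqs = {}
    for off in offsets:
        total = sum(counts[off].values())
        freqs[off] = {b: (counts[off][b] / total if total else 0.0) for b in bases}
    return freqs


# Hypothetical reference and start sites; the start at position 1 is skipped (window too short).
toy_reference = "ACGTACGTACGT"
toy_freqs = start_site_nucleotide_frequencies(toy_reference, [1, 4, 8], flank=2)
```

With `flank=25` the result covers the 51 bp window in the caption, one proportion per base per offset.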
Fig. 8
Size distributions of maternal cell-free DNA contrasted with fetal DNA for two subjects (I1_M and G1_M). Fragments are classed into the two components using allelic information at informative SNPs. Fragment sizes are calculated using the paired-positioning of sequencing reads
Fig. 9
Comparison of the nucleotide signature at fragmentation sites for fetal and maternal fragments. This plot illustrates the mononucleotide frequencies for the region of 51 bp (+/−25 bp) around fragment starts and ends. The y-axis denotes the proportion of each nucleotide at fixed positions relative to the 5′ and 3′ ends of the DNA fragment and the vertical line at 0 denotes the strand-specific fragment end. Maternal proportions per position are connected with lines while circles represent the fetal values. For both components, the proportions have been averaged over I1_M and G1_M. The close overlay of the fetal and maternal proportions shows that the variability between them is nearly negligible
Fig. 10
Summary of the main sequence signatures and underlying biological signals documented by the study
