BMC Bioinformatics. 2015;16 Suppl 1(Suppl 1):S2.
doi: 10.1186/1471-2105-16-S1-S2. Epub 2015 Jan 21.

PEAT: an intelligent and efficient paired-end sequencing adapter trimming algorithm

Yun-Lung Li et al. BMC Bioinformatics. 2015.

Abstract

Background: In modern paired-end sequencing protocols, short DNA fragments lead to adapter-appended reads. Current paired-end adapter removal approaches trim adapters by scanning for adapter fragments at the 3' end of the reads, which is not adequate in some applications.

Results: Here, we propose a fast and highly accurate adapter-trimming algorithm, PEAT, designed specifically for paired-end sequencing. PEAT requires no a priori adapter sequence, which is convenient for large-scale meta-analyses. We compared the performance of PEAT with that of many adapter trimmers on both simulated and real-life paired-end sequencing libraries. The importance of adapter trimming was exemplified by its influence on downstream analyses of RNA-seq, ChIP-seq and MNase-seq data. Several useful guidelines for applying adapter trimmers with aligners are suggested.

Conclusions: PEAT can be easily included in the routine paired-end sequencing pipeline. The executable binaries and the standalone C++ source code package of PEAT are freely available online.

Figures

Figure 1
Illustrations of paired-end sequencing. (A) Both strands of a double-stranded DNA fragment are ligated with paired-end adapters and sequenced in the 5' to 3' direction. When the fragment is longer than or equal to the machine-specific sequencing length, no adapter sequence is appended to the paired-end reads. (B) The "read-through" situation: the fragment is shorter than the machine-specific sequencing length, so parts of the adapter sequences are sequenced as well. (C) The error-prone strategy that existing adapter trimmers use to handle adapter trimming.
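
The read-through situation in panel (B) can be reproduced with a small simulation. The sketch below is illustrative only: the function names, the toy fragment, and the placeholder adapter strings are assumptions made for this example, not real adapter sequences or code from PEAT.

```python
# Toy simulation of paired-end reads (hypothetical helper names and
# placeholder adapters; not taken from the PEAT source code).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_pair(fragment, adapter1, adapter2, read_len):
    """Return (read1, read2) for one adapter-ligated fragment.

    Read 1 starts at the 5' end of the forward strand and read 2 at the
    5' end of the reverse strand. Each read runs into its adapter when
    the fragment is shorter than read_len.
    """
    template1 = fragment + adapter1
    template2 = reverse_complement(fragment) + adapter2
    return template1[:read_len], template2[:read_len]


fragment = "ACGTACGGTC"  # 10 bp toy fragment
adapter1 = "TTTTTTTT"    # placeholder adapter, for illustration only
adapter2 = "GGGGGGGG"    # placeholder adapter, for illustration only

# Fragment at least as long as the read: no adapter bases are sequenced.
print(simulate_pair(fragment, adapter1, adapter2, read_len=8))
# Fragment shorter than the read: the last 4 bases of each read are adapter.
print(simulate_pair(fragment, adapter1, adapter2, read_len=14))
```

With a read length of 8 the reads contain only fragment bases; with a read length of 14 each read ends in four adapter bases, which is the case an adapter trimmer has to detect.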
Figure 2
Illustration of the algorithm that PEAT applies to paired-end adapter trimming. The algorithm first conducts reverse-complemented string matching between the front parts, of length L, of the paired-end reads; L is pre-specified as the minimum possible DNA fragment length. The algorithm then verifies the trimming positions by checking whether the resulting front parts, i.e. the parts corresponding to the DNA fragment, are mutually reverse-complemented, and (optionally) whether the rear parts, i.e. the parts corresponding to the adapter sequences, are substantially the same. See Methods for details.
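
The two steps described in this caption can be sketched in a few lines. This is a simplified illustration written from the caption alone, not PEAT's implementation: the function names, the exact-match seed search, and the `max_mismatch_rate` parameter are assumptions, and the optional adapter comparison is omitted.

```python
# Simplified sketch of adapter-free paired-end trimming (assumed names
# and parameters; the real PEAT algorithm is described in its Methods).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def trim_pair(read1, read2, min_len=5, max_mismatch_rate=0.1):
    """Trim read-through adapters without knowing the adapter sequence.

    Step 1: reverse-complement the first min_len bases of read1 and look
    for that seed in read2; a hit at position p implies a fragment of
    length p + min_len.
    Step 2: accept the position only if the implied front parts of both
    reads are reverse complements of each other, within the tolerance.
    """
    if len(read1) < min_len or len(read2) < min_len:
        return read1, read2
    seed = reverse_complement(read1[:min_len])
    pos = read2.find(seed)
    while pos != -1:
        frag_len = pos + min_len
        if frag_len <= len(read1):
            front1 = read1[:frag_len]
            front2_rc = reverse_complement(read2[:frag_len])
            if mismatches(front1, front2_rc) <= max_mismatch_rate * frag_len:
                return front1, read2[:frag_len]
        pos = read2.find(seed, pos + 1)
    return read1, read2  # no read-through detected; leave the pair as is


# 10 bp fragment read with 14 bp reads; the trailing 4 bases are adapter.
print(trim_pair("ACGTACGGTCTTTT", "GACCGTACGTGGGG"))
```

The seed search here requires an exact match, so a sequencing error inside the first `min_len` bases of read 1 would cause a missed trim; a tolerant search would be needed for real data.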
Figure 3
Middle-error-rate (MED) datasets. (A) Ratio of the untrimmed read count to the trimmed read count for every tested trimmer on the simulated MED/dMED datasets. (B, C) Length distributions of the trimmed reads produced by each tested adapter trimmer on the simulated MED (B) and dMED (C) datasets. Each distribution shows the number of reads trimmed to a given length, from 1 to 100 bp, divided by the total number of trimmed reads. Ratios in the 1 to 50 bp range are magnified five times to make short fragments remaining after trimming visible.
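
The length distribution in panels (B) and (C) is a per-length read fraction. A minimal sketch of that calculation follows, assuming the trimmed reads are available as a list of strings; the function name and the `max_len` parameter are hypothetical.

```python
from collections import Counter


def length_distribution(trimmed_reads, max_len=100):
    """Map each length in 1..max_len to the fraction of reads of that length.

    The denominator is the total number of trimmed reads, so reads that
    are empty or longer than max_len still count toward the total.
    """
    total = len(trimmed_reads)
    if total == 0:
        return {}
    counts = Counter(len(read) for read in trimmed_reads)
    return {n: counts.get(n, 0) / total for n in range(1, max_len + 1)}


dist = length_distribution(["ACGT", "ACGT", "AC", "A"])
print(dist[4], dist[2], dist[1])
```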
Figure 4
The length distributions of the uniquely mapped reads from the selected sequencing libraries. (A) ChIP-seq, (B) RNA-seq, and (C) MNase-seq datasets.
Figure 5
The V-plots of the selected MNase-seq datasets. The anchors were CTCF binding sites. The left panel shows data processed with PEAT and the Bowtie2 end-to-end alignment option; the right panel shows data processed with the Bowtie2 local alignment option.
