dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms

PeerJ. 2014 Jun 10;2:e431. doi: 10.7717/peerj.431. eCollection 2014.

Abstract

Restriction-site associated DNA sequencing (RADseq) has become a powerful and useful approach for population genomics. Currently, no software exists that utilizes both paired-end reads from RADseq data to efficiently produce population-informative variant calls, especially for non-model organisms with large effective population sizes and high levels of genetic polymorphism. dDocent is an analysis pipeline with a user-friendly, command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs and indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs shared across greater numbers of individuals and with higher levels of coverage. This is because dDocent quality trims reads instead of filtering them, and incorporates both forward and reverse reads (including reads with indel polymorphisms) in assembly, mapping, and SNP calling. The pipeline and a comprehensive user guide can be found at http://dDocent.wordpress.com.
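The distinction between quality trimming and quality filtering can be illustrated with a minimal sketch. This is not dDocent's or Stacks' code: the function names, the sliding-window rule, the window size of 5, and the quality threshold of 20 are all assumptions chosen for illustration. It shows only why trimming can retain usable bases from a read that a whole-read filter would discard.

```python
from typing import List


def trimmed_length(quals: List[int], min_q: int = 20, window: int = 5) -> int:
    """Quality trimming (illustrative): keep the read up to the start of
    the first window whose mean Phred quality falls below min_q."""
    for start in range(0, len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < min_q:
            return start  # cut here, keep bases [0, start)
    return len(quals)  # no low-quality window found, keep everything


def passes_filter(quals: List[int], min_q: int = 20, window: int = 5) -> bool:
    """Quality filtering (illustrative): keep the read only if no window
    falls below min_q; otherwise the whole read is discarded."""
    return trimmed_length(quals, min_q, window) == len(quals)


# Hypothetical read: 60 high-quality bases followed by a 20-base low-quality tail.
read_quals = [38] * 60 + [10] * 20

kept_by_trimming = trimmed_length(read_quals)
kept_by_filtering = len(read_quals) if passes_filter(read_quals) else 0

print("bases kept by trimming: ", kept_by_trimming)
print("bases kept by filtering:", kept_by_filtering)
```

Under these assumed settings, the trimmed read still contributes most of its high-quality bases to assembly and mapping, whereas the filtered read contributes none.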

Keywords: Bioinformatics; Molecular ecology; Next-generation sequencing; Population genomics; RADseq.

Grant support

This work was supported by Award #NA10NMF4270199 from the Saltonstall-Kennedy Program of the National Marine Fisheries Service (Department of Commerce/National Oceanic and Atmospheric Administration), Award #NA12NMF4330093 from the Marfin Program of the National Marine Fisheries Service, Award #NA12NMF4540082 from the Cooperative Research Program of the National Marine Fisheries Service, and Award #NA10OAR4170099 from the National Oceanic and Atmospheric Administration to Texas Sea Grant, and by Texas AgriLife under Project H-6703. The statements, findings, conclusions, and recommendations are those of the author(s) and do not necessarily reflect the views of the National Marine Fisheries Service, the National Oceanic and Atmospheric Administration (NOAA), the U.S. Department of Commerce, Texas Sea Grant, or Texas AgriLife. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.