Influence of artifact removal on rare species recovery in natural complex communities using high-throughput sequencing

PLoS One. 2014 May 6;9(5):e96928. doi: 10.1371/journal.pone.0096928. eCollection 2014.


Large-scale high-throughput sequencing techniques are rapidly becoming popular methods for profiling complex communities and have generated deep insights into community biodiversity. However, several technical problems, especially sequencing artifacts such as nucleotide calling errors, could artificially inflate biodiversity estimates. Sequence filtering is a conventional method for removing error-prone sequences from high-throughput sequencing data. Although rare species, which are represented by low-abundance sequences in datasets, may be sensitive to the artifact removal process, the influence of artifact removal on rare species recovery has not been well evaluated in natural complex communities. Here we employed both internal (reliable operational taxonomic units selected from the communities themselves) and external (indicator species spiked into the communities) references to evaluate the influence of artifact removal on rare species recovery using 454 pyrosequencing of complex plankton communities collected from both freshwater and marine habitats. Multiple analyses revealed three clear patterns: 1) rare species were eliminated during the sequence filtering process at all tested filtering stringencies, 2) more rare taxa were eliminated as filtering stringency increased, and 3) elimination of rare species intensified as the biomass of a species in a community decreased. Our results suggest that caution should be applied when processing high-throughput sequencing data, especially for rare taxa detection in the conservation of species at risk and in rapid response programs targeting non-indigenous species. The internal and external references proposed here provide a practical strategy for evaluating the artifact removal process.
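The abstract does not specify the filtering criteria the authors applied, so the following sketch illustrates only the general mechanism by which stricter filtering can remove rare taxa. It swaps in expected-error filtering, in which each read's expected number of base-call errors is the sum of 10^(-Q/10) over its Phred scores (the quantity thresholded by options such as USEARCH's `-fastq_maxee`). This is an assumption for illustration, not the study's pipeline. Everything else is hypothetical: the read counts, the quality scores, the species names, the three thresholds, the two-read detection cutoff, and the spiked taxon `rare_spike`, which stands in for an external reference.

```python
from collections import Counter

def expected_errors(quals):
    # Expected number of base-call errors in a read:
    # sum of 10^(-Q/10) over its Phred quality scores.
    return sum(10 ** (-q / 10) for q in quals)

def filter_reads(reads, max_ee):
    # Keep (species, quals) pairs whose expected errors do not exceed max_ee;
    # a smaller max_ee means a more stringent filter.
    return [(sp, quals) for sp, quals in reads if expected_errors(quals) <= max_ee]

def detected_species(reads, min_count=2):
    # Hypothetical detection rule: a species counts as detected only if
    # at least min_count of its reads survive filtering.
    counts = Counter(sp for sp, _ in reads)
    return {sp for sp, n in counts.items() if n >= min_count}

# Hypothetical toy community: 100-bp reads at three quality levels.
HIGH, MID, LOW = [35] * 100, [25] * 100, [20] * 100
reads = (
    [("abundant", HIGH)] * 10 + [("abundant", MID)] * 6 + [("abundant", LOW)] * 4
    + [("rare_spike", HIGH), ("rare_spike", MID), ("rare_spike", LOW)]
    + [("rare_native", MID), ("rare_native", LOW)]
)
spikes = {"rare_spike"}  # hypothetical external reference spiked into the sample

for label, max_ee in [("lenient", 2.0), ("moderate", 0.5), ("strict", 0.1)]:
    found = detected_species(filter_reads(reads, max_ee))
    recovery = len(found & spikes) / len(spikes)
    print(label, sorted(found), "spike recovery:", recovery)
```

In this toy model, every read has the same chance of passing regardless of species, yet the abundant taxon survives all three thresholds. Under the moderate threshold `rare_native` falls below the two-read cutoff. Under the strict threshold the spiked reference is also lost. This qualitatively mirrors the second pattern reported in the abstract. Tracking a known spike-in in this way is one simple way to quantify what a given filter removes.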

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artifacts
  • DNA / chemistry*
  • High-Throughput Nucleotide Sequencing* / standards
  • Plankton / genetics*
  • Reference Standards
  • Sequence Analysis, DNA / standards

Substances

  • DNA

Grant support

This work was supported by the One-Three-Five Program (YSW2013B02) of the Research Center for Eco-Environmental Sciences and the 100 Talents Program of the Chinese Academy of Sciences to A.Z., and by Discovery grants from the Natural Sciences and Engineering Research Council of Canada (NSERC), the NSERC Canadian Aquatic Invasive Species Network (CAISN), and an NSERC Discovery Accelerator Supplement to H.J.M. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.