Evaluating and optimizing the performance of software commonly used for the taxonomic classification of DNA metabarcoding sequence data

Mol Ecol Resour. 2017 Jul;17(4):760-769. doi: 10.1111/1755-0998.12628. Epub 2016 Nov 21.

Abstract

The taxonomic classification of DNA sequences has become a critical component of numerous ecological research applications; however, few studies have evaluated the strengths and weaknesses of commonly used sequence classification approaches. Further, the methods and software available for sequence classification are diverse, which can make it difficult to choose the best course of action and to understand the trade-offs involved in using different classification approaches. Here, we provide an in silico evaluation of three DNA sequence classifiers: the RDP Naïve Bayesian Classifier, RTAX and UTAX. We then discuss the results, merits and limitations of both the classifiers and our method of classifier evaluation. Our methods of comparison are simple yet robust, and they provide researchers with a methodological and conceptual foundation for making such evaluations in a variety of research situations. Generally, we found a considerable trade-off between accuracy and sensitivity for the classifiers tested, indicating a need for further improvement of sequence classification tools.
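One common way an accuracy/sensitivity trade-off arises is through a confidence cutoff: classifiers that report a per-assignment confidence (for example, the bootstrap confidence of a Naïve Bayesian classifier) accept only assignments at or above a chosen threshold. The Python sketch below illustrates this under stated assumptions. The definitions used here (accuracy = correct accepted assignments / accepted assignments; sensitivity = correct accepted assignments / all query sequences) are illustrative and not necessarily those used in the study, and the function name `accuracy_and_sensitivity` and the `TOY` records are hypothetical, made-up values rather than results.

```python
from typing import List, Tuple

# Hypothetical record layout: (predicted_taxon, confidence, true_taxon).
Prediction = Tuple[str, float, str]


def accuracy_and_sensitivity(preds: List[Prediction],
                             threshold: float) -> Tuple[float, float]:
    """Return (accuracy, sensitivity) when only assignments with
    confidence >= threshold are accepted.

    Assumed, illustrative definitions:
      accuracy    = correct accepted assignments / accepted assignments
      sensitivity = correct accepted assignments / all query sequences
    """
    # Keep only assignments that meet the confidence cutoff.
    accepted = [(p, t) for p, c, t in preds if c >= threshold]
    correct = sum(1 for p, t in accepted if p == t)
    accuracy = correct / len(accepted) if accepted else float("nan")
    sensitivity = correct / len(preds) if preds else float("nan")
    return accuracy, sensitivity


# Toy, made-up predictions for five query sequences (not real data).
TOY: List[Prediction] = [
    ("Bacillus", 0.95, "Bacillus"),
    ("Escherichia", 0.90, "Escherichia"),
    ("Listeria", 0.70, "Listeria"),
    ("Vibrio", 0.60, "Aeromonas"),       # wrong assignment
    ("Clostridium", 0.30, "Pseudomonas"),  # wrong assignment
]

if __name__ == "__main__":
    for cutoff in (0.0, 0.5, 0.8):
        acc, sens = accuracy_and_sensitivity(TOY, cutoff)
        print(f"cutoff={cutoff:.1f} accuracy={acc:.2f} sensitivity={sens:.2f}")
```

On the toy data, raising the cutoff from 0.0 to 0.5 to 0.8 moves accuracy from 0.60 to 0.75 to 1.00 while sensitivity falls from 0.60 to 0.60 to 0.40: rejecting low-confidence assignments removes errors, but also discards some correct assignments.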

Keywords: RTAX; UTAX; DNA barcoding; DNA metabarcoding; RDP Naïve Bayesian Classifier; taxonomic assignment.

MeSH terms

  • Bayes Theorem
  • Computer Simulation
  • DNA Barcoding, Taxonomic / methods*
  • Software*