IDTAXA: a novel approach for accurate taxonomic classification of microbiome sequences
- PMID: 30092815
- PMCID: PMC6085705
- DOI: 10.1186/s40168-018-0521-5
Abstract
Background: Microbiome studies often involve sequencing a marker gene to identify the microorganisms in samples of interest. Sequence classification is a critical component of this process, whereby sequences are assigned to a reference taxonomy containing known sequence representatives of many microbial groups. Previous studies have shown that existing classification programs often assign sequences to reference groups even if they belong to novel taxonomic groups that are absent from the reference taxonomy. This high rate of "over classification" is particularly detrimental in microbiome studies because reference taxonomies are far from comprehensive.
Results: Here, we introduce IDTAXA, a novel approach to taxonomic classification that employs principles from machine learning to reduce over classification errors. Using multiple reference taxonomies, we demonstrate that IDTAXA has higher accuracy than popular classifiers such as BLAST, MAPSeq, QIIME, SINTAX, SPINGO, and the RDP Classifier. Similarly, IDTAXA yields far fewer over classifications on Illumina mock microbial community data when the expected taxa are absent from the training set. Furthermore, IDTAXA offers many practical advantages over other classifiers, such as maintaining low error rates across varying input sequence lengths and withholding classifications from input sequences composed of random nucleotides or repeats.
Conclusions: IDTAXA's classifications may lead to different conclusions in microbiome studies because of the substantially reduced number of taxa that are incorrectly identified through over classification. Although misclassification error is relatively minor, we believe that many remaining misclassifications are likely caused by errors in the reference taxonomy. We describe how IDTAXA is able to identify many putative mislabeling errors in reference taxonomies, enabling training sets to be automatically corrected by eliminating spurious sequences. IDTAXA is part of the DECIPHER package for the R programming language, available through the Bioconductor repository or accessible online (http://DECIPHER.codes).
Keywords: 16S rRNA gene sequencing; Classification; ITS sequencing; Microbiome; Reference taxonomy; Taxonomic assignment.
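The distinction the abstract draws between over classification and misclassification can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the IDTAXA algorithm or any part of DECIPHER: the function names, the outcome labels other than "over classification" and "misclassification", and the toy genus names are all hypothetical. It treats an assignment as an over classification when the sequence's true group is absent from the reference taxonomy, and as a misclassification when the true group is present but a different group was assigned.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple


def error_type(true_group: str, predicted: Optional[str],
               reference_groups: Set[str]) -> str:
    """Label one classification outcome (toy definitions, assumed for illustration)."""
    known = true_group in reference_groups
    if predicted is None:
        # The classifier withheld an assignment.
        return "not classified" if known else "correctly withheld"
    if not known:
        # The true group is absent from the reference, so any assignment
        # is an over classification.
        return "over classification"
    return "correct" if predicted == true_group else "misclassification"


def summarize(results: Iterable[Tuple[str, Optional[str]]],
              reference_groups: Set[str]) -> Counter:
    """Count outcome labels over (true_group, predicted_group) pairs."""
    return Counter(error_type(t, p, reference_groups) for t, p in results)


# Hypothetical reference taxonomy and classifier output.
reference_groups = {"Escherichia", "Bacillus", "Pseudomonas"}
results = [
    ("Escherichia", "Escherichia"),  # known group, right assignment
    ("Bacillus", "Pseudomonas"),     # known group, wrong assignment
    ("Novelgenus", "Bacillus"),      # novel group forced into the reference
    ("Novelgenus", None),            # novel group, assignment withheld
    ("Pseudomonas", None),           # known group, assignment withheld
]
counts = summarize(results, reference_groups)
for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

Under these assumed definitions, a classifier that always returns its best match can never produce a "correctly withheld" outcome, which is why an incomplete reference taxonomy inflates over classification.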
Conflict of interest statement
The authors declare that they have no competing interests.
