Accurate estimation of microbial sequence diversity with Distanced

Bioinformatics. 2020 Feb 1;36(3):728-734. doi: 10.1093/bioinformatics/btz668.

Abstract

Motivation: Microbes are the most diverse organisms on the planet. Deep sequencing of ribosomal DNA (rDNA) suggests thousands of different microbes may be present in a single sample. However, errors in sequencing have made any estimate of within-sample (alpha) diversity uncertain.

Results: We developed a tool to estimate alpha diversity of rDNA sequences from microbes (and other sequences). Our tool, Distanced, uses a Bayesian approach to calculate how different (distant) sequences would be without sequencing errors. Using this approach, Distanced accurately estimated alpha diversity of rDNA sequences from bacteria and fungi. It had lower root mean square prediction error (RMSPE) than using no tool (that is, leaving sequencing errors uncorrected). It was also accurate with non-microbial sequences (antibody mRNA). State-of-the-art tools (DADA2 and Deblur) were far less accurate and often had higher RMSPE than using no tool. Distanced thus represents an improvement over existing tools. Distanced will be useful to several disciplines, given that microbial diversity affects everything from human health to ecosystem function.
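
For readers unfamiliar with the accuracy metric, RMSPE is the square root of the mean squared difference between predicted and reference values. The sketch below shows that general formula only. The function name `rmspe` and the example numbers are illustrative assumptions; they are not taken from Distanced or from the study's data.

```python
import math

def rmspe(predicted, actual):
    """Root mean square prediction error between two equal-length sequences.

    Generic textbook formula; not code from the Distanced repository.
    """
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Squared prediction error for each sample
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    # Mean of the squared errors, then square root
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up diversity values for illustration only (not results from the paper)
reference = [0.10, 0.20, 0.30]
estimated = [0.12, 0.18, 0.33]
error = rmspe(estimated, reference)
```

A lower value means the estimates lie closer to the reference diversity, which is the sense in which the tools are compared above.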

Availability and implementation: Distanced is freely available at https://github.com/thackmann/Distanced.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenosine Deaminase
  • Bayes Theorem
  • Ecosystem*
  • Humans
  • Intercellular Signaling Peptides and Proteins
  • Sequence Analysis, DNA
  • Software*

Substances

  • Intercellular Signaling Peptides and Proteins
  • Adenosine Deaminase