EukRef: Phylogenetic curation of ribosomal RNA to enhance understanding of eukaryotic diversity and distribution

PLoS Biol. 2018 Sep 17;16(9):e2005849. doi: 10.1371/journal.pbio.2005849. eCollection 2018 Sep.

Abstract

Environmental sequencing has greatly expanded our knowledge of micro-eukaryotic diversity and ecology by revealing previously unknown lineages and their distribution. However, the value of these data is critically dependent on the quality of the reference databases used to assign an identity to environmental sequences. Existing databases contain errors and struggle to keep pace with rapidly changing eukaryotic taxonomy, the influx of novel diversity, and computational challenges related to assembling the high-quality alignments and trees needed for accurate characterization of lineage diversity. EukRef (eukref.org) is an ongoing community-driven initiative that addresses these challenges by bringing together taxonomists with expertise spanning the eukaryotic tree of life and microbial ecologists, who use environmental sequence data, to develop reliable reference databases across the diversity of microbial eukaryotes. EukRef organizes and facilitates rigorous mining and annotation of sequence data by providing protocols, guidelines, and tools. The EukRef pipeline and tools allow users interested in a particular group of microbial eukaryotes to retrieve all sequences belonging to that group from the International Nucleotide Sequence Database Collaboration (INSDC) databases (GenBank, the European Nucleotide Archive [ENA], or the DNA DataBank of Japan [DDBJ]), to place those sequences in a phylogenetic tree, and to curate taxonomic and environmental information for the group. We provide guidelines to facilitate the process and to standardize taxonomic annotations. The final outputs of this process are (1) a reference tree and alignment, (2) a reference sequence database, including taxonomic and environmental information, and (3) a list of putative chimeras and other artifactual sequences. These products will be useful for the broad community as they become publicly available (at eukref.org) and are shared with existing reference databases.
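As a rough illustration of how outputs (2) and (3) relate, the sketch below splits a set of curated records into a reference database and a list of flagged artifacts. It is not the EukRef pipeline or its tools: the `CuratedRecord` type, its field names, the `split_outputs` function, and the accession strings are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CuratedRecord:
    """Hypothetical record layout; the field names are assumptions."""
    accession: str              # placeholder identifier, not a real INSDC accession
    taxonomy: Tuple[str, ...]   # lineage, broadest rank first
    environment: str            # curated environmental label
    is_artifact: bool           # flagged as a putative chimera or other artifact


def split_outputs(records: List[CuratedRecord]) -> Tuple[List[CuratedRecord], List[str]]:
    """Separate retained reference records from flagged artifact accessions."""
    reference = [r for r in records if not r.is_artifact]
    artifacts = [r.accession for r in records if r.is_artifact]
    return reference, artifacts


# Made-up example records
records = [
    CuratedRecord("ACC0001", ("Eukaryota", "Alveolata", "Ciliophora"), "marine", False),
    CuratedRecord("ACC0002", ("Eukaryota", "Alveolata", "Ciliophora"), "soil", True),
    CuratedRecord("ACC0003", ("Eukaryota", "Alveolata", "Ciliophora"), "freshwater", False),
]

reference_db, artifact_list = split_outputs(records)
print(len(reference_db), artifact_list)
```

Keeping the flagged accessions in a separate list, rather than silently discarding them, mirrors the abstract's point that the list of artifactual sequences is itself a product of the curation.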

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Ciliophora / genetics
  • Data Curation*
  • Databases, Genetic
  • Eukaryota / classification*
  • Eukaryota / genetics*
  • Genetic Variation*
  • Phylogeny*
  • RNA, Ribosomal / genetics*

Substances

  • RNA, Ribosomal

Grants and funding

Gordon and Betty Moore Foundation moore.org. Received by JdC and LWP. National Science Foundation www.nsf.gov (grant number 1545931). Received by MB, JdC, and LWP. International Society of Protistologists protistologists.org. Received by JdC and LWP. Tula Foundation tula.org. Received by JdC, VB, MK, and PJK. Marie Skłodowska-Curie Actions European Commission https://ec.europa.eu/research/mariecurieactions/ (grant number FP7-PEOPLE-2012-IOF-331450 CAARL). Received by JdC. Czech Academy of Sciences, Czech Republic Fellowship Purkyne. Received by MK. European Regional Development Fund http://ec.europa.eu/regional_policy/en/funding/erdf/ (grant number CZ.02.1.01/0.0/0.0/16_019/0000759 CePaViP). Received by MK. National Science Foundation www.nsf.gov (grant number OCE1435515). Received by LS. Investissements d’Avenir (grant number ANR-11-BTBR-0008 OCEANOMICS). Received by CB and CdV. NSERC-DG http://www.nserc-crsng.gc.ca/. Received by LWP. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.