High-throughput DNA metabarcoding of amplicon sizes below 500 bp has revolutionized the analysis of environmental microbial diversity. However, these short regions contain limited phylogenetic signal, which makes it impractical to use environmental DNA in full phylogenetic inferences. This lesser phylogenetic resolution of short amplicons may be overcome by new long-read sequencing technologies. To test this idea, we amplified soil DNA and used PacBio Circular Consensus Sequencing (CCS) to obtain an ~4500-bp region spanning most of the eukaryotic small subunit (18S) and large subunit (28S) ribosomal DNA genes. We first treated the CCS reads with a novel curation workflow, generating 650 high-quality operational taxonomic units (OTUs) containing the physically linked 18S and 28S regions. To assign taxonomy to these OTUs, we developed a phylogeny-aware approach based on the 18S region that showed greater accuracy and sensitivity than similarity-based methods. The taxonomically annotated OTUs were then combined with available 18S and 28S reference sequences to infer a well-resolved phylogeny spanning all major groups of eukaryotes, allowing us to accurately derive the evolutionary origin of environmental diversity. A total of 1,019 sequences were included, of which a majority (58%) corresponded to the new long environmental OTUs. The long reads also allowed us to directly investigate the relationships among environmental sequences themselves, which represents a key advantage over the placement of short reads on a reference phylogeny. Together, our results show that long amplicons can be treated in a full phylogenetic framework to provide greater taxonomic resolution and a robust evolutionary perspective to environmental DNA.
Keywords: PacBio; metabarcoding; phylogeny; protists; rDNA operon; taxonomy.
© 2019 John Wiley & Sons Ltd.