Evaluation of the authenticity of a highly novel environmental sequence from boreal forest soil using ribosomal RNA secondary structure modeling

Mol Phylogenet Evol. 2013 Apr;67(1):234-45. doi: 10.1016/j.ympev.2013.01.018. Epub 2013 Feb 9.

Abstract

The number of sequences from both formally described taxa and uncultured environmental DNA deposited in the International Nucleotide Sequence Databases has increased substantially over the last two decades. Although the majority of these sequences represent authentic gene copies, there is evidence of DNA artifacts in these databases as well. These include lab artifacts, such as PCR chimeras, and biological artifacts such as pseudogenes or other paralogous sequences. Sequences that fall in basal positions in phylogenetic trees and appear distant from known sequences are particularly suspect. Phylogenetic analyses suggest that a novel sequence type (NS1) found in two boreal forest soil clone libraries belongs to the fungal kingdom but does not fall unambiguously within any known phylum. We have evaluated this sequence type using an array of secondary-structure analyses. To our knowledge, such analyses have never been used on environmental ribosomal sequences. Ribosomal secondary structure was modeled for four rRNA loci (ITS1, 5.8S, ITS2, 5' LSU). These models were analyzed for the presence of conserved domains, conserved nucleotide motifs, and compensatory base changes. Minimal free energy (MFE) foldings and GC contents of sequences representing the major fungal clades, as well as NS1, were also compared. NS1 displays secondary rRNA structures consistent with other fungi and many, but not all, conserved nucleotide motifs found across eukaryotes. However, our analyses show that many other authentic sequences from basal fungi lack more of these conserved motifs than does NS1. Together our findings suggest that NS1 represents an authentic gene copy. The methods described here can be used on any rRNA-coding sequence, not just environmental fungal sequences. As new-generation sequencing methods that yield shorter sequences become more widely implemented, methods that evaluate sequence authenticity should also be more widely implemented. For fungi, the adjacent 5.8S and ITS2 loci should be prioritized. This region is not only suited to distinguishing between closely related species, but it is also more informative in terms of expected secondary structure.
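
Two of the checks named in the abstract, GC content and compensatory base changes (CBCs), can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the dot-bracket structure notation, and the treatment of G-U wobble pairs as valid pairs are all assumptions made for illustration. A CBC is taken here in its usual sense, a position pair where both nucleotides differ between two aligned sequences while pairing is preserved. MFE folding is not shown because it requires a dedicated folding program.

```python
from __future__ import annotations

# Illustrative sketch only; names and conventions are assumptions,
# not taken from the published study.

# Watson-Crick pairs plus G-U wobble pairs (an assumption of this sketch).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def normalize(seq: str) -> str:
    """Upper-case a sequence and write it in the RNA alphabet (T -> U)."""
    return seq.upper().replace("T", "U")


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases; gaps and N are ignored."""
    bases = [b for b in normalize(seq) if b in "ACGU"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def paired_positions(dot_bracket: str) -> list[tuple[int, int]]:
    """Return 0-based (i, j) index pairs from a dot-bracket structure."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' in structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return sorted(pairs)


def compensatory_base_changes(seq_a: str, seq_b: str, dot_bracket: str):
    """List pairs where both sides changed between two aligned sequences
    and both sequences still form a valid pair at that position."""
    a, b = normalize(seq_a), normalize(seq_b)
    if not (len(a) == len(b) == len(dot_bracket)):
        raise ValueError("sequences and structure must have equal length")
    cbcs = []
    for i, j in paired_positions(dot_bracket):
        pair_a, pair_b = (a[i], a[j]), (b[i], b[j])
        both_pair = pair_a in CANONICAL and pair_b in CANONICAL
        both_sides_differ = pair_a[0] != pair_b[0] and pair_a[1] != pair_b[1]
        if both_pair and both_sides_differ:
            cbcs.append((i, j, pair_a, pair_b))
    return cbcs
```

For example, `compensatory_base_changes("GCAAGC", "ACAAGU", "((..))")` reports the outer pair, where G-C is replaced by A-U, but not the unchanged inner pair. A change on only one side of a pair (for instance G-C to G-U) is deliberately not counted.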

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Alaska
  • Base Composition
  • Bayes Theorem
  • DNA, Fungal / analysis
  • DNA, Ribosomal Spacer / genetics
  • Databases, Genetic
  • Likelihood Functions
  • Models, Genetic
  • Models, Molecular
  • Nucleic Acid Conformation*
  • Phylogeny
  • RNA, Ribosomal / analysis*
  • RNA, Ribosomal, 5.8S / analysis
  • Sequence Analysis, DNA
  • Soil / analysis*
  • Trees*

Substances

  • DNA, Fungal
  • DNA, Ribosomal Spacer
  • RNA, Ribosomal
  • RNA, Ribosomal, 5.8S
  • Soil