BMC Bioinformatics. 2014 Aug 22;15(1):284.
doi: 10.1186/1471-2105-15-284.

MT-Toolbox: improved amplicon sequencing using molecule tags

Scott M Yourstone et al. BMC Bioinformatics.

Abstract

Background: Short oligonucleotides can be used as markers to tag and track DNA sequences. For example, barcoding techniques (i.e. Multiplex Identifiers or Indexing) use short oligonucleotides to distinguish between reads from different DNA samples pooled for high-throughput sequencing. A similar technique called molecule tagging uses the same principles but is applied to individual DNA template molecules. Each template molecule is tagged with a unique oligonucleotide prior to polymerase chain reaction. The resulting amplicon sequences can be traced back to their original templates by their oligonucleotide tag. Consensus building from sequences sharing the same tag enables inference of the original template molecules, thereby reducing the effects of sequencing error and polymerase chain reaction bias. Several independent groups have developed similar protocols for molecule tagging; however, user-friendly software for building consensus sequences from molecule-tagged reads is either not readily available or highly specific to a particular protocol.

Results: MT-Toolbox recognizes oligonucleotide tags in amplicons and infers the correct template sequence. On a set of molecule-tagged test reads, MT-Toolbox generates sequences with an average of 0.00047 errors per base. MT-Toolbox includes a graphical user interface, a command line interface, and options for maximizing speed or accuracy. It can be run in serial on a standard personal computer or in parallel on a Load Sharing Facility-based cluster system. An optional plugin provides features for common 16S metagenome profiling analyses such as chimera filtering, building operational taxonomic units, contaminant removal, and taxonomy assignment.

Conclusions: MT-Toolbox provides an accessible, user-friendly environment for analysis of molecule-tagged reads, thereby reducing technical errors and polymerase chain reaction bias. These improvements reduce noise and allow for greater precision in single amplicon sequencing experiments.

Figures

Figure 1
MT-Toolbox overview. Single-end or paired-end (overlapping or non-overlapping) reads can be input into MT-Toolbox. Overlapping paired-end reads are merged, after which all reads, regardless of their type, are categorized by their molecule tag (MT). Next, a square alignment matrix is created for each MT category, either with a multiple sequence alignment (MSA) algorithm (e.g. ClustalW, MUSCLE) or by read stacking. From these matrices, consensus sequences (ConSeqs) are built, and quality control measures remove low-quality ConSeqs. When using the MT-MT-Toolbox plugin, ConSeqs are subjected to traditional 16S profiling analyses including OTU clustering, chimera filtering, contaminant filtering, and taxonomy assignment.
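The categorization and read-stacking steps described in this caption can be illustrated with a minimal Python sketch. It is not MT-Toolbox's implementation: the function names, the assumption that the tag is a fixed-length prefix of each read, and the simple majority vote per column are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def categorize_by_tag(reads, tag_len):
    """Group reads by molecule tag.

    Assumption for illustration: the tag is the first `tag_len` bases
    of each read and the rest of the read is the template sequence.
    """
    categories = defaultdict(list)
    for read in reads:
        tag, template = read[:tag_len], read[tag_len:]
        categories[tag].append(template)
    return dict(categories)


def stack_consensus(seqs):
    """Stack equal-length reads and take the most common base per column."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("read stacking assumes equal-length reads")
    # zip(*seqs) yields one tuple of bases per alignment column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


# Toy reads: three copies of one tagged template (one with a sequencing
# error in the last base) and a single read from a second template.
reads = ["AAACACGTACGT", "AAACACGTACGA", "AAACACGTACGT", "GGTTTTGACCAA"]
categories = categorize_by_tag(reads, tag_len=4)
conseqs = {tag: stack_consensus(seqs) for tag, seqs in categories.items()}
print(conseqs)
```

In this toy input the single erroneous base in the second read is outvoted by the two correct copies, which is the intuition behind consensus building. A real pipeline would also need to handle indels (hence the MSA option) and ties between bases.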
Figure 2
Read and MT counts per sample. Here reads are defined as the number of raw reads that can be categorized (“categorizable”). In other words, the read matches the expected regular expression pattern for merged reads (Additional file 1: Figure S1.C). MT-Toolbox assigns each categorizable read to an MT category. MT counts are the number of MT categories (i.e. number of originally tagged DNA templates). The sum total of reads in each MT category equals the number of categorizable reads.
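As a rough sketch of how categorizable reads and MT counts relate, the Python below matches reads against a regular expression and tallies reads per tag. The pattern is hypothetical (an 8-base tag followed by a fixed linker `CAGT`); the actual expression used by MT-Toolbox is the one given in Additional file 1 and is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical read layout, for illustration only:
# 8-base molecule tag, fixed linker "CAGT", then the amplicon.
PATTERN = re.compile(r"^([ACGT]{8})CAGT([ACGT]+)$")


def count_reads_and_tags(raw_reads):
    """Return (categorizable reads, MT count, reads per MT category)."""
    depth_per_tag = Counter()
    for read in raw_reads:
        match = PATTERN.match(read)
        if match:  # reads that do not fit the pattern are not categorizable
            depth_per_tag[match.group(1)] += 1
    categorizable = sum(depth_per_tag.values())
    return categorizable, len(depth_per_tag), depth_per_tag


raw_reads = [
    "A" * 8 + "CAGT" + "ACGT",
    "A" * 8 + "CAGT" + "ACGA",
    "C" * 8 + "CAGT" + "TTTT",
    "NNNN",                      # does not match the pattern
    "A" * 8 + "GGGG" + "ACGT",   # wrong linker, does not match
]
n_reads, n_tags, depth_per_tag = count_reads_and_tags(raw_reads)
print(n_reads, n_tags)
```

By construction, the per-category read counts sum to the number of categorizable reads, which is the relationship stated in the caption.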
Figure 3
MT depth histograms for each sample. The number of reads in each MT category influences the accuracy of the resulting ConSeqs. MTs with higher depth are likely to generate more accurate ConSeqs. Diluting samples helps generate more MTs with higher depth at the cost of reducing the amplicon diversity.
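A depth histogram of this kind can be derived directly from per-tag read counts. The following Python sketch assumes a mapping from tag to read count is already available; the function names and the toy counts are illustrative and not taken from MT-Toolbox.

```python
from collections import Counter


def depth_histogram(depth_per_tag):
    """Map each depth value to the number of MT categories with that depth."""
    return Counter(depth_per_tag.values())


def fraction_at_or_below(depth_per_tag, max_depth):
    """Fraction of MT categories whose depth is at most `max_depth`."""
    depths = list(depth_per_tag.values())
    return sum(d <= max_depth for d in depths) / len(depths)


# Toy per-tag read counts (hypothetical data).
depth_per_tag = {"AAAC": 3, "GGTT": 1, "CCGA": 1, "TTAG": 12}
histogram = depth_histogram(depth_per_tag)
print(sorted(histogram.items()))
print(fraction_at_or_below(depth_per_tag, 10))
```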
Figure 4
ConSeqs error profile. MT-Toolbox derived ConSeqs have fewer errors per base (EPB) than raw reads, and errors within ConSeqs are reduced as MT depth increases. MT-Toolbox ConSeqs generated from overlapping paired-end (PE) reads are the most accurate even at low MT depths. ConSeqs built from forward and reverse reads are slightly less accurate than overlapping PE ConSeqs. Raw reads of any type are the least accurate. MT-Toolbox ConSeqs generated from PE reads at depths ≤10 (88% of the data) are more accurate than those generated by LEA-Seq. EPB were calculated by averaging individual EPB of ConSeqs having the same depth. Error bars represent standard error and grow in length as depth increases due to fewer MTs having high depths (Figure 3).
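The per-depth averaging described in this caption can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: errors are counted as substitutions against a known reference of equal length (indels are ignored), and the standard error is the sample standard deviation divided by the square root of the number of ConSeqs at that depth.

```python
import math
from collections import defaultdict


def errors_per_base(conseq, reference):
    """Substitution errors per base; assumes equal-length, pre-aligned sequences."""
    if len(conseq) != len(reference):
        raise ValueError("sequences must have equal length")
    mismatches = sum(a != b for a, b in zip(conseq, reference))
    return mismatches / len(reference)


def epb_by_depth(records):
    """Summarize (depth, epb) pairs as {depth: (mean EPB, standard error)}."""
    grouped = defaultdict(list)
    for depth, epb in records:
        grouped[depth].append(epb)
    summary = {}
    for depth, values in grouped.items():
        n = len(values)
        mean = sum(values) / n
        if n > 1:
            variance = sum((v - mean) ** 2 for v in values) / (n - 1)
            sem = math.sqrt(variance / n)
        else:
            sem = float("nan")  # standard error is undefined for one value
        summary[depth] = (mean, sem)
    return summary


# Toy (depth, EPB) records (hypothetical data).
summary = epb_by_depth([(2, 0.0), (2, 0.5), (3, 0.25)])
print(summary)
```

With fewer ConSeqs at a given depth, the denominator in the standard error shrinks, which is consistent with the caption's note that error bars lengthen at high depths.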

