Motivation: Inferring the taxonomic profile of a microbial community from a large collection of anonymous DNA sequencing reads is a challenging task in metagenomics. Because existing methods for taxonomic profiling of metagenomes are all based on the assignment of fragmentary sequences to phylogenetic categories, the accuracy of results largely depends on fragment length. This dependence complicates comparative analysis of data originating from different sequencing platforms or resulting from different preprocessing pipelines.
Results: We here introduce a new method for taxonomic profiling based on mixture modeling of the overall oligonucleotide distribution of a sample. Our results indicate that the mixture-based profiles compare well with taxonomic profiles obtained with other methods. However, in contrast to the existing methods, our approach shows a nearly constant profiling accuracy across all kinds of read lengths and it operates at an unrivaled speed.
Availability: A platform-independent implementation of the mixture modeling approach is available in terms of a MATLAB/Octave toolbox at http://gobics.de/peter/taxy. In addition, a prototypical implementation within an easy-to-use interactive tool for Windows can be downloaded.