Rapid comparison and correlation analysis among massive number of microbial community samples based on MDV data model

Sci Rep. 2014 Sep 17;4:6393. doi: 10.1038/srep06393.

Abstract

Research on microbial communities could affect a wide range of applications in "bio"-related disciplines. Large-scale analysis has become a clear trend in microbial community studies, so it is increasingly important to mine data efficiently and in depth for biological principles across large numbers of samples. However, because microbial communities come from different sources and have different structures, comparison and data mining across large numbers of samples are difficult. In this work, we propose a data model for large-scale comparison of microbial community samples, the "Multi-Dimensional View" data model (the MDV model), which should include at least three aspects: the sample profile (S), the taxa profile (T) and the meta-data profile (V). We also propose a method for rapid data analysis based on the MDV model and apply it to case studies with samples from various environmental conditions. The results show that, although sampling environments usually define the key variables, the analysis can detect biomarkers and even subtle variables when based on a large number of samples, which might be used to discover novel principles that drive the development of communities. These results validate the efficiency and effectiveness of the data analysis method based on the MDV model.
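
To make the three aspects of the MDV model concrete, the following is a minimal Python sketch of how S, T and V might be held together in one structure. It is not the authors' implementation: the class name `MDV`, its field names, the `group_by` helper and the use of Bray-Curtis similarity as the comparison metric are all assumptions made for illustration, and the abundance values are invented toy data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MDV:
    """Hypothetical container for the three MDV aspects named in the abstract."""
    samples: List[str]                    # S: sample identifiers
    taxa: List[str]                       # T: taxon names
    abundance: List[List[float]]          # rows follow `samples`, columns follow `taxa`
    metadata: Dict[str, Dict[str, str]]   # V: per-sample meta-data variables

    def similarity(self, a: str, b: str) -> float:
        """Bray-Curtis similarity between two samples (an assumed metric)."""
        x = self.abundance[self.samples.index(a)]
        y = self.abundance[self.samples.index(b)]
        total = sum(x) + sum(y)
        if total == 0:
            return 0.0
        shared = sum(min(p, q) for p, q in zip(x, y))
        return 2.0 * shared / total

    def group_by(self, key: str) -> Dict[str, List[str]]:
        """Group sample identifiers by the value of one meta-data variable."""
        groups: Dict[str, List[str]] = {}
        for s in self.samples:
            groups.setdefault(self.metadata[s][key], []).append(s)
        return groups


# Toy data, invented for illustration only.
mdv = MDV(
    samples=["s1", "s2", "s3"],
    taxa=["Bacteroides", "Prevotella"],
    abundance=[[0.8, 0.2], [0.6, 0.4], [0.0, 1.0]],
    metadata={
        "s1": {"habitat": "gut"},
        "s2": {"habitat": "gut"},
        "s3": {"habitat": "soil"},
    },
)

print(mdv.group_by("habitat"))
print(round(mdv.similarity("s1", "s2"), 3), round(mdv.similarity("s1", "s3"), 3))
```

In this sketch, grouping samples by a meta-data variable and then comparing within-group and between-group similarity is one simple way to check whether that variable separates the communities.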

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bacteria / classification*
  • Bacteria / genetics*
  • Computational Biology / methods*
  • Data Mining / methods*
  • Female
  • Humans
  • Male
  • Models, Theoretical*
  • Phylogeny
  • Programming Languages
  • Software
  • Soil Microbiology
  • Water Microbiology