MetaNN: accurate classification of host phenotypes from metagenomic data using neural networks

BMC Bioinformatics. 2019 Jun 20;20(Suppl 12):314. doi: 10.1186/s12859-019-2833-2.

Abstract

Background: Microbiome profiles from the human body and environmental niches have become publicly available due to recent advances in high-throughput sequencing technologies. Recent studies have already identified different microbiome profiles in healthy and sick individuals for a variety of diseases, which suggests that the microbiome profile can be used as a diagnostic tool for identifying an individual's disease state. However, the high-dimensional nature of metagenomic data poses a significant challenge to existing machine learning models. Consequently, to enable personalized treatments, an efficient framework that can accurately and robustly differentiate between healthy and sick microbiome profiles is needed.

Results: In this paper, we propose MetaNN (i.e., classification of host phenotypes from Metagenomic data using Neural Networks), a neural network framework that uses a new data augmentation technique to mitigate the effects of over-fitting.
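
The abstract does not describe the augmentation technique itself. As a rough illustration only, and not necessarily the authors' method, the sketch below shows one common way to augment taxon count data: fit a per-class, per-taxon count distribution by the method of moments and draw synthetic profiles from it. The function name `augment_counts`, its parameters, and the choice of a negative binomial (with a Poisson fallback when a taxon is not overdispersed) are all assumptions made for this example.

```python
import numpy as np

def augment_counts(X, y, n_new_per_class, seed=0):
    """Append synthetic count profiles drawn per class (hypothetical helper).

    X: (samples, taxa) array of non-negative integer counts.
    y: (samples,) array of class labels.
    Returns the original samples followed by the synthetic ones.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    new_X, new_y = [], []
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)
        # Sample variance; with a single sample, fall back to var == mean.
        var = Xc.var(axis=0, ddof=1) if len(Xc) > 1 else mean.copy()
        over = var > mean  # overdispersed taxa (assumption: use negative binomial)
        # Default draw: Poisson with the observed per-taxon mean.
        synth = rng.poisson(mean, size=(n_new_per_class, X.shape[1]))
        if over.any():
            # Method-of-moments negative binomial parameters.
            p = mean[over] / var[over]
            n = mean[over] ** 2 / (var[over] - mean[over])
            synth[:, over] = rng.negative_binomial(
                n, p, size=(n_new_per_class, int(over.sum()))
            )
        new_X.append(synth)
        new_y.append(np.full(n_new_per_class, label))
    return np.vstack([X] + new_X), np.concatenate([y] + new_y)

# Toy data: 4 samples, 3 taxa, 2 classes (made-up numbers for illustration).
X = np.array([[0, 5, 10], [2, 7, 30], [1, 0, 3], [0, 1, 9]])
y = np.array([0, 0, 1, 1])
X_aug, y_aug = augment_counts(X, y, n_new_per_class=5)
```

In a workflow like this, only the training split would be augmented, so that held-out evaluation still reflects real samples.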

Conclusions: We show that MetaNN outperforms existing state-of-the-art models in terms of classification accuracy for both synthetic and real metagenomic data. These results pave the way towards developing personalized treatments for microbiome-related diseases.

Keywords: Host phenotypes; Machine learning; Metagenomics; Neural networks.

MeSH terms

  • Algorithms*
  • Area Under Curve
  • Databases, Genetic
  • Humans
  • Machine Learning
  • Metagenomics / methods*
  • Microbiota / genetics
  • Models, Theoretical
  • Neural Networks, Computer*
  • Phenotype
  • ROC Curve
  • Support Vector Machine