IMSA: integrated metagenomic sequence analysis for identification of exogenous reads in a host genomic background

PLoS One. 2013 May 23;8(5):e64546. doi: 10.1371/journal.pone.0064546. Print 2013.

Abstract

Metagenomics, the study of microbial genomes within diverse environments, is a rapidly developing field. The identification of microbial sequences within a host organism enables the study of human intestinal, respiratory, and skin microbiota, and has allowed the identification of novel viruses in diseases such as Merkel cell carcinoma. There are few publicly available tools for metagenomic high-throughput sequence analysis. We present Integrated Metagenomic Sequence Analysis (IMSA), a flexible, fast, and robust computational analysis pipeline that is available for public use. IMSA takes input sequence from high-throughput datasets and uses a user-defined host database to filter out host sequence. IMSA then aligns the filtered reads to a user-defined universal database to characterize exogenous reads within the host background. IMSA assigns a score to each node of the taxonomy based on read frequency, and can output this as a taxonomy report suitable for cluster analysis or as a taxonomy map (TaxMap). IMSA also outputs the specific sequence reads assigned to a taxon of interest for downstream analysis. We demonstrate the use of IMSA to detect pathogens and normal flora within sequence data from a primary human cervical cancer carrying HPV16, a primary human cutaneous squamous cell carcinoma carrying HPV16, the CaSki cell line carrying HPV16, and the HeLa cell line carrying HPV18.
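
The filter, align, and score flow described in the abstract can be illustrated with a minimal Python sketch. This is not IMSA's code or its actual scoring rule: the toy taxonomy, the read identifiers, the alignment results, and the function names (`filter_host`, `score_taxonomy`, `reads_for_taxon`) are all hypothetical, and the scoring assumed here simply adds one count per read to the assigned taxon and to each of its ancestors.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "root": None,
    "Viruses": "root",
    "Papillomaviridae": "Viruses",
    "HPV16": "Papillomaviridae",
    "HPV18": "Papillomaviridae",
    "Bacteria": "root",
    "Lactobacillus": "Bacteria",
}


def filter_host(read_ids, host_hits):
    """Keep only reads that did not align to the host database."""
    return [r for r in read_ids if r not in host_hits]


def score_taxonomy(read_to_taxon, parent=PARENT):
    """Assumed scoring: one count per read for its taxon and every ancestor."""
    scores = Counter()
    for taxon in read_to_taxon.values():
        node = taxon
        while node is not None:
            scores[node] += 1
            node = parent[node]
    return scores


def reads_for_taxon(read_to_taxon, taxon, parent=PARENT):
    """Return reads assigned to `taxon` or to any of its descendants."""
    selected = []
    for read_id, assigned in read_to_taxon.items():
        node = assigned
        while node is not None:
            if node == taxon:
                selected.append(read_id)
                break
            node = parent[node]
    return sorted(selected)


# Made-up inputs standing in for alignment output.
reads = ["r1", "r2", "r3", "r4", "r5"]
host_hits = {"r2"}  # reads that matched the host database
universal_hits = {"r1": "HPV16", "r3": "HPV16", "r4": "HPV18", "r5": "Lactobacillus"}

kept = filter_host(reads, host_hits)
assignments = {r: universal_hits[r] for r in kept if r in universal_hits}
scores = score_taxonomy(assignments)
viral_reads = reads_for_taxon(assignments, "Viruses")
```

Propagating counts to ancestors gives every node of the taxonomy a score, so the result can be read at any rank, and `reads_for_taxon` mirrors the idea of pulling out the reads behind a taxon of interest for downstream analysis.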

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Carcinoma, Squamous Cell / virology
  • Female
  • Genes, Bacterial*
  • Genes, Viral*
  • Genome, Human
  • HeLa Cells
  • High-Throughput Nucleotide Sequencing
  • Human papillomavirus 16 / genetics
  • Human papillomavirus 18 / genetics
  • Humans
  • Metagenome*
  • Microbiota / genetics*
  • Molecular Sequence Annotation
  • Papillomavirus Infections / virology
  • Sequence Alignment
  • Sequence Analysis, DNA / methods*
  • Uterine Cervical Neoplasms / virology