Denoising sparse microbial signals from single-cell sequencing of mammalian host tissues

Nat Comput Sci. 2023 Sep;3(9):741-747. doi: 10.1038/s43588-023-00507-1. Epub 2023 Sep 18.

Abstract

Existing genomic sequencing data can be used to study host-microbiome ecosystems; however, distinguishing signals that originate from truly present microbes from those of contaminating species and artifacts is a substantial and often prohibitive challenge. Here we show that emerging sequencing technologies can capture reads from truly present microbes. We developed SAHMI, a computational resource to identify truly present microbial nucleic acids and filter contaminants and spurious false-positive taxonomic assignments from standard transcriptomic sequencing of mammalian tissues. In benchmark studies, SAHMI correctly identifies known microbial infections present in diverse tissues, and we validate SAHMI's enrichment for correctly classified, truly present species using multiple orthogonal computational experiments. The application of SAHMI to single-cell and spatial genomic data thus enables co-detection of somatic cells and microorganisms and joint analysis of host-microbiome ecosystems.