Neonatal sepsis (NS) is responsible for over 1 million yearly deaths worldwide. In the developing world, NS is often treated without an identified microbial pathogen. Amplicon sequencing of the bacterial 16S rRNA gene can be used to identify organisms that are difficult to detect by routine microbiological methods. However, contaminating bacteria are ubiquitous in both hospital settings and research reagents and must be accounted for to make effective use of these data. In this study, we sequenced the bacterial 16S rRNA gene obtained from blood and cerebrospinal fluid (CSF) of 80 neonates presenting with NS to the Mbarara Regional Hospital in Uganda. Assuming that patterns of background contamination would be independent of pathogenic microorganism DNA, we applied a novel quantitative approach using principal orthogonal decomposition to separate background contamination from potential pathogens in sequencing data. We designed our quantitative approach contrasting blood, CSF, and control specimens and employed a variety of statistical random matrix bootstrap hypotheses to estimate statistical significance. These analyses demonstrate that Leptospira appears present in some infants presenting within 48 h of birth, indicative of infection in utero, and up to 28 days of age, suggesting environmental exposure. This organism cannot be cultured in routine bacteriological settings and is enzootic in the cattle that often live in close proximity to the rural peoples of western Uganda. Our findings demonstrate that statistical approaches to remove background organisms common in 16S sequence data can reveal putative pathogens in small volume biological samples from newborns. This computational analysis thus reveals an important medical finding that has the potential to alter therapy and prevention efforts in a critically ill population.
Keywords: 16S rRNA; Leptospira; bacteria; neonatal sepsis; principal orthogonal decomposition; singular value decomposition.