Nat Biotechnol. 2021 Jun;39(6):727-736.
doi: 10.1038/s41587-020-00797-0. Epub 2021 Jan 18.

inStrain profiles population microdiversity from metagenomic data and sensitively detects shared microbial strains

Matthew R Olm et al. Nat Biotechnol. 2021 Jun.

Abstract

Coexisting microbial cells of the same species often exhibit genetic variation that can affect phenotypes ranging from nutrient preference to pathogenicity. Here we present inStrain, a program that uses metagenomic paired reads to profile intra-population genetic diversity (microdiversity) across whole genomes and compares microbial populations in a microdiversity-aware manner, greatly increasing the accuracy of genomic comparisons when benchmarked against existing methods. We use inStrain to profile >1,000 fecal metagenomes from newborn premature infants and find that siblings share significantly more strains than unrelated infants, although identical twins share no more strains than fraternal siblings. Infants born by cesarean section harbor Klebsiella with significantly higher nucleotide diversity than infants delivered vaginally, potentially reflecting acquisition from hospital rather than maternal microbiomes. Genomic loci that show diversity in individual infants include variants found between other infants, possibly reflecting inoculation from diverse hospital-associated sources. inStrain can be applied to any metagenomic dataset for microdiversity analysis and rigorous strain comparison.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. InStrain measures population-level diversity from metagenomic data.
a) Examples of metagenomic reads (grey boxes) mapping to genomic regions with low and high nucleotide diversity. Mismatches to the reference genome are represented by small colored marks on the reads, and the reference genome is represented below the reads. b-f) Examples of figures automatically generated by inStrain. b) SNV density, coverage, and nucleotide diversity across a bacteriophage genome. Spikes in nucleotide diversity and SNV density do not correspond with increased coverage, indicating that the signals are not due to read mis-mapping. Positions with nucleotide diversity but no SNV density are those where diversity exists but is not high enough to call an SNV. c) Metrics of SNV linkage vs. distance between SNVs; linkage decay (as shown here) is a common signal of recombination. d) Distribution of the major allele frequencies of bi-allelic SNVs (the Site Frequency Spectrum). Alleles with major frequencies below 50% are the result of multiallelic sites. The lack of distinct puncta suggests that more than a few distinct strains are present. e) Breadth of coverage (blue line), coverage depth (red line), and expected breadth of coverage given the depth of coverage (dotted blue line) versus the minimum ANI of mapped reads. Coverage depth continues to increase while breadth plateaus, suggesting that not all regions of the reference genome are present in the reads being mapped. f) Distribution of read pair ANI levels when mapped to a reference genome; this plot suggests that the reference genome differs from the mapped reads by >1%.
Figure 2. InStrain accurately discriminates between closely related strains.
a) Table demonstrating the circumstances under which conANI and popANI substitutions will be called. ConANI substitutions are called whenever the consensus base differs, and popANI substitutions are only called when there is no allelic overlap between samples. b) Synthetic mutations were introduced to a reference genome of E. coli obtained from RefSeq to generate variant genomes with specific ANI differences from the reference genome, and four tools were used to compare the variant genomes to the reference genome. dRep, inStrain, and MIDAS consistently reported accurate ANI values, while StrainPhlAn was inaccurate by a median of 0.03% ANI. c) A mock community of bacterial cells was sequenced in biological triplicate and compared using four tools. InStrain performed best in correctly identifying that the genomes were identical in all three samples. d) The fecal microbiomes of three sets of twins were compared using each of the four tools, and the number of bacterial genomes with ANI values above a range of thresholds is plotted for pairs of twins (which are expected to share more strains) and pairs of unrelated infants. InStrain remained sensitive at higher ANI thresholds than the other three tools.
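
The distinction in panel (a) can be illustrated with a minimal Python sketch. It is not inStrain's implementation: the function names, the input layout (one dict of base counts per sample at each compared position), and the rule that any base with a nonzero count is "present" are all assumptions made for illustration. A real tool would likely apply coverage and allele-frequency filters before deciding which alleles are present.

```python
def classify_site(alleles_a, alleles_b):
    """Return (con_sub, pop_sub) for one position compared between two samples.

    alleles_a / alleles_b: hypothetical dicts mapping base -> read count.
    Assumption: any base with a nonzero count is treated as present.
    """
    present_a = {base for base, n in alleles_a.items() if n > 0}
    present_b = {base for base, n in alleles_b.items() if n > 0}
    consensus_a = max(present_a, key=lambda base: alleles_a[base])
    consensus_b = max(present_b, key=lambda base: alleles_b[base])
    # conANI substitution: the consensus (most common) bases differ.
    con_sub = consensus_a != consensus_b
    # popANI substitution: the two samples share no allele at all.
    pop_sub = not (present_a & present_b)
    return con_sub, pop_sub


def con_and_pop_ani(sites):
    """Compute (conANI, popANI) over a list of (alleles_a, alleles_b) pairs."""
    con_subs = pop_subs = 0
    for alleles_a, alleles_b in sites:
        con_sub, pop_sub = classify_site(alleles_a, alleles_b)
        con_subs += con_sub
        pop_subs += pop_sub
    compared = len(sites)
    return 1 - con_subs / compared, 1 - pop_subs / compared


# Three toy positions: identical, consensus flip with shared alleles, fixed difference.
sites = [
    ({"A": 10}, {"A": 8}),
    ({"A": 6, "C": 4}, {"A": 3, "C": 7}),
    ({"G": 10}, {"T": 9}),
]
con_ani, pop_ani = con_and_pop_ani(sites)
```

In the toy data, the second position counts against conANI only, because both samples carry A and C and merely differ in which is more common; the third position counts against both metrics.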
Figure 3. Siblings share significantly more microbial strains at birth than unrelated infants.
a,b) A link is drawn for each strain shared between pairs of infants (represented by rectangles along the circumferences). Links between sibling pairs are drawn in red; links between unrelated infants are drawn in grey. Diagrams display all strains (a) and only strains found in exactly two infants (b). c) Enumeration of links drawn in (a) and (b). d) Twin pairs share significantly more strains of all domains than unrelated pairs (**** = p < 1e-15; two-sided Wilcoxon rank-sum test). e) Identical twin pairs do not share significantly more strains than fraternal twin pairs. f) Infants born more closely in gestational age share significantly more bacterial strains. g) Most strains colonize only a single infant, but some strains colonize many more. For each minimum number of infants colonized, a box is drawn for each strain that colonizes at least that many infants. Boxes are colored based on the species identity of each strain.
Figure 4. Analysis of the microdiversity of premature infant colonists.
a) Overall and in two of the six individual study cohorts, infants born via C-section harbored microbes with higher nucleotide diversity than infants delivered vaginally (* = p < 0.05; two-sided Wilcoxon rank-sum test). b) Organisms of the genus Klebsiella have significantly higher nucleotide diversity in infants born via C-section than in infants delivered vaginally (* = p < 0.05; two-sided Wilcoxon rank-sum test with Benjamini-Hochberg p-value correction for testing each microbial species and genus present in both vaginal and C-section born infants).
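
For readers unfamiliar with the correction named in panel (b), the following is a minimal Python sketch of the standard Benjamini-Hochberg adjustment. It is not taken from the paper's analysis code; the function name and the example p-values are hypothetical. The idea is that when one test is run per species or genus, each raw p-value is scaled by the number of tests divided by its rank, and the scaled values are then forced to be monotone.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, one per taxon tested.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

A taxon would then be reported as significant when its adjusted value falls below the chosen threshold (0.05 in the caption).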
Figure 6. Tracking specific genetic differences within and between populations of an E. faecalis bacteriophage.
a) Frequencies of gene deletions, substitutions, and SNVs for all genes across an E. faecalis bacteriophage genome identified in 44 infants. Genes are colored based on their annotations. b) Frequency of observed substitutions (fixed differences between pairs of infants) in each gene versus frequency of SNVs (positions with multiple alleles in an individual infant at positions that are never observed as fixed differences). c) Ratios of non-synonymous to synonymous substitutions (dN/dS) and ratios of non-synonymous to synonymous population-level variants (pN/pS) for each gene. d) Classification of variant sites observed across infants only as substitutions, only as SNVs, and as both.
