Cell Syst. 2015 Jul 29;1(1):72-87.
doi: 10.1016/j.cels.2015.01.001. Epub 2015 Mar 3.

Geospatial Resolution of Human and Bacterial Diversity With City-Scale Metagenomics

Free PMC article

Ebrahim Afshinnekoo et al. Cell Syst.

The panoply of microorganisms and other species present in our environment influence human health and disease, especially in cities, but have not been profiled with metagenomics at a city-wide scale. We sequenced DNA from surfaces across the entire New York City (NYC) subway system, the Gowanus Canal, and public parks. Nearly half of the DNA (48%) does not match any known organism; identified organisms spanned 1,688 bacterial, viral, archaeal, and eukaryotic taxa, which were enriched for harmless genera associated with skin (e.g., Acinetobacter). Predicted ancestry of human DNA left on subway surfaces can recapitulate U.S. Census demographic data, and bacterial signatures can reveal a station's history, such as marine-associated bacteria in a hurricane-flooded station. Some evidence of pathogens was found (Bacillus anthracis), but a lack of reported cases in NYC suggests that the pathogens represent a normal, urban microbiome. This baseline metagenomic map of NYC could help long-term disease surveillance, bioterrorism threat mitigation, and health management in the built environment of cities.


Figure 1. The Metagenome of New York City
(A) The five boroughs of NYC include (1) Manhattan (green), (2) Brooklyn (yellow), (3) Queens (orange), (4) Bronx (red), (5) Staten Island (lavender). (B) The collection from the 466 subway stations of NYC across the 24 subway lines involved three main steps: (1) collection with Copan Elution swabs, (2) data entry into the database, and (3) uploading of the data. An image is shown of the current collection database, taken from (C) Workflow for sample DNA extraction, library preparation, sequencing, quality trimming of the FASTQ files, and alignment with MegaBLAST and MetaPhlAn to discern taxa present. (D) Distribution of taxa identified from the entire pooled dataset. (E) Geospatial analysis of the most prevalent genus, Pseudomonas, across the subway system; hotspots reveal high density of Pseudomonas in areas in Manhattan and Brooklyn.
Figure 2. Human Ancestry Predictions from Subway Metagenomic Data Mirror Census Data
Using ancestry-informative alleles from the 1000 Genomes Project and the ancestry prediction tool Ancestry Mapper, we were able to recapitulate the likely demographics of stations, based on the DNA left on the surfaces (A–G). We computed the root-mean-square deviation (RMSD; gray bars) of the predicted ancestry versus the 2010 census data for each station (left). The colors for each ancestry are shown on top, and the stacked barplots show each ancestry's share of alleles, summing to 100%. We used K=4 for admixture. In our datasets, the four ancestral components correspond to African/European/Asian/Amerindian. The Amerindian component has been matched to the Hispanic census designation; this is an approximation, as Hispanics generally also have strong European components. For plots (B)–(G), horizontal black lines represent the percentage match (y axis) of alleles of each known ancestry (x axis); the top four ranking ancestries are highlighted using text labels colored to match census legends in (C), (E), and (G). In Canarsie, Brooklyn (B and C), an increase in African alleles was predicted, which matched the census data (green), and the same trend was observed for a primarily Hispanic area in the Bronx (Mount Eden). In one area of Manhattan near Penn Station, we found a higher incidence of European alleles concomitant with an increase in Asian alleles. Areas of the city (e.g., Chinatown) are annotated directly in the maps.
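The RMSD comparison described in this caption can be illustrated with a minimal sketch. It assumes the predicted ancestry and the census data are both expressed as proportions over the same four groups; the function name `rmsd` and the station values are hypothetical and are not taken from the paper.

```python
import math

def rmsd(predicted, census):
    """Root-mean-square deviation between two ancestry proportion tables."""
    if predicted.keys() != census.keys():
        raise ValueError("both inputs must cover the same ancestry groups")
    squared = [(predicted[k] - census[k]) ** 2 for k in predicted]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical proportions for one station (illustrative only, not paper data).
predicted = {"African": 0.60, "European": 0.20, "Asian": 0.10, "Amerindian": 0.10}
census = {"African": 0.70, "European": 0.10, "Asian": 0.10, "Amerindian": 0.10}

station_rmsd = rmsd(predicted, census)
print(round(station_rmsd, 4))
```

A smaller value means the predicted ancestry mix is closer to the census mix for that station; identical tables give an RMSD of zero.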
Figure 3. Coverage Plots of Virulence Elements from Staphylococcus aureus and Yersinia pestis
We used the Integrative Genomics Viewer to plot the number of reads from the shotgun sequence data that mapped to known virulence elements, including (A) the mecA gene from MRSA and (B) the pMT1 plasmid from Y. pestis. Coverage depth is shown at the top of each inset, with SNPs shown as colored vertical bars across the yMT gene.
Figure 4. Live Strains of Antibiotic-Resistant Bacteria Cultured from City Surfaces
(A) A single colony was plated across four plates for each site (above), then tested against three different antibiotics: kanamycin, chloramphenicol, and ampicillin. We found five plates (circled in pink) that showed growth even in the presence of antibiotics, including one site (far left) with resistance to two antibiotics, with growth in multiple rows. (B) Number of taxa found for the plain swab (red) versus the bacteria cultured and then sequenced from LB (blue) and TSA media (yellow). (C) The coverage of the tetracycline-resistance genes was calculated as the ratio of the Tet+ samples (treated with tetracycline) versus the original sample (non-treated, or Tet−), and the log2 ratio was plotted as a heat map (scale on left). (D) The distribution of coverage ratios for each tet gene for each of the cultured samples showed greater coverage for the majority of tet genes in the Tet+ samples relative to the untreated Tet− samples, and a convergence on the tetX gene for samples on both media types.
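The log2 coverage ratio in panel (C) can be sketched as follows. This is an illustration under stated assumptions: the coverage values are hypothetical, and the pseudocount added to both sides is an assumption introduced here to avoid dividing by zero, not something the caption describes.

```python
import math

def log2_coverage_ratio(treated, untreated, pseudocount=1.0):
    """log2 of treated over untreated coverage.

    The pseudocount is an assumption of this sketch; it keeps the ratio
    defined when a gene has zero coverage in one sample.
    """
    return math.log2((treated + pseudocount) / (untreated + pseudocount))

# Hypothetical mean coverage per gene as (treated, untreated) pairs.
coverage = {"tetX": (31.0, 3.0), "tetA": (7.0, 7.0)}

ratios = {gene: log2_coverage_ratio(t, u) for gene, (t, u) in coverage.items()}
print(ratios)
```

Positive values indicate higher coverage in the tetracycline-treated sample, zero indicates no change, and negative values indicate lower coverage after treatment.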
Figure 5. Taxa Diversity and Association with Human Body Areas
Detected bacteria were annotated relative to the most commonly associated body part from the Human Microbiome Project (HMP) dataset. (A) Of the 67 PathoMap species that matched the HMP dataset, the proportions were greatest for the GI tract (blue), skin (green), and urogenital tract (white). The entire circle represents 100% of the 67 species, and the size of each colored segment represents the proportion of each type of bacteria. (B) To account for the database proportions from the HMP, we calculated the log2 of the observed versus expected numbers of species found for each category, which indicated that skin-associated bacteria were the most predominant type on the subway system. (C) Boxplot of the number of species found per borough. Middle line of each section shows the median, and the top and bottom of each box show the 75th and 25th percentiles, respectively. Notches show the significant difference between groups (95% confidence interval). (D and E) Heat maps of NYC showing the density for Enterococcus faecium (D) and Staphylococcus aureus (E). Small red dots indicate the presence of a fully re-sequenced mecA gene. (F) Analysis of a subway station (picture on top shows the station) flooded during Hurricane Sandy. The Venn diagram shows 10 species found at that station that did not appear at any other station or area of NYC, whereas 52 species overlapped with the set of 627 species present in the subway system.
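The observed-versus-expected calculation in panel (B) can be sketched as below. It assumes the expected count for a body-site category is the total number of matched species multiplied by that category's share of the reference database. The counts and fractions are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def log2_enrichment(observed_counts, database_fractions):
    """log2(observed / expected) per category.

    Expected counts are the total observed species scaled by each
    category's fraction of the reference database (an assumption of
    this sketch). Categories with zero observed species are not handled.
    """
    total = sum(observed_counts.values())
    enrichment = {}
    for category, observed in observed_counts.items():
        expected = total * database_fractions[category]
        enrichment[category] = math.log2(observed / expected)
    return enrichment

# Hypothetical species counts and database shares (not paper data).
observed = {"skin": 16, "GI tract": 32, "urogenital": 16}
fractions = {"skin": 0.125, "GI tract": 0.5, "urogenital": 0.375}

scores = log2_enrichment(observed, fractions)
print(scores)
```

With these made-up numbers the GI tract has the most species in absolute terms, yet skin has the highest enrichment score, which is the kind of correction for database composition that the caption describes.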
Figure 6. Hourly Dynamics of a Train Station Microbiome
Analysis of samples collected at Penn Station on one day, compared at each hour. (A) The proportional distribution of taxa (left) linked to the proportion of their presence at a specific time (right). The thickness of each line is in linear proportion to the number of detected taxa. (B) Proportion of each bacterial taxon (by genus) at each time point. Each taxon is colored and labeled in-line according to the same schema as in (A). The maximum number of species (n = 64) was found at 13:00, and the minimum (n = 51) at 11:00; the width of the plot at each time point is proportional to the number of species.


