Cell Host Microbe, 26 (5), 666-679.e7

The Prevotella Copri Complex Comprises Four Distinct Clades Underrepresented in Westernized Populations



Adrian Tett et al. Cell Host Microbe.


Prevotella copri is a common human gut microbe that has been both positively and negatively associated with host health. In a cross-continent meta-analysis exploiting >6,500 metagenomes, we obtained >1,000 genomes and explored the genetic and population structure of P. copri. P. copri encompasses four distinct clades (>10% inter-clade genetic divergence) that we propose constitute the P. copri complex, and all clades were confirmed by isolate sequencing. These clades are nearly ubiquitous and co-present in non-Westernized populations. Genomic analysis showed substantial functional diversity in the complex with notable differences in carbohydrate metabolism, suggesting that multi-generational dietary modifications may be driving reduced prevalence in Westernized populations. Analysis of ancient metagenomes highlighted patterns of P. copri presence consistent with modern non-Westernized populations and a clade delineation time pre-dating human migratory waves out of Africa. These findings reveal that P. copri exhibits a high diversity that is underrepresented in Western-lifestyle populations.

Keywords: Iceman; Prevotella copri; Westernization; ancient DNA; bacterial pangenome; bacterial phylogenetics; comparative microbial genomics; gut microbes; human microbiome; metagenomic assembly; metagenomics.

Conflict of interest statement

The authors declare no competing interests.


Figure 1
The Four Distinct Clades of the P. copri Complex (A) Whole-genome phylogenetic tree of a representative subset of the four P. copri clades comprising the P. copri complex in relation to other sequenced members of the genera Prevotella, Alloprevotella, and Paraprevotella. Magenta circles indicate P. copri isolate sequences (built using 400 universal bacterial marker gene sequences, see STAR Methods). The phylogeny containing all P. copri genomes is available as Figure S1B and (see Data and Code Availability; Method Details). (B) Genetic distances within a clade (intra-clade), between clades (inter-clade), and between clades and other species (denoted as OS) of Prevotella, Alloprevotella, and Paraprevotella (inter-species), shown as pairwise average nucleotide identity distances (ANI distance). The dotted line denotes 5% ANI distance. (C) Pairwise SNV distances based on core gene alignment within and between clades (see STAR Methods). (D) Jaccard distance based on pairwise gene content (see STAR Methods) between and within the P. copri clades.
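
Panel (D) compares genomes by gene content using the Jaccard distance. As a minimal sketch of that measure only (not the authors' pipeline, which is described in the STAR Methods), the distance between two genomes can be computed from their sets of gene families. The function name and the gene-family identifiers below are hypothetical and chosen purely for illustration.

```python
def jaccard_distance(genes_a, genes_b):
    """Jaccard distance between two gene-content sets: 1 - |A & B| / |A | B|."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        # Assumed convention: two empty gene sets are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical gene-family identifiers, for illustration only.
genome_1 = {"fam1", "fam2", "fam3", "fam4"}
genome_2 = {"fam3", "fam4", "fam5"}

# Two shared families out of five in total.
distance = jaccard_distance(genome_1, genome_2)
```

A distance of 0 means identical gene content and a distance of 1 means no shared genes, so within-clade pairs would be expected to sit closer to 0 than between-clade pairs.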
Figure 2
Phylogenetic Representation of All 1,023 P. copri Genomes Separated for Each Clade of the P. copri Complex Outer ring is colored by continent of origin and inner ring is colored by country. Radial gray bars indicate recently sequenced isolate genomes, and publicly available reference genomes are denoted by black stars.
Figure 3
Prevalence of the P. copri Complex and Its Association with Non-Westernized Populations (A) P. copri prevalence in non-Westernized and Westernized datasets. “All” refers to the prevalence of any of the four clades being present. (B) Percentage of individuals harboring multiple P. copri clades. (C) P. copri complex pangenome sizes for non-Westernized individuals by dataset compared to Westernized individuals.
Figure 4
Functional Diversity of the P. copri Complex (A) Presence and absence of eggNOG functions significantly different between the four P. copri clades (yellow, present; black, absent) (see STAR Methods). (B) Multidimensional scaling (MDS) ordination based on CAZy families present in each genome showing distinct inter- and intra-clustering in the P. copri complex. (C) All CAZy families significantly enriched (left) or depleted (right) in at least one clade relative to each of the other three (see STAR Methods). Prevalence is defined as the percentage of genomes in that clade for which at least one gene belongs to the given CAZy family. For full list of CAZy prevalence in each clade, see Table S3.
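
The prevalence definition in panel (C) can be written out directly: for each clade, count the genomes carrying at least one gene in a given CAZy family and divide by the number of genomes in that clade. The sketch below assumes two hypothetical inputs, a genome-to-clade mapping and a genome-to-CAZy-annotation mapping; the function name, clade labels, and genome identifiers are illustrative and do not come from the paper.

```python
from collections import defaultdict


def cazy_prevalence(genome_clade, genome_families):
    """Return {clade: {family: % of genomes in the clade with >= 1 gene in the family}}."""
    clade_sizes = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for genome, clade in genome_clade.items():
        clade_sizes[clade] += 1
        # set() ensures a genome counts once per family, however many genes it has.
        for family in set(genome_families.get(genome, ())):
            counts[clade][family] += 1
    return {
        clade: {
            family: 100.0 * n / clade_sizes[clade]
            for family, n in counts[clade].items()
        }
        for clade in clade_sizes
    }


# Hypothetical toy data, for illustration only.
genome_clade = {"g1": "clade_1", "g2": "clade_1", "g3": "clade_2"}
genome_families = {"g1": ["GH13", "GH13", "GH43"], "g2": ["GH13"], "g3": []}

prevalence = cazy_prevalence(genome_clade, genome_families)
```

Families absent from every genome in a clade are simply missing from that clade's dictionary, which corresponds to a prevalence of 0%.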
Figure 5
Ancient Microbiomes and the Evolutionary History of the P. copri Complex (A) Ancient Mexican coprolite samples and intestinal and lung tissue sampled from the Iceman, a natural ice mummy. (B) Percentage of positive P. copri clade-specific markers identified in each ancient metagenomic sample. (C) Time-resolved phylogenetic tree of the P. copri complex; magenta star indicates the ancient coprolite sample, 2180 (see STAR Methods).



