Molecular mimicry of host proteins is a common strategy adopted by bacterial pathogens to interfere with and exploit host processes. Despite the availability of pathogen genomes, few studies have attempted to predict virulence-associated mimicry relationships directly from genomic sequences. Here, we analyzed the proteomes of 62 pathogenic and 66 non-pathogenic bacterial species, and screened for the top pathogen-specific or pathogen-enriched sequence similarities to human proteins. The screen identified approximately 100 potential mimicry relationships including well-characterized examples among the top-scoring hits (e.g., RalF, internalin, yopH, and others), with about 1/3 of predicted relationships supported by existing literature. Examination of homology to virulence factors, statistically enriched functions, and comparison with literature indicated that the detected mimics target key host structures (e.g., extracellular matrix, ECM) and pathways (e.g., cell adhesion, lipid metabolism, and immune signaling). The top-scoring and most widespread mimicry pattern detected among pathogens consisted of elevated sequence similarities to ECM proteins including collagens and leucine-rich repeat proteins. Unexpectedly, analysis of the pathogen counterparts of these proteins revealed that they have evolved independently in different species of bacterial pathogens from separate repeat amplifications. Thus, our analysis provides evidence for two classes of mimics: complex proteins such as enzymes that have been acquired by eukaryote-to-pathogen horizontal transfer, and simpler repeat proteins that have independently evolved to mimic the host ECM. Ultimately, computational detection of pathogen-specific and pathogen-enriched similarities to host proteins provides insights into potentially novel mimicry-mediated virulence mechanisms of pathogenic bacteria.
Keywords: bacteria; collagen; extracellular matrix; leucine-rich repeats; mimicry; pathogenomics; pathogens; proteins; proteomes; virulence factors.