Initial Characterization of the Human Central Proteome

Thomas R Burkard et al. BMC Syst Biol. 5:17.


Background: On the basis of large proteomics datasets measured from seven human cell lines, we take their intersection as an approximation of the human central proteome, which is the set of proteins ubiquitously expressed in all human cells. The composition and properties of the central proteome are investigated through bioinformatics analyses.
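Taking the intersection of the per-cell-line identifications amounts to a plain set intersection. The minimal sketch below assumes each cell line's results have already been reduced to a set of protein identifiers. The function name `central_proteome`, the cell-line names and the accessions are hypothetical placeholders, not data or code from the study.

```python
from functools import reduce


def central_proteome(identified: dict[str, set[str]]) -> set[str]:
    """Return the proteins identified in every cell line (a set intersection)."""
    if not identified:
        return set()
    # Copy the first set so the caller's data is never returned or mutated.
    sets = list(identified.values())
    return reduce(set.intersection, sets[1:], set(sets[0]))


# Hypothetical identifications for three cell lines (placeholder accessions).
identified = {
    "line_A": {"P1", "P2", "P3", "P4"},
    "line_B": {"P2", "P3", "P4", "P5"},
    "line_C": {"P3", "P4", "P6"},
}
print(sorted(central_proteome(identified)))  # ['P3', 'P4']
```

The same call applies unchanged to seven cell lines. Identification thresholds and any abundance criteria are outside the scope of this sketch.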

Results: Using state-of-the-art mass spectrometry and protein identification bioinformatics, we experimentally identify a central proteome comprising 1,124 proteins that are ubiquitously and abundantly expressed in human cells. The main represented functions are proteostasis, primary metabolism and proliferation. We further characterize the central proteome in terms of gene structures, conservation, interaction networks, pathways, drug targets and coordination of biological processes. Among other new findings, we show that the central proteome is encoded by exon-rich genes, indicating increased regulatory flexibility through alternative splicing to adapt to multiple environments, and that the protein interaction network linking the central proteome is highly efficient at synchronizing translation with other biological processes. Surprisingly, at least 10% of the central proteome has no or very limited functional annotation.

Conclusions: Our data and analysis provide a new and deeper description of the human central proteome than previous results, thereby extending and complementing our knowledge of commonly expressed human proteins. All data are publicly available to help other researchers who, for instance, need to compare or link focused datasets to a common background.


Figure 1
Network and pathway statistics. (A) Node degree (number of edges). Note the strong shift of C.Prot towards higher values. We also observe no shift for the tissue-specific genes (Spe.Trans) and a gradual shift from low-abundance to high-abundance C.Prot entities. (B) Eigenvector centrality values display similar shifts, although here Spe.Trans even reverses the trend and the differences between low- and high-abundance C.Prot are more modest. (C) Relative positions in pathways (0 = beginning, 1 = end). C.Prot shows no real bias, but its abundant proteins strongly prefer central positions. Spe.Trans and low-abundance C.Prot are spread more evenly over all possible positions. (D) The same for drug targets. Note the strong shift of C.Prot drug targets towards initial positions, which significantly amplifies the preference that drug targets already show for such positions.
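Panels (A) and (B) use two standard graph measures. The sketch below computes both for an undirected interaction graph stored as an adjacency dictionary. The toy graph is hypothetical. Running power iteration on A + I (which keeps the eigenvectors of A but makes the leading eigenvalue strictly dominant for a connected graph) is our choice of method, not necessarily the one used in the study.

```python
import math


def degrees(adj: dict[str, set[str]]) -> dict[str, int]:
    """Node degree: number of edges incident to each node of an undirected graph."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def eigenvector_centrality(adj: dict[str, set[str]],
                           max_iter: int = 1000, tol: float = 1e-10) -> dict[str, float]:
    """Eigenvector centrality by power iteration on (A + I), L2-normalized.

    Adding the identity shifts every eigenvalue by 1, so for a connected graph
    the leading eigenvalue is strictly dominant and bipartite graphs do not
    oscillate between two vectors.
    """
    if not adj:
        return {}
    x = {node: 1.0 for node in adj}
    for _ in range(max_iter):
        # x_v <- x_v + sum of neighbor scores, i.e. one multiplication by (A + I).
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in new.values())) or 1.0
        new = {v: val / norm for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


# Hypothetical toy network: a hub H linked to A, B and C, plus an A-B edge.
adj = {"H": {"A", "B", "C"}, "A": {"H", "B"}, "B": {"H", "A"}, "C": {"H"}}
print(degrees(adj))  # {'H': 3, 'A': 2, 'B': 2, 'C': 1}
centrality = eigenvector_centrality(adj)
print(max(centrality, key=centrality.get))  # H
```

Unlike degree, eigenvector centrality also rewards being linked to well-connected nodes, which is why the two panels can disagree, as they do for Spe.Trans.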
Figure 2
The central interactome. (A) Shortest-path distance distributions. Distances between C.Prot entities (red) are shorter than distances between proteins of the whole human interactome (black), i.e. short distances below 4, the mean and median distance, are over-represented. Remarkably, C.Prot is also closer than average to non-C.Prot proteins (orange). The abundant C.Prot proteins are even closer to each other and to non-C.Prot proteins (cyan and blue). This shows that C.Prot (and its most abundant components) is embedded "uniformly" in the human proteome. (B) Power-law degree distributions of the whole human interactome and of the central interactome. The central interactome is more connected than the whole (exponent -1.1 versus -1.8), i.e. the frequency of high node degrees decreases more slowly. (C) Central interactome with mapped significant biological processes (Table 2). Processes not significantly enriched in C.Prot are in black; nodes with multiple GO annotations are drawn as circles (colors chosen randomly) and nodes with a single GO annotation as squares. Shared GO term ancestors at a node were removed to eliminate trivial multiple annotations and keep the most specific levels. Except for a few, processes are not strongly localized in this network: it does not represent juxtaposed pathways but rather an exchange platform. Most proteins also have multiple GO BP annotations (circular nodes), which de facto establish additional exchanges between fundamental cellular processes.
Finally, we recognized several important complexes: (a) exosome, (b) ubiquinol-cytochrome c reductase, (c) NADH dehydrogenase, (d) oligosaccharyl transferase, (e) proteasome, (f) COPI, (g) ribonucleoprotein/spliceosome, (h) proton-transporting ATP synthase, (i) ribosome, (j) signal recognition particle, (k) cytochrome c oxidase subunits, (l) pyruvate/2-oxoglutarate dehydrogenase complex, (m) prefoldin, (n) condensin, (o) signal peptidase complex, (p) COPII, (q) septin complex. Network visualized with Cytoscape [56].
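Panel (A) rests on unweighted shortest-path distances, and panel (B) rests on the slope of the degree distribution in log-log space. The minimal standard-library sketch below covers both and assumes an undirected adjacency dictionary. The function names and toy data are hypothetical. The least-squares fit on the log-log histogram is a simple illustrative estimator, and maximum-likelihood fitting is generally preferred for power laws. It is not necessarily how the exponents -1.1 and -1.8 were obtained.

```python
import math
from collections import Counter, deque


def bfs_distances(adj: dict[str, set[str]], source: str) -> dict[str, int]:
    """Unweighted shortest-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_distribution(adj, group_a, group_b) -> Counter:
    """Histogram of shortest-path lengths over pairs (a, b), a in group_a, b in group_b.

    Unreachable pairs and self-pairs are skipped. When group_a and group_b are
    the same set, each unordered pair is counted twice, which leaves the
    proportions unchanged.
    """
    hist = Counter()
    for a in group_a:
        dist = bfs_distances(adj, a)
        for b in group_b:
            if b != a and b in dist:
                hist[dist[b]] += 1
    return hist


def power_law_exponent(degree_values) -> float:
    """Least-squares slope of log(frequency) against log(degree); degree 0 is ignored."""
    counts = Counter(d for d in degree_values if d > 0)
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    xs = [math.log(k) for k in counts]
    ys = [math.log(counts[k]) for k in counts]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical path graph a-b-c-d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(distance_distribution(path, ["a"], ["c", "d"]))  # Counter({2: 1, 3: 1})
# Degrees chosen so that frequency(k) = 100 / k**2 exactly.
print(round(power_law_exponent([1] * 100 + [2] * 25 + [5] * 4 + [10]), 6))  # -2.0
```

A less negative exponent, such as -1.1 compared with -1.8, means the degree histogram falls off more slowly, so hubs are relatively more frequent.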
Figure 3
Inter-biological-process exchanges over the central interactome. High-scoring fluxes between biological processes provide a means to summarize the main functions of the central interactome, a subset of the human interactome that is likely to be expressed in all human cells. In our scoring scheme, high scores represent fluxes that are much more intense than expected from GO term frequencies and protein connectivity, i.e. exchanges significantly favored by protein interactions. GO biological processes are represented as nodes and scores by edge thickness. (A) Fluxes within the central interactome. The star-like topology with translation (red) at its center shows that most exchanges synchronize other cellular processes with translation. The strongest crosstalk is observed between translation and the GO categories in blue, which contain many members of nucleic acid metabolism (needed for mRNA generation) and complexes such as the signal recognition particle, the coatomer protein complex and the spliceosome. (B) Fluxes between C.Prot proteins and proteins not in C.Prot. As soon as the focus shifts away from the central interactome, translation loses its role as the central communicator, and communication between C.Prot and non-C.Prot proteins is less specialized. Note also the lost interconnectivity of the blue cluster, which reflects reduced activity of the processes mentioned above. (C) This trend is further amplified in the external fluxes between proteins not in C.Prot, which become essentially global and ignore translation.
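The caption does not spell out the scoring formula, so the following is only a hypothetical stand-in. It scores each pair of GO processes by the ratio of observed interactions to the number expected if edge endpoints were distributed in proportion to each process's summed protein degree, which combines term frequency and connectivity. Scores above 1 mark exchanges favored by the network. The function name `flux_scores` and this null model are our assumptions, not the authors' scoring scheme.

```python
from itertools import combinations_with_replacement


def flux_scores(edges: list[tuple[str, str]],
                annotations: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Hypothetical observed/expected score for every pair of GO processes.

    An edge counts towards (p, q) if one endpoint is annotated with p and the
    other with q. The expected count assumes edge endpoints land on a process
    in proportion to the summed degree of its proteins (a degree-preserving
    random-rewiring null model that ignores multi-annotation overlaps).
    """
    if not edges:
        return {}
    degree: dict[str, int] = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    total_stubs = 2 * len(edges)
    share: dict[str, float] = {}
    for protein, processes in annotations.items():
        for p in processes:
            share[p] = share.get(p, 0.0) + degree.get(protein, 0) / total_stubs

    scores = {}
    for p, q in combinations_with_replacement(sorted(share), 2):
        observed = sum(
            1 for u, v in edges
            if (p in annotations.get(u, set()) and q in annotations.get(v, set()))
            or (q in annotations.get(u, set()) and p in annotations.get(v, set()))
        )
        # A pair of distinct processes can be linked in two orientations.
        expected = len(edges) * share[p] * share[q] * (1 if p == q else 2)
        scores[(p, q)] = observed / expected if expected > 0 else 0.0
    return scores


# Hypothetical toy data: translation proteins only bind metabolic proteins.
annotations = {"T1": {"translation"}, "T2": {"translation"},
               "M1": {"metabolism"}, "M2": {"metabolism"}}
edges = [("T1", "M1"), ("T1", "M2"), ("T2", "M1"), ("T2", "M2")]
for pair, score in flux_scores(edges, annotations).items():
    print(pair, score)
```

In this toy network, the translation-metabolism pair scores 2.0 (twice the expected crosstalk) and both within-process pairs score 0.0. This mirrors, in miniature, a star centered on one process.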
Figure 4
Variation of drug target GO terms along pathways. Integration of GO biological process (BP) analysis and pathway positions. Proteins at the source (0-0.2), center (0.4-0.6) and end (0.8-1) of pathways in C.Prot, and drug targets restricted to C.Prot, were submitted to GO analysis. All BP terms with P-values < 0.1% in at least one case are reported; the strong overall reduction at central pathway positions (Figure 1D) is rather uniform across BPs. The barplots represent the coverage of the GO terms.
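The caption names the position bins but does not say how positions or P-values were computed. The minimal sketch below rests on two labeled assumptions: each pathway is treated as a linear chain of steps indexed from 0, and enrichment is tested with a one-sided hypergeometric test, a common choice for GO analysis that the caption does not confirm. Under that reading, the 0.1% threshold means P < 0.001. The names `relative_position`, `position_bin` and `enrichment_p_value` are hypothetical.

```python
from math import comb
from typing import Optional

# Position bins named in the caption (inclusive bounds).
BINS = {"source": (0.0, 0.2), "center": (0.4, 0.6), "end": (0.8, 1.0)}


def relative_position(index: int, pathway_length: int) -> float:
    """Relative position of step `index` (0-based) in a linear pathway: 0 = first, 1 = last."""
    if pathway_length < 2 or not 0 <= index < pathway_length:
        raise ValueError("need a pathway of at least two steps and a valid index")
    return index / (pathway_length - 1)


def position_bin(position: float) -> Optional[str]:
    """Name of the bin containing `position`, or None if it falls between bins."""
    for name, (low, high) in BINS.items():
        if low <= position <= high:
            return name
    return None


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P(X >= k).

    N background proteins, K of them carry the GO term; n proteins fall in
    the bin, k of which carry the term.
    """
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# A hypothetical six-step pathway: positions 0.0, 0.2, 0.4, 0.6, 0.8, 1.0.
print([position_bin(relative_position(i, 6)) for i in range(6)])
# ['source', 'source', 'center', 'center', 'end', 'end']
print(enrichment_p_value(3, 3, 3, 10) < 0.001)  # False (P = 1/120, about 0.0083)
```

Proteins whose positions fall between bins (for example 0.3) are simply left out of the analysis, as the gaps between the caption's ranges imply.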



