Newfound Coding Potential of Transcripts Unveils Missing Members of Human Protein Communities

Genomics Proteomics Bioinformatics. 2022 Sep 29;S1672-0229(22)00124-3. doi: 10.1016/j.gpb.2022.09.008. Online ahead of print.


Recent proteogenomic approaches have revealed that regions of the transcriptome previously annotated as non-coding [i.e., untranslated regions (UTRs), open reading frames overlapping annotated coding sequences in a different reading frame, and non-coding RNAs] frequently encode proteins, termed alternative proteins. Because alternative proteins are absent from conventional protein databases, previously identified protein-protein interaction networks are likely incomplete. Here, we used the proteogenomic resource OpenProt and a combined spectrum- and peptide-centric analysis to re-analyze a high-throughput human network proteomics dataset, revealing the presence of 261 alternative proteins in the network. We found 19 genes that encode both an annotated (reference) protein and an alternative protein that interact with each other. Of the 117 alternative proteins encoded by pseudogenes, 38 directly interact with reference proteins encoded by their respective parental genes. Finally, we experimentally validated several interactions involving alternative proteins. These data improve the blueprints of the human protein-protein interaction network and suggest functional roles for hundreds of alternative proteins.

Keywords: Affinity purification mass spectrometry; Alternative proteins; Protein network; Protein–protein interactions; Pseudogenes.