One of the goals of the Chromosome-Centric Human Proteome Project (C-HPP) is to map and characterize the functions of protein isoforms produced by alternative splicing of genes. However, identifying alternative splice variants (ASVs) by mass spectrometry remains a major challenge because ASVs usually share highly homologous peptide sequences. A routine protein sequence analysis suggests that more than half of the investigated proteins do not generate two or more uniquely mapping peptides that would enable their isoforms to be distinguished. Here, we develop a new proteogenomics method, named "ASV-ID" (alternative splicing variants identification), which identifies ASVs by using a cell type-specific protein sequence database supported by RNA-Seq data. Using this workflow, we identify 1935 distinct proteins under highly stringent conditions. For 841 of these proteins, transcript evidence helps us distinguish them from other isoforms, even though they are not predicted to yield two or more uniquely mapping peptides. We also demonstrate that ASV-ID enables detection of 19 differently expressed isoforms present in several cell lines. Thus, a workflow based on ASV-ID has the potential to map difficult, yet-to-be-identified protein isoforms in a simple and robust way.
Keywords: RNA-sequencing; alternative splicing variants; cell type-specific sequence database; proteogenomics.
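
The "two or more uniquely mapping peptides" criterion mentioned in the abstract can be illustrated with a minimal sketch. This is not the ASV-ID workflow itself; it assumes a simplified trypsin rule (cleavage after K or R unless followed by P, no missed cleavages), a hypothetical minimum peptide length of 6 residues, and toy isoform sequences invented purely for illustration. The function names are likewise hypothetical.

```python
from collections import defaultdict


def tryptic_peptides(sequence, min_len=6):
    """In-silico digest: cleave after K/R unless the next residue is P.

    Simplified assumption: no missed cleavages, no modifications.
    """
    peptides, start = set(), 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            if i + 1 - start >= min_len:
                peptides.add(sequence[start:i + 1])
            start = i + 1
    # Keep the C-terminal remainder if it is long enough.
    if len(sequence) - start >= min_len:
        peptides.add(sequence[start:])
    return peptides


def unique_peptide_counts(isoforms, min_len=6):
    """Count, per isoform, the peptides that map to that isoform only."""
    owners = defaultdict(set)
    for name, seq in isoforms.items():
        for pep in tryptic_peptides(seq, min_len):
            owners[pep].add(name)
    counts = {name: 0 for name in isoforms}
    for names in owners.values():
        if len(names) == 1:
            counts[next(iter(names))] += 1
    return counts


def distinguishable(isoforms, min_unique=2, min_len=6):
    """Flag isoforms with at least `min_unique` uniquely mapping peptides."""
    counts = unique_peptide_counts(isoforms, min_len)
    return {name: n >= min_unique for name, n in counts.items()}


# Toy sequences (hypothetical, not real proteins).
toy_isoforms = {
    "iso1": "MAAAAAKGGGGGGRLLLLLLKTTTTTTR",
    "iso2": "MAAAAAKGGGGGGRLLLLLLKSSSSSSR",
    "iso3": "MAAAAAKVVVVVVRWWWWWWKTTTTTTR",
}

if __name__ == "__main__":
    print(unique_peptide_counts(toy_isoforms))
    print(distinguishable(toy_isoforms))
```

In this toy set, only `iso3` has two peptides that map to no other isoform; `iso1` and `iso2` would fail the criterion on sequence alone, which is the kind of case where additional evidence, such as the transcript-level support described above, becomes useful.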