MiTPeptideDB: a proteogenomic resource for the discovery of novel peptides

Elizabeth Guruceaga; Alba Garin-Muga; Victor Segura

doi:10.1093/bioinformatics/btz530

Bioinformatics. 2020 Jan 1;36(1):205-211. doi: 10.1093/bioinformatics/btz530.

Authors

Elizabeth Guruceaga¹ ², Alba Garin-Muga³ ⁴, Victor Segura¹ ²

Affiliations

¹ Bioinformatics Platform, Center for Applied Medical Research, University of Navarra, Pamplona 31008, Spain.
² IdiSNA, Navarra Institute for Health Research, Pamplona 31008, Spain.
³ eHealth and Biomedical Applications Department, Vicomtech, San Sebastian 20009, Spain.
⁴ Biodonostia Health Research Institute (Bioengineering Area), eHealth Group, San Sebastian 20014, Spain.

PMID: 31243428
DOI: 10.1093/bioinformatics/btz530

Abstract

Motivation: The principal lines of research in MS/MS-based proteomics have been directed toward the molecular characterization of proteins, including their biological functions and their implications in human diseases. Recent advances in this field have also allowed the first attempts to apply these techniques in clinical practice. Currently, the main progress in computational proteomics is based on the integration of genomic, transcriptomic and proteomic experimental data, an approach known as proteogenomics. This methodology is especially useful for the discovery of new clinical biomarkers, small open reading frames and microproteins, although their validation is still challenging.

Results: We detected novel peptides using a proteogenomic workflow based on the MiTranscriptome human assembly and shotgun proteomics experiments. The annotation approach generated three custom databases containing the peptides corresponding to known and novel transcripts of both protein-coding and non-coding genes. In addition, we used a peptide detectability filter to improve the computational performance of the proteomic searches, the statistical analysis and the robustness of the results. Such innovative additional filters are especially relevant when noisy next-generation sequencing experiments are used to generate the databases. This resource, MiTPeptideDB, was validated using 43 cell lines for which both RNA-Seq and shotgun experiments were available.
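The general idea of building a peptide database and then pruning it with a detectability filter can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the digestion rules, the detectability model or the threshold. The sketch assumes fully tryptic digestion without missed cleavages, and the names `tryptic_digest`, `filter_detectable`, `toy_score` and all numeric defaults are hypothetical.

```python
from typing import Callable, Iterable, List


def tryptic_digest(protein: str, min_len: int = 7, max_len: int = 30) -> List[str]:
    """In silico trypsin digestion: cleave after K or R unless followed by P.

    No missed cleavages are generated; the length bounds are illustrative
    assumptions, not values taken from the paper.
    """
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Keep the C-terminal fragment that does not end in K or R.
        peptides.append(protein[start:])
    return [p for p in peptides if min_len <= len(p) <= max_len]


def filter_detectable(
    peptides: Iterable[str],
    score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep unique peptides whose detectability score reaches the threshold."""
    return sorted({p for p in peptides if score(p) >= threshold})


def toy_score(peptide: str) -> float:
    """Placeholder score (hypothetical): NOT a real detectability predictor."""
    return min(len(peptide) / 10.0, 1.0)


if __name__ == "__main__":
    # Made-up sequence used only to exercise the functions.
    translated = "MAAAKPLLLLLLRGGGGGGGK"
    candidates = tryptic_digest(translated)
    print(filter_detectable(candidates, toy_score))
```

In practice, `toy_score` would be replaced by a trained detectability predictor. Shrinking the search space in this way is what reduces search time and the multiple-testing burden in the subsequent MS/MS database search.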

Availability and implementation: MiTPeptideDB is available at http://bit.ly/MiTPeptideDB.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Cell Line
Humans
Peptides* / genetics
Proteogenomics* / methods
Tandem Mass Spectrometry

Substances

Peptides