MitoFates: improved prediction of mitochondrial targeting sequences and their cleavage sites

Mol Cell Proteomics. 2015 Apr;14(4):1113-26. doi: 10.1074/mcp.M114.043083. Epub 2015 Feb 10.

Abstract

Mitochondria provide numerous essential functions for cells, and their dysfunction leads to a variety of diseases. Thus, obtaining a complete mitochondrial proteome should be a crucial step toward understanding the roles of mitochondria. Many mitochondrial proteins have been identified experimentally, but a complete list is not yet available. To fill this gap, methods to computationally predict mitochondrial proteins from amino acid sequence have been developed and are widely used, but unfortunately, their accuracy is far from perfect. Here we describe MitoFates, an improved prediction method for cleavable N-terminal mitochondrial targeting signals (presequences) and their cleavage sites. MitoFates introduces novel sequence features including positively charged amphiphilicity, presequence motifs, and position weight matrices modeling the presequence cleavage sites. These features are combined with classical ones such as amino acid composition and physico-chemical properties as input to a standard support vector machine classifier. On independent test data, MitoFates attains better performance than existing predictors both in detecting presequences and in predicting their cleavage sites. We used MitoFates to look for undiscovered mitochondrial proteins among 42,217 human proteins (including isoforms such as alternative splicing or translation initiation variants). MitoFates predicts 1167 genes to have at least one isoform with a presequence. Five hundred and eighty of these genes were not annotated as mitochondrial in either UniProt or Gene Ontology. Interestingly, these include candidate regulators of parkin translocation to damaged mitochondria, and also many genes with known disease mutations, suggesting that careful investigation of MitoFates predictions may be helpful in elucidating the role of mitochondria in health and disease. MitoFates is open source with a convenient web server publicly available.
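
To make the idea of a position weight matrix (PWM) for cleavage sites more concrete, the following is a minimal sketch of how such a matrix can be built from aligned windows and used to rank candidate cut positions. It is not the MitoFates implementation: the function names, the uniform background, the pseudocount, the window width, and the four training windows are all assumptions invented for illustration. The toy windows loosely mimic an arginine two residues upstream of the cut.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(windows, pseudocount=1.0):
    """Build a log-odds PWM from equal-length windows (uniform background assumed)."""
    width = len(windows[0])
    if any(len(w) != width for w in windows):
        raise ValueError("all windows must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(width):
        counts = Counter(w[pos] for w in windows)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pwm.append(column)
    return pwm


def score_window(pwm, window):
    """Sum per-position log-odds; residues outside the 20 standard ones score 0."""
    return sum(col.get(aa, 0.0) for col, aa in zip(pwm, window))


def best_cleavage_site(pwm, sequence, offset):
    """Return (cut, score) for the best window.

    `cut` is the number of N-terminal residues removed; `offset` is where the
    cut falls inside the window (hypothetical convention for this sketch).
    """
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        s = score_window(pwm, sequence[start:start + width])
        if best is None or s > best[1]:
            best = (start + offset, s)
    return best


# Invented toy windows: two residues before the cut, two after.
toy_windows = ["RAYS", "RGFS", "RSYA", "RAFT"]
pwm = build_pwm(toy_windows)

# Invented toy sequence; a real predictor would scan an actual protein N-terminus.
toy_sequence = "MLGGRAYSTT"
cut, score = best_cleavage_site(pwm, toy_sequence, offset=2)
print(cut, toy_sequence[:cut], toy_sequence[cut:])
```

In a fuller pipeline of the kind the abstract describes, a score like this would be one feature among several (alongside amino acid composition and physico-chemical properties) passed to a classifier, rather than a prediction on its own.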

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Motifs
  • Amino Acid Sequence
  • Area Under Curve
  • Cluster Analysis
  • Computational Biology / methods*
  • Databases, Protein
  • Disease
  • Humans
  • Hydrophobic and Hydrophilic Interactions
  • Internet
  • Mitochondria / metabolism*
  • Mitochondrial Membranes / metabolism
  • Mitochondrial Proteins / metabolism
  • Molecular Sequence Data
  • Protein Isoforms / metabolism
  • Protein Sorting Signals*
  • Proteome
  • ROC Curve
  • Saccharomyces cerevisiae / metabolism
  • Saccharomyces cerevisiae Proteins / chemistry
  • Saccharomyces cerevisiae Proteins / metabolism

Substances

  • Mitochondrial Proteins
  • Protein Isoforms
  • Protein Sorting Signals
  • Proteome
  • Saccharomyces cerevisiae Proteins