Identification of novel plant peroxisomal targeting signals by a combination of machine learning methods and in vivo subcellular targeting analyses

Plant Cell. 2011 Apr;23(4):1556-72. doi: 10.1105/tpc.111.084095. Epub 2011 Apr 12.


In the postgenomic era, accurate prediction tools are essential for identification of the proteomes of cell organelles. Prediction methods have been developed for peroxisome-targeted proteins in animals and fungi but are missing specifically for plants. For development of a predictor for plant proteins carrying peroxisome targeting signals type 1 (PTS1), we assembled more than 2500 homologous plant sequences, mainly from EST databases. We applied a discriminative machine learning approach to derive two different prediction methods, both of which showed high prediction accuracy and recognized specific targeting-enhancing patterns in the regions upstream of the PTS1 tripeptides. Upon application of these methods to the Arabidopsis thaliana genome, 392 gene models were predicted to be peroxisome targeted. These predictions were extensively tested in vivo, resulting in a high experimental verification rate of Arabidopsis proteins previously not known to be peroxisomal. The prediction methods were able to correctly infer novel PTS1 tripeptides, which even included novel residues. Twenty-three newly predicted PTS1 tripeptides were experimentally confirmed, and a high variability of the plant PTS1 motif was discovered. These prediction methods will be instrumental in identifying low-abundance and stress-inducible peroxisomal proteins and defining the entire peroxisomal proteome of Arabidopsis and agronomically important crop plants.
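The abstract describes classifiers that read both the C-terminal PTS1 tripeptide and the region immediately upstream of it. As a minimal sketch of how such input could be prepared for a discriminative learner, the Python below cuts the C-terminal window out of a protein sequence and encodes it position by position. The 11-residue upstream window, the one-hot encoding, and all function names are illustrative assumptions and are not taken from the paper.

```python
# Hypothetical feature extraction for a PTS1-style classifier.
# The window size and the encoding are assumptions made for illustration only.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def c_terminal_window(sequence, upstream=11):
    """Return (upstream_region, tripeptide) taken from the C-terminus."""
    # Normalise case and drop a trailing stop-codon marker if present.
    seq = sequence.strip().upper().rstrip("*")
    if len(seq) < 3:
        raise ValueError("sequence is shorter than a tripeptide")
    tripeptide = seq[-3:]
    # Slicing clamps at the start, so short sequences yield a shorter region.
    upstream_region = seq[-(3 + upstream):-3]
    return upstream_region, tripeptide


def one_hot_encode(window, length):
    """Encode each position as 20 binary indicators, left-padding short windows."""
    padded = window.rjust(length, "-")  # "-" matches no residue: all-zero block
    vector = []
    for residue in padded:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def pts1_features(sequence, upstream=11):
    """Build a fixed-length feature vector for the C-terminal window."""
    region, tripeptide = c_terminal_window(sequence, upstream)
    return one_hot_encode(region + tripeptide, upstream + 3)
```

Vectors built this way have a fixed length of 20 × (upstream + 3) and could be passed to any standard discriminative classifier. Which model and which features the authors actually used is not stated in the abstract.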

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Arabidopsis / genetics
  • Arabidopsis / metabolism*
  • Arabidopsis Proteins / chemistry
  • Arabidopsis Proteins / metabolism*
  • Artificial Intelligence*
  • Computational Biology / methods*
  • Databases, Protein
  • Genome, Plant / genetics
  • Models, Biological
  • Molecular Sequence Data
  • Peptides
  • Peroxisomes / metabolism*
  • Protein Sorting Signals*
  • Protein Transport
  • Reproducibility of Results
  • Subcellular Fractions / metabolism


Substances

  • Arabidopsis Proteins
  • Peptides
  • Protein Sorting Signals