Chloroplast transit peptide prediction: a peek inside the black box

Nucleic Acids Res. 2001 Aug 15;29(16):E82. doi: 10.1093/nar/29.16.e82.

Abstract

Previous work on predicting protein localization to the chloroplast in plants led to ChloroP, an artificial neural network-based approach that predicts with remarkable accuracy. A common criticism of such neural network models is that the criteria they use in making predictions are difficult to interpret. We address this concern with several new prediction methods that base predictions explicitly on the abundance of different amino acid types in the N-terminal region of the protein. The accuracy of these methods suggests that ChloroP uses little positional information in its decision-making, an unexpected result given the elaborate ChloroP input scheme. By removing positional information, our simpler methods allow us to identify the amino acids that are useful for successful prediction. Identifying important sequence features, such as amino acid content, is advantageous if one goal of localization predictors is to understand the biological process of chloroplast localization. Our most accurate predictor combines principal component analysis and logistic regression. Web-based prediction using this method is available online at http://apicoplast.cis.upenn.edu/pclr/.
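
The position-free feature described in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything here is an assumption: the function names, the 50-residue window, and the scoring step, which applies a plain logistic function to the raw composition and omits the principal component analysis step entirely. The weights are supplied by the caller and are not the published model's coefficients.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def n_terminal_composition(sequence, window=50):
    """Fraction of each standard amino acid in the first `window` residues.

    The window length is an assumption for illustration; the abstract only
    says "the N-terminal region". Non-standard symbols are ignored.
    """
    region = sequence.upper()[:window]
    counts = Counter(region)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def logistic_score(features, weights, bias=0.0):
    """Logistic function of a weighted sum of composition features.

    `weights` and `bias` are hypothetical, caller-supplied values; amino
    acids missing from `weights` contribute nothing to the sum.
    """
    z = bias + sum(weights.get(aa, 0.0) * value for aa, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    toy = "MASSMLSSAAVV"  # made-up sequence, not a real transit peptide
    comp = n_terminal_composition(toy, window=12)
    # Composition discards positional information: reversing the region
    # leaves every feature unchanged.
    assert comp == n_terminal_composition(toy[::-1], window=12)
    print(comp["S"], logistic_score(comp, {"S": 1.0}))
```

Because the features are counts normalized by length, any reordering of the residues inside the window yields the same input to the classifier, which is the sense in which such a method uses no positional information.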

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Amino Acid Motifs
  • Amino Acids / analysis
  • Chloroplasts / chemistry
  • Chloroplasts / metabolism*
  • Computational Biology / methods*
  • Internet
  • Logistic Models
  • Neural Networks, Computer*
  • Protein Sorting Signals / physiology*
  • Protein Transport*
  • Proteins / chemistry*
  • Proteins / classification
  • Proteins / metabolism*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Software

Substances

  • Amino Acids
  • Protein Sorting Signals
  • Proteins