Plant Cell. 2011 Apr;23(4):1556-72.
doi: 10.1105/tpc.111.084095. Epub 2011 Apr 12.

Identification of Novel Plant Peroxisomal Targeting Signals by a Combination of Machine Learning Methods and in Vivo Subcellular Targeting Analyses

Free PMC article

Thomas Lingner et al. Plant Cell.

Abstract

In the postgenomic era, accurate prediction tools are essential for identification of the proteomes of cell organelles. Prediction methods have been developed for peroxisome-targeted proteins in animals and fungi but are missing specifically for plants. For development of a predictor for plant proteins carrying peroxisome targeting signals type 1 (PTS1), we assembled more than 2500 homologous plant sequences, mainly from EST databases. We applied a discriminative machine learning approach to derive two different prediction methods, both of which showed high prediction accuracy and recognized specific targeting-enhancing patterns in the regions upstream of the PTS1 tripeptides. Upon application of these methods to the Arabidopsis thaliana genome, 392 gene models were predicted to be peroxisome targeted. These predictions were extensively tested in vivo, resulting in a high experimental verification rate of Arabidopsis proteins previously not known to be peroxisomal. The prediction methods were able to correctly infer novel PTS1 tripeptides, which even included novel residues. Twenty-three newly predicted PTS1 tripeptides were experimentally confirmed, and a high variability of the plant PTS1 motif was discovered. These prediction methods will be instrumental in identifying low-abundance and stress-inducible peroxisomal proteins and defining the entire peroxisomal proteome of Arabidopsis and agronomically important crop plants.

Figures

Figure 1.
Categorization of Plant PTS1 Protein Example Sequences and Summary of Experimentally Validated Amino Acid Residues Forming the Plant PTS1 Motif. The 2562 positive example sequences were split into three data subsets according to the number of sequences with the same C-terminal tripeptide. Data set 1, containing 2458 sequences and 42 different C-terminal tripeptides, each represented by ≥3 sequences, was used for training of the prediction models, while data sets 2 and 3 contained unseen sequences and C-terminal tripeptides and were used for model testing. Tripeptide residues previously reported to be present in plant PTS1 tripeptides are shaded in gray. According to experimental data and PWM predictions, at least two of the seven high-abundance residues of high targeting strength ([SA][KR][LMI]>, boxed; see Supplemental Figure 1B online) must be combined with one low-abundance residue to yield functional plant PTS1 tripeptides (x[KR][LMI]>, [SA]y[LMI]>, and [SA][KR]z>).
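The combination rule stated in the Figure 1 legend can be illustrated with a short sketch. The function names below are hypothetical and do not come from the paper. The code encodes only the pattern [SA][KR][LMI]> as written in the legend; it is not the PWM or RI model and should not be read as a predictor.

```python
# High-abundance residues per tripeptide position, taken from the
# pattern [SA][KR][LMI]> in the Figure 1 legend.
CANONICAL = ("SA", "KR", "LMI")


def canonical_positions(tripeptide: str) -> int:
    """Count positions of a C-terminal tripeptide that carry a high-abundance residue."""
    if len(tripeptide) != 3:
        raise ValueError("expected exactly three residues")
    return sum(res in allowed for res, allowed in zip(tripeptide.upper(), CANONICAL))


def fits_combination_rule(tripeptide: str) -> bool:
    """Return True if at least two positions hold high-abundance residues.

    This is only the necessary condition from the legend: the remaining
    residue must still be one of the experimentally validated
    low-abundance residues, which this sketch does not check.
    """
    return canonical_positions(tripeptide) >= 2
```

Under this simplified check, tripeptides such as SYM> and VKL> satisfy the two-of-three condition, whereas LNL> and LCR>, which the Figure 2 legend lists as putative non-PTS1 sequences, do not.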
Figure 2.
Experimental Validation of Example Sequences by in Vivo Subcellular Targeting Analysis. Onion epidermal cells were transformed biolistically with EYFP fusion constructs that were C-terminally extended by the C-terminal decapeptides of plant PTS1 proteins serving as example sequences. Subcellular targeting was analyzed by fluorescence microscopy after ~18 h expression at room temperature only ([B], [C], [E] to [G], [J] to [O], [Q], [T], [V], [X], [Z], [Aa], and [Ab]), at an additional 24 h at ~10°C ([A] and [Ac] to [Ag]), or at an additional 5 to 6 d at ~10°C ([D], [H], [I], [P], [R], [S], [U], [W], and [Y]). Cytosolic constructs, for which subcellular targeting data are shown after short-term expression times, were reproducibly confirmed as cytosolic also after long-term expression. Novel amino acid residues of PTS1 tripeptides are underlined. In double transformants, peroxisomes were labeled with CFP, and cyan fluorescence was converted to red for image overlay ([G], [N], [O], [V], [Z], [Aa], and [Ab]). To document the efficiency of peroxisome targeting, EYFP images of single transformants were not modified for brightness or contrast. The sequences that terminated with LNL> and LCR> were included as putative non-PTS1 sequences ([Af] and [Ag]). Comparative subcellular targeting results obtained under different expression conditions are shown in Supplemental Figure 2 online. For sequence details, see Supplemental Tables 1 and 6 online.
Figure 3.
Performance Analysis of the PWM and RI Prediction Models on Example PTS1 Protein Sequences. The x axis indicates the start position of the C-terminal PTS1 domain that was considered for performance analysis and extends to the extreme C termini of the PTS1 proteins. For the definition of sensitivity, specificity, and harmonic mean, see Supplemental Methods online.
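The paper's own definitions of sensitivity, specificity, and harmonic mean are given in its Supplemental Methods and are not reproduced here. As a rough orientation only, the sketch below assumes the common textbook definitions based on a confusion matrix; the paper may define specificity differently, and the counts in the usage note are made up for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of positive examples that are predicted positive (assumed definition)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of negative examples that are predicted negative (assumed definition)."""
    return tn / (tn + fp)


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two rates; returns 0.0 when both rates are zero."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)
```

With illustrative counts of 90 true positives, 10 false negatives, 80 true negatives, and 20 false positives, `harmonic_mean(sensitivity(90, 10), specificity(80, 20))` combines a sensitivity of 0.9 and a specificity of 0.8 into a single value that is dominated by the lower of the two rates.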
Figure 4.
Venn Diagram of PWM- and RI-Model Based PTS1 Protein Predictions for Arabidopsis. The 392 gene models (GM; i.e., transcriptional and translational protein variants) and 320 gene loci (GL; i.e., protein coding genes) are predicted PTS1 proteins by either the PWM or the RI model. Except for three proteins (At1g21770.1, At4g02340.1, and At5g02660.1), the RI model predicted a protein subset of those predicted by the PWM model to be peroxisome-targeted PTS1 proteins. For details on PWM and RI model predictions for the 35,385 Arabidopsis gene models (TAIR10, November, 2010; 27,416 loci), see Supplemental Data Set 2 online. The 392 gene models (320 gene loci) include 109 gene models (79 gene loci) encoding established plant peroxisomal PTS1 proteins, 12 gene models (10 gene loci) associated with plant peroxisomes based on proteomics data only, and 271 gene models (231 gene loci) that had not yet been associated with peroxisomes, indicating that up to 70% of Arabidopsis PTS1 proteins might have remained unidentified up to now.
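The counts in the Figure 4 legend can be checked with a few lines of arithmetic. The variable names are hypothetical; the numbers are taken directly from the legend, and the fractions are one plausible reading of how the "up to 70%" estimate arises.

```python
# Counts from the Figure 4 legend: established PTS1 proteins, proteins linked
# to peroxisomes by proteomics data only, and newly predicted proteins.
gene_models = {"established": 109, "proteomics_only": 12, "new": 271}
gene_loci = {"established": 79, "proteomics_only": 10, "new": 231}

total_models = sum(gene_models.values())
total_loci = sum(gene_loci.values())

# Share of predictions not previously associated with peroxisomes.
new_fraction_models = gene_models["new"] / total_models
new_fraction_loci = gene_loci["new"] / total_loci

print(total_models, total_loci)
print(f"{new_fraction_models:.1%} of gene models, {new_fraction_loci:.1%} of gene loci")
```

The three categories sum to the stated totals of 392 gene models and 320 gene loci, and the newly predicted share is roughly 69% of gene models and 72% of gene loci.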
Figure 5.
Experimental Validation of Arabidopsis Proteins Newly Predicted to Be Located in Peroxisomes by in Vivo Subcellular Targeting Analysis. Onion epidermal cells were transformed biolistically with EYFP fusion constructs that were either C-terminally extended by the C-terminal decapeptide of representative Arabidopsis proteins (or the 15–amino acid peptide for PK1, P) or fused with Arabidopsis full-length cDNAs. Novel amino acid residues of newly identified functional PTS1 tripeptides (in addition to those identified in Figure 2) are underlined. Subcellular targeting was analyzed by fluorescence microscopy after ~18 h expression at room temperature only ([A] to [C], [F], [H], [I], [K], [M], [R] to [T], [W], and [X]), at an additional 24 h at ~10°C ([D], [E], [G], [J], [N] to [Q], [U], and [V]), or at an additional 5 to 6 d of expression at ~10°C (L). Cytosolic constructs, for which subcellular targeting data are shown after short-term expression times, were reproducibly confirmed as cytosolic also after long-term expression. In double transformants, peroxisomes were labeled with CFP, and cyan fluorescence was converted to red for image overlay ([A], [H], [L], [M], and [Q] to [W]). The predicted PTS1 domains investigated derived from the following proteins: SCL> (UP9), SPL>(1) (FAH), SWL> (RING), KRL> (Tudor), SYM> (SDRc, At3g01980.1/3/4), APN> (SDRc, At3g01980.2), SEL> (SPK1), SRY> (PHD), SIL> (ANK), IKL> (LCAT), LKL> (CPK1), VKL> (CUT1), AHL> (PAP7), and PK1 (SKL>; Ma and Reumann, 2008). The predicted PTS1 tripeptides of the Arabidopsis full-length proteins are the following: CP (SKL>), CHY1H1 and CHY1H2 (both AKL>), SDRc (SYM>), S28FP (SSM>), NUDT19 (SSL>), pxPfkB (SML>), and CUT1 (VKL>). To document the efficiency of peroxisome targeting, EYFP images of single transformants were not modified for brightness or contrast. The Arabidopsis Genome Initiative codes of the Arabidopsis proteins are listed in Supplemental Table 5 online.
