Background: The heterokonts are a particularly interesting group of eukaryotic organisms; they include many key species of planktonic and coastal algae and several important pathogens. Understanding the biology of these organisms requires predicting the subcellular localisation of their proteins, but this is not straightforward, particularly in photosynthetic heterokonts, which possess a complex chloroplast acquired through secondary endosymbiosis. The bipartite target peptides that deliver proteins to these chloroplasts are easily confused with the signal peptides of secreted proteins, causing currently available algorithms to make erroneous predictions. HECTAR, a subcellular targeting prediction method that takes into account the specific properties of heterokont proteins, has been developed to address this problem.
Results: HECTAR is a statistical prediction method designed to assign proteins to five categories of subcellular targeting: proteins with a signal peptide, a type II signal anchor, a chloroplast transit peptide or a mitochondrion transit peptide, and proteins that do not possess any N-terminal target peptide. The recognition rate of HECTAR is 96.3%, with Matthews correlation coefficients ranging from 0.67 to 0.95. The method is based on a hierarchical architecture that implements a divide-and-conquer approach, identifying the different possible target peptides one at a time. At each node of the hierarchy, the most relevant outputs of various existing subcellular prediction methods are combined by a Support Vector Machine.
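For readers unfamiliar with the Matthews correlation coefficient quoted above, the following is a minimal sketch of how a per-category value can be computed, treating each targeting category in turn as the positive class against all the others. The function names and the short category labels (`SP`, `cTP`, `mTP`, `other`) are illustrative assumptions, not part of HECTAR, and the toy labels are not HECTAR results.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a 2x2 confusion table.

    Returns a value in [-1, 1]; 1 is perfect agreement, 0 is no better
    than chance, -1 is total disagreement.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the table is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def per_class_mcc(true_labels, predicted_labels):
    """Compute one MCC per category, one-vs-rest.

    The labels are arbitrary hashable values; the names used in the
    example below are hypothetical.
    """
    pairs = list(zip(true_labels, predicted_labels))
    classes = sorted(set(true_labels) | set(predicted_labels))
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        tn = sum(1 for t, p in pairs if t != c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        scores[c] = matthews_cc(tp, tn, fp, fn)
    return scores


if __name__ == "__main__":
    # Toy data for illustration only.
    truth = ["SP", "SP", "cTP", "mTP", "other", "other"]
    preds = ["SP", "cTP", "cTP", "mTP", "other", "other"]
    for label, score in per_class_mcc(truth, preds).items():
        print(f"{label}: {score:.3f}")
```

Unlike a single overall recognition rate, a per-category coefficient is not inflated by the most abundant category, which is why the two figures are usually reported together.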
Conclusion: The HECTAR method is able to predict the subcellular localisation of heterokont proteins with high accuracy. It also efficiently predicts the subcellular localisation of proteins from cryptophytes, a group that is phylogenetically close to the heterokonts. A variant of HECTAR, called HECTARSEC, can be used to identify signal peptide and type II signal anchor sequences in proteins from any eukaryotic organism. Both HECTAR and HECTARSEC are available as a web application at the following address: http://www.sb-roscoff.fr/hectar/.