BMC Bioinformatics. 2009 Jun 29;10:202.
doi: 10.1186/1471-2105-10-202.

NLStradamus: A Simple Hidden Markov Model for Nuclear Localization Signal Prediction

Free PMC article

Alex N Nguyen Ba et al. BMC Bioinformatics.

Abstract

Background: Nuclear localization signals (NLSs) are stretches of residues within a protein that are important for the regulated nuclear import of the protein. Of the many import pathways that exist in yeast, the best characterized is termed the 'classical' NLS pathway. The classical NLS contains specific patterns of basic residues, and computational methods have been designed to predict the location of these motifs in proteins. The consensus sequences, or patterns, for the other import pathways are less well understood.

Results: In this paper, we present an analysis of characterized NLSs in yeast, and find, despite the large number of nuclear import pathways, that NLSs seem to show similar patterns of amino acid residues. We test current prediction methods and observe a low true positive rate. We therefore suggest an approach using hidden Markov models (HMMs) to predict novel NLSs in proteins. We show that our method is able to consistently find 37% of the NLSs with a low false positive rate and that our method retains its true positive rate outside of the yeast data set used for the training parameters.

Conclusion: Our implementation of this model, NLStradamus, is available at http://www.moseslab.csb.utoronto.ca/NLStradamus/.

Figures

Figure 1
True positive and false positive rates of consensus- and alignment-based methods. a) True positive and false positive rates of a consensus-based method on all NLSs from our dataset. The false positive rate is shown as the error rate per amino acid residue. The diagonal line depicts a ratio of one true prediction per false prediction per amino acid residue. b) True positive and false positive rates of consensus- and alignment-based methods on classical NLSs from our dataset. The false positive rate is shown as the error rate per amino acid residue. The diagonal line depicts a ratio of one true prediction per false prediction per amino acid residue.
Figure 2
Alignment of characterized classical nuclear localization signals. Alignment of the residues thought to contribute to NLS binding to importin-α. The residues aligned on the cNLS major binding site were then used as a model for a profile HMM approach using HMMer.
Figure 3
Lysine and arginine content of characterized nuclear localization signals. Plot of the lysine and arginine content of characterized nuclear localization signals with respect to their length. The plot shows the three 'types' of NLSs present in our study.
Figure 4
Schematic of our two-state and four-state HMMs. a) The two-state HMM has two states: the 'background' state, which emits residues with the same frequency as the genome, and the 'NLS' state, which emits residues with the same frequency as the NLSs from our characterized data. b) The four-state HMM has four states: the 'background' state, which emits residues with the same frequency as the genome, and two 'NLS' states, which emit residues with the same frequency as our characterized NLSs, separated by a 'spacer' state, which emits residues with the same frequency as the genome.
Figure 5
True positive and false positive rates of our model. True positive and false positive rates of various methods, including our HMM at various posterior thresholds and with the Viterbi algorithm, on our dataset. The false positive rate is shown as the error rate per amino acid residue. The diagonal line depicts a ratio of one true prediction per false prediction per amino acid residue.
Figure 6
Posterior trace of Swi5p and Ste5p for our two HMMs. a) Posterior trace of Swi5p, a characterized bipartite cNLS, using our four-state model. Output was generated by NLStradamus, and the highlighted region shows the characterized NLS. The black (i) and blue (iii) lines represent the two patches of basic residues, while the pink line (ii) represents the spacer. The green line represents the sum of the three NLS states. The red line is shown as a reference for a threshold of 0.6. b) Posterior trace of Swi5p, a characterized bipartite NLS, using our simple two-state model. Output was generated by NLStradamus, and the highlighted region shows the characterized NLS. The horizontal red line depicts the chosen posterior threshold of 0.6. c) Posterior trace of Ste5p, a characterized bipartite importin-β dependent NLS (non-cNLS), using our four-state model. Output was generated by NLStradamus, and the highlighted region shows the characterized NLS. The black (i) and blue (iii) lines represent the two patches of basic residues, while the pink line (ii) represents the spacer. The green line represents the sum of the three NLS states. The red line is shown as a reference for a threshold of 0.6. d) Posterior trace of Ste5p, a characterized bipartite non-classical NLS, using our simple two-state model. Output was generated by NLStradamus, and the highlighted region shows the characterized NLS. The horizontal red line depicts the chosen posterior threshold of 0.6.
Figure 7
True positive and false positive rates of our model on other species. True positive and false positive rates of various methods, including our HMM at various posterior thresholds and with the Viterbi algorithm, on the PredictNLS dataset. This ROC curve was created by counting overlaps. The false positive rate is shown as the error rate per amino acid residue. The diagonal line depicts a ratio of one true prediction per false prediction per amino acid residue.
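
The two-state model in the Figure 4 caption and the posterior threshold in the Figure 6 caption can be illustrated with a small sketch. The Python below is not the NLStradamus implementation: the emission tables, transition probabilities, start probabilities, and the function names `nls_posterior` and `predict_segments` are all hypothetical stand-ins chosen for illustration. It runs the standard scaled forward-backward algorithm on a two-state HMM ('background' and 'NLS') and reports stretches where the posterior probability of the 'NLS' state reaches 0.6.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical parameters, NOT the published NLStradamus values.
# Background: uniform over 20 residues, standing in for genome-wide frequencies.
BACKGROUND = {aa: 1 / 20 for aa in AMINO_ACIDS}
# NLS state: enriched in lysine (K) and arginine (R); the rest is spread evenly.
NLS = {aa: 0.4 / 18 for aa in AMINO_ACIDS}
NLS["K"] = 0.35
NLS["R"] = 0.25

# State 0 = background, state 1 = NLS (assumed transition and start values).
TRANS = [[0.99, 0.01], [0.10, 0.90]]
START = [0.99, 0.01]


def nls_posterior(seq):
    """Posterior probability of the NLS state at each residue (forward-backward).

    Only the 20 standard one-letter amino acid codes are accepted.
    """
    n = len(seq)
    emit = [[BACKGROUND[aa], NLS[aa]] for aa in seq]

    # Forward pass, rescaling each column to sum to 1 to avoid underflow.
    fwd, scales = [], []
    for t in range(n):
        if t == 0:
            row = [START[s] * emit[0][s] for s in range(2)]
        else:
            row = [
                emit[t][s] * sum(fwd[t - 1][r] * TRANS[r][s] for r in range(2))
                for s in range(2)
            ]
        c = sum(row)
        fwd.append([x / c for x in row])
        scales.append(c)

    # Backward pass, reusing the forward scaling factors.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for r in range(2):
            bwd[t][r] = sum(
                TRANS[r][s] * emit[t + 1][s] * bwd[t + 1][s] for s in range(2)
            ) / scales[t + 1]

    # Combine both passes and normalize per position.
    post = []
    for t in range(n):
        joint = [fwd[t][s] * bwd[t][s] for s in range(2)]
        post.append(joint[1] / sum(joint))
    return post


def predict_segments(seq, threshold=0.6):
    """Return 0-based half-open (start, end) runs with NLS posterior >= threshold."""
    segments, start = [], None
    for i, p in enumerate(nls_posterior(seq)):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(seq)))
    return segments


if __name__ == "__main__":
    toy = "GSGSGSGSGS" + "KKKRKRKKRK" + "GSGSGSGSGS"  # made-up sequence
    print(predict_segments(toy))
```

Raising the threshold in `predict_segments` trades true positives for fewer false positives, which is the trade-off the ROC-style plots in Figures 5 and 7 describe. The toy sequence is invented and carries no biological meaning.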
