NLScore: a novel quantitative algorithm based on 3 dimensional structural determinants to predict the probability of nuclear localization in proteins containing classical nuclear localization signals

J Mol Model. 2017 Aug 9;23(9):258. doi: 10.1007/s00894-017-3420-y.

Abstract

The presence of a nuclear localization signal (NLS) in a protein can be inferred from a stretch of basic amino acids (KRKK). Such NLSs are termed classical NLSs (cNLSs). However, only a fraction of proteins containing the cNLS pattern are transported into the nucleus by binding to importin α. Hence, there must exist additional structural determinants that guide the appropriate interaction between putative NLS-containing cargo and importin α. Using 52 cNLS-containing protein structures obtained from the RCSB PDB, we assembled a training set and a validation set, each comprising a combination of proteins with proven nuclear localization and proteins that are non-nuclear. We modeled the interface between cNLS-containing cargoes and importin α. We conducted rigid-body docking and produced induced-fit modes by allowing both the side chains and the backbone to be flexible. The output of these studies, together with additional determinants such as energy of interaction, atomic contacts, hydrophilic interaction, cationic interaction, and penetration of the cargo protein, was used to derive a 26-parameter regression equation based on a quantitative structure-activity relationship. This was further optimized by a step-wise backward elimination approach to derive a 15-parameter score. This NLScore not only correctly classified confirmed nuclear and non-nuclear localized proteins but also performed better than currently implemented algorithms such as NucPred, Euk-mPLoc 2.0, cNLS Mapper, and NLStradamus. Leave-one-out cross-validation (LOOCV) showed that NLScore correctly predicted 78.6% of non-nuclear proteins and 81.6% of nuclear proteins.
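As an illustration of the validation step described above, the following is a minimal sketch of leave-one-out cross-validation for a linear regression score, reporting accuracy separately for non-nuclear and nuclear proteins. It is not the authors' implementation: the function names, the 0.5 decision threshold, the two-feature toy data, and the use of plain ordinary least squares are all assumptions made for illustration, and the backward elimination step is not shown.

```python
from typing import List, Sequence, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear score: intercept plus weighted sum of the descriptors."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def loocv_class_accuracy(
    X: List[List[float]], y: List[int], threshold: float = 0.5
) -> Tuple[float, float]:
    """Hold out each protein once, refit on the rest, classify the held-out one.

    Returns (accuracy on non-nuclear proteins, accuracy on nuclear proteins).
    The threshold of 0.5 is an assumption, not a value from the paper.
    """
    hits = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for i in range(len(X)):
        w = fit_linear(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        predicted = 1 if score(w, X[i]) >= threshold else 0
        totals[y[i]] += 1
        hits[y[i]] += int(predicted == y[i])
    return hits[0] / totals[0], hits[1] / totals[1]


# Hypothetical toy descriptors (two per protein); label 1 = nuclear, 0 = non-nuclear.
TOY_X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25], [0.25, 0.15],
         [0.90, 0.80], [0.80, 0.90], [0.85, 0.75], [0.75, 0.85]]
TOY_Y = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    non_nuclear_acc, nuclear_acc = loocv_class_accuracy(TOY_X, TOY_Y)
    print(f"non-nuclear: {non_nuclear_acc:.1%}, nuclear: {nuclear_acc:.1%}")
```

Reporting the two classes separately mirrors the way the abstract states its LOOCV result, so that a score that favors one class cannot hide behind a single pooled accuracy.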

Keywords: In-silico model; Linear-regression; Multi-parameter; Sub-cellular location; Tertiary structure.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Computer Simulation*
  • Humans
  • Models, Molecular
  • Nuclear Localization Signals*
  • Nuclear Proteins / metabolism*
  • Protein Structure, Tertiary
  • Software*
  • alpha Karyopherins / metabolism*

Substances

  • Nuclear Localization Signals
  • Nuclear Proteins
  • alpha Karyopherins