Myristoylation by the myristoyl-CoA:protein N-myristoyltransferase (NMT) is an important lipid anchor modification of eukaryotic and viral proteins. Automated prediction of N-terminal N-myristoylation from the substrate protein sequence alone is necessary for large-scale sequence annotation projects, but it requires a low rate of false-positive hits in addition to sufficient sensitivity. Our previous analysis of substrate protein sequence variability, NMT sequences and 3D structures revealed motif properties, beyond the known PROSITE motif, that are utilized in the new predictor described here. The composite prediction function (with separate ad hoc parameterizations (a) for queries from non-fungal eukaryotes and their viruses and (b) for sequences from fungal species) consists of terms evaluating amino acid type preferences at sequence positions close to the N terminus as well as terms penalizing deviations from the physical property pattern of amino acid side-chains encoded in multi-residue correlations within the motif sequence. The algorithm has been validated with a self-consistency test and two jack-knife tests on the learning set as well as with kinetic data for model substrates. The sensitivity in recognizing documented NMT substrates is above 95 % for both taxon-specific versions. The corresponding rate of false-positive predictions (for sequences with an N-terminal glycine residue) is close to 0.5 %; thus, the technique is applicable to large-scale automated sequence database annotation. The predictor is available as a public WWW server at the URL http://mendel.imp.univie.ac.at/myristate/.
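The structure of the composite prediction function (a position-specific preference term minus a physical-property penalty term, with one parameter set per taxon) can be illustrated with a minimal Python sketch. All names (`TOY_PROFILES`, `UNSEEN_SCORE`, `composite_score`, and so on) and all numerical values are invented for illustration and are not the published parameterization; the penalty used here, a simple count of charged residues in a short window, is only a stand-in for the multi-residue physical property terms of the actual predictor.

```python
from typing import Dict, List, Optional

# Toy per-position preference scores for the five residues following the
# N-terminal glycine. ASSUMPTION: every number is invented for illustration.
TOY_PROFILES: Dict[str, List[Dict[str, float]]] = {
    "non_fungal": [
        {"N": 1.0, "A": 0.5, "S": 0.5, "C": 0.5},
        {"A": 0.2, "L": 0.2},
        {"A": 0.2, "S": 0.2},
        {"S": 1.5, "T": 1.0},
        {"K": 0.5, "A": 0.2},
    ],
    "fungal": [
        {"N": 0.8, "A": 0.6, "S": 0.6},
        {"A": 0.2, "I": 0.2},
        {"A": 0.2, "F": 0.2},
        {"S": 2.0, "T": 0.5},
        {"K": 0.6, "R": 0.4},
    ],
}

UNSEEN_SCORE = -0.5          # hypothetical score for residues absent from a profile column
CHARGED = frozenset("DEKR")  # hypothetical property class used by the toy penalty


def profile_term(seq: str, profile: List[Dict[str, float]]) -> float:
    """Sum the amino acid type preferences for the positions after the glycine."""
    return sum(col.get(res, UNSEEN_SCORE) for res, col in zip(seq[1:], profile))


def property_penalty(seq: str, weight: float = 1.0) -> float:
    """Toy penalty: count charged residues among the four residues after the glycine."""
    return weight * sum(res in CHARGED for res in seq[1:5])


def composite_score(seq: str, taxon: str = "non_fungal") -> Optional[float]:
    """Return preference term minus penalty, or None if the query is not scorable.

    `seq` is the mature N terminus (initiator methionine already removed).
    """
    profile = TOY_PROFILES[taxon]
    if len(seq) < len(profile) + 1 or seq[0] != "G":
        return None  # only sequences with an N-terminal glycine are candidates
    return profile_term(seq, profile) - property_penalty(seq)


if __name__ == "__main__":
    for query in ("GNAASA", "GDEKRP", "AAAAAA"):
        print(query, composite_score(query), composite_score(query, "fungal"))
```

In a real predictor the score would be compared against a threshold calibrated on the learning set; that calibration is not reproduced in this sketch.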
Additionally, we propose a version of the predictor that identifies a number of proteolytic protein processing sites at internal glycine residues and that evaluates possible N-terminal myristoylation of the protein fragments.
A scan of public protein databases revealed new potential NMT targets for which the myristoyl modification may be of critical importance for biological function. Among others, the list includes kinases, phosphatases, proteasomal regulatory subunit 4, kinase interacting proteins KIP1/KIP2, protozoan flagellar proteins, homologues of mitochondrial translocase TOM40, of the neuronal calcium sensor NCS-1 and of the cytochrome c-type heme lyase CCHL. Analyses of complete eukaryote genomes indicate that about 0.5 % of all encoded proteins are apparent NMT substrates except for a higher fraction in Arabidopsis thaliana (approximately 0.8 %).
Copyright 2002 Elsevier Science Ltd.