Prediction of functional specificity determinants from protein sequences using log-likelihood ratios

Jimin Pei; Wei Cai; Lisa N Kinch; Nick V Grishin

doi:10.1093/bioinformatics/bti766

Bioinformatics. 2006 Jan 15;22(2):164-71. doi: 10.1093/bioinformatics/bti766. Epub 2005 Nov 8.

Authors

Jimin Pei¹, Wei Cai, Lisa N Kinch, Nick V Grishin

Affiliation

¹ Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390-9050, USA.

PMID: 16278237
DOI: 10.1093/bioinformatics/bti766

Abstract

Motivation: A number of methods have been developed to predict functional specificity determinants in protein families based on sequence information. Most of these methods rely on pre-defined functional subgroups. Manual subgroup definition is difficult because of the limited number of experimentally characterized subfamilies with differing specificity, while automatic subgroup partitioning using computational tools is a non-trivial task and does not always yield ideal results.

Results: We propose a new approach, SPEL (specificity positions by evolutionary likelihood), to detect positions that are likely to be functional specificity determinants. SPEL does not require subgroup definition: it takes a multiple sequence alignment of a protein family as its only input and assigns a P-value to every position in the alignment. Positions with low P-values are likely to be important for functional specificity. An evolutionary tree is reconstructed during the calculation, and P-value estimation is based on a random model that involves evolutionary simulations. Evolutionary log-likelihood is chosen as the measure of amino acid distribution at a position. To illustrate the performance of the method, we carried out a detailed analysis of two protein families (LacI/PurR and G protein alpha subunit) and compared our method with two existing methods (evolutionary trace and a mutual-information-based method). All three methods were also compared on a set of protein families with known ligand-bound structures.
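
The idea of assigning each alignment position a P-value from simulated scores can be illustrated with a minimal sketch. This is not the SPEL implementation: the abstract does not state how the P-value is derived from the simulations, so the upper-tail comparison, the +1 correction, and the function names `empirical_p_value` and `rank_positions` are all assumptions made for illustration. The scores stand in for per-position log-likelihoods; no tree reconstruction or evolutionary simulation is performed here.

```python
from typing import List, Sequence, Tuple


def empirical_p_value(observed: float, simulated: Sequence[float]) -> float:
    """Empirical P-value of an observed score against simulated null scores.

    Assumption: a larger score is more extreme (upper tail). The +1 in the
    numerator and denominator keeps the estimate strictly above zero.
    """
    if len(simulated) == 0:
        raise ValueError("need at least one simulated score")
    # Count simulated scores at least as extreme as the observed one.
    at_least = sum(1 for s in simulated if s >= observed)
    return (at_least + 1) / (len(simulated) + 1)


def rank_positions(
    observed_scores: Sequence[float],
    simulated_scores: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (position index, P-value) pairs sorted by ascending P-value.

    observed_scores[i] is the hypothetical score of alignment position i;
    simulated_scores[i] holds the null scores simulated for that position.
    """
    if len(observed_scores) != len(simulated_scores):
        raise ValueError("one list of simulated scores is needed per position")
    p_values = [
        empirical_p_value(obs, sims)
        for obs, sims in zip(observed_scores, simulated_scores)
    ]
    # sorted() is stable, so tied positions keep their alignment order.
    return sorted(enumerate(p_values), key=lambda pair: pair[1])
```

Under these assumptions, positions at the head of the list returned by `rank_positions` would be the candidates for specificity determinants. If a smaller score were the extreme direction instead, the comparison in `empirical_p_value` would need to be reversed.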

Availability: SPEL is freely available for non-commercial use. Pre-compiled versions for several platforms and the alignments used in this work are available at ftp://iole.swmed.edu/pub/SPEL/

Publication types

Evaluation Study
Research Support, N.I.H., Extramural

MeSH terms

Algorithms*
Binding Sites
Computer Simulation
Conserved Sequence
Likelihood Functions
Models, Biological
Models, Chemical*
Models, Molecular*
Models, Statistical
Protein Binding
Protein Conformation
Proteins / chemistry*
Proteins / classification*
Proteins / metabolism
Sensitivity and Specificity
Sequence Alignment / methods*
Sequence Analysis, Protein / methods*
Sequence Homology, Amino Acid
Software

Substances

Proteins

Grants and funding

GM67165/GM/NIGMS NIH HHS/United States