Determinants, discriminants, conserved residues--a heuristic approach to detection of functional divergence in protein families

PLoS One. 2011;6(9):e24382. doi: 10.1371/journal.pone.0024382. Epub 2011 Sep 12.


This work, in the field of comparative analysis of protein sequences, focuses on detecting functional specialization at the residue level. As input, we take a set of sequences divided into groups of orthologues, each group known to be responsible for a different function. This provides two independent pieces of information: within-group conservation and overlap in amino acid type across groups. We build our discussion around a set of scoring functions that keep the two separate and make the signal easy to trace back to its source.

We propose a heuristic description of functional divergence that includes residue type exchangeability, both in the conservation and in the overlap measure, and makes no assumptions about the rate of evolution in groups other than the one under consideration. The residue types acceptable at a given position within an orthologous group are described as a distribution that evolves in time, starting from a single ancestral type, and is subject to constraints that can be inferred only indirectly. To estimate the strength of these constraints, we compare the observed degrees of conservation and overlap with those expected in the hypothetical case of a freely evolving distribution.

Our description matches the experiment well, but we also conclude that any attempt to capture the evolutionary behavior of specificity-determining residues in terms of a scalar function will be tentative, because no single model can cover the variety of evolutionary behavior such residues exhibit. In particular, models that expect the same type of evolutionary behavior across functionally divergent groups tend to miss a portion of the information otherwise retrievable from the conservation and overlap measures they use.
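
The two signals named in the abstract, within-group conservation and cross-group overlap, can be illustrated with a minimal Python sketch. This is not the scoring scheme of the paper: the entropy-based conservation, the summed-minimum overlap, the function names, and the toy alignment are all assumptions chosen for illustration, and residue type exchangeability, which the abstract describes as part of both measures, is deliberately left out.

```python
from collections import Counter
from math import log


def column_frequencies(seqs, pos):
    """Residue frequencies at one alignment column, ignoring gaps ('-')."""
    counts = Counter(s[pos] for s in seqs if s[pos] != "-")
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}


def conservation(freqs, alphabet_size=20):
    """Assumed measure: 1 minus normalized Shannon entropy (1.0 = invariant)."""
    if not freqs:
        return 0.0
    entropy = -sum(p * log(p) for p in freqs.values())
    return 1.0 - entropy / log(alphabet_size)


def overlap(f1, f2):
    """Assumed measure: shared probability mass of two residue distributions."""
    return sum(min(f1.get(aa, 0.0), f2.get(aa, 0.0)) for aa in set(f1) | set(f2))


def score_columns(groups):
    """Return (position, per-group conservation, pairwise overlap) per column.

    `groups` maps a group name to a list of equal-length aligned sequences.
    The two signals are reported separately rather than merged into one score.
    """
    names = list(groups)
    length = len(groups[names[0]][0])
    rows = []
    for pos in range(length):
        freqs = {g: column_frequencies(groups[g], pos) for g in names}
        cons = {g: conservation(freqs[g]) for g in names}
        ovl = {
            (a, b): overlap(freqs[a], freqs[b])
            for i, a in enumerate(names)
            for b in names[i + 1:]
        }
        rows.append((pos, cons, ovl))
    return rows


# Hypothetical toy alignment: two orthologous groups, four columns.
groups = {
    "A": ["MKLV", "MKLI", "MKLV"],
    "B": ["MRLV", "MRLL", "MRLV"],
}
rows = score_columns(groups)
for pos, cons, ovl in rows:
    print(pos, cons, ovl)
```

In this toy input, the second column is fully conserved within each group while the two groups share no residue type there, which is the pattern a specificity-determining position would be expected to show. The first and third columns are conserved in both groups with full overlap, the pattern of a generally conserved residue. Keeping the two numbers separate, rather than collapsing them into a single scalar, follows the abstract's emphasis on being able to trace a signal back to its source.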

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / metabolism*
  • Conserved Sequence*
  • Ligands
  • Models, Biological*
  • Multigene Family*
  • Protein Interaction Maps
  • Proteins / metabolism*
  • ROC Curve
  • Sequence Homology, Amino Acid
  • Small Molecule Libraries / metabolism

Substances
  • Amino Acids
  • Ligands
  • Proteins
  • Small Molecule Libraries