Using orthologous and paralogous proteins to identify specificity-determining residues in bacterial transcription factors

J Mol Biol. 2002 Aug 2;321(1):7-20. doi: 10.1016/s0022-2836(02)00587-9.


Concepts of orthology and paralogy are becoming increasingly important as whole-genome comparison allows their identification in complete genomes. The functional specificity of proteins is assumed to be conserved among orthologs and to differ among paralogs. We used this assumption to identify residues that determine the specificity of protein-DNA and protein-ligand recognition. Finding such residues is crucial for understanding mechanisms of molecular recognition and for rational protein and drug design. Assuming conservation of specificity among orthologs and different specificity among paralogs, we identify residues that correlate with this grouping by specificity. The method takes advantage of complete genomes to find multiple orthologs and paralogs. The central part of the method is a procedure for computing the statistical significance of the predictions. The procedure is based on a simple statistical model of protein evolution. When applied to a large family of bacterial transcription factors, our method identified 12 residues that are presumed to determine the protein-DNA and protein-ligand recognition specificity. Structural analysis of the proteins and available experimental results strongly support our predictions. Our results suggest new experiments aimed at rational re-design of specificity in bacterial transcription factors by a minimal number of mutations.
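The core idea (score each alignment position by how well its residues track the grouping of sequences into orthologous, same-specificity groups, then ask whether that score could arise by chance) can be illustrated with a minimal sketch. The abstract does not state the scoring function, so this sketch assumes mutual information between a column's residues and the group labels. It also swaps in a plain label-permutation test for significance; this is not the paper's evolutionary model. The function names, the toy alignment, and the group labels "A", "B", "C" are all hypothetical.

```python
import math
import random
from collections import Counter

def column_mutual_information(column, groups):
    """Mutual information (bits) between the residues in one alignment column
    and the specificity group (orthologous group) of each sequence.
    Assumption: MI is used as the score; the source does not specify it."""
    n = len(column)
    joint = Counter(zip(column, groups))
    res_counts = Counter(column)
    grp_counts = Counter(groups)
    mi = 0.0
    for (aa, g), c in joint.items():
        # p(a,g) * log2( p(a,g) / (p(a) p(g)) ), written with integer counts
        mi += (c / n) * math.log2(c * n / (res_counts[aa] * grp_counts[g]))
    return mi

def permutation_p_value(column, groups, n_perm=1000, seed=0):
    """Empirical p-value: fraction of random shufflings of the group labels
    whose MI is at least the observed MI (a swapped-in permutation test,
    not the paper's statistical model of protein evolution)."""
    rng = random.Random(seed)
    observed = column_mutual_information(column, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if column_mutual_information(column, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def rank_positions(alignment, groups, n_perm=1000, seed=0):
    """Score every column of an ungapped alignment; return
    (position, MI, p-value) tuples sorted by decreasing MI."""
    results = []
    for pos in range(len(alignment[0])):
        column = [seq[pos] for seq in alignment]
        mi = column_mutual_information(column, groups)
        p = permutation_p_value(column, groups, n_perm=n_perm, seed=seed)
        results.append((pos, mi, p))
    results.sort(key=lambda r: -r[1])
    return results

if __name__ == "__main__":
    # Toy alignment: two orthologs in each of three hypothetical paralog groups.
    alignment = ["LKD", "LKE", "LRD", "LRE", "LED", "LEE"]
    groups = ["A", "A", "B", "B", "C", "C"]
    for pos, mi, p in rank_positions(alignment, groups):
        print(pos, round(mi, 3), round(p, 3))
```

In this toy example, position 0 is conserved across all groups and position 2 varies independently of them, so both score zero; position 1 is conserved within each group but differs between groups and scores log2(3) ≈ 1.585 bits, which is the pattern the abstract describes for specificity-determining residues. With only six sequences even a perfect correlation is only marginally significant, which is one reason the method relies on complete genomes to collect many orthologs and paralogs.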

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Bacteria / chemistry
  • Bacteria / genetics
  • Bacterial Proteins / chemistry*
  • Bacterial Proteins / genetics
  • Bacterial Proteins / metabolism*
  • Binding Sites
  • Conserved Sequence*
  • DNA / chemistry
  • DNA / metabolism
  • DNA-Binding Proteins / chemistry
  • DNA-Binding Proteins / genetics
  • DNA-Binding Proteins / metabolism
  • Dimerization
  • Entropy
  • Escherichia coli Proteins*
  • Evolution, Molecular*
  • Lac Repressors
  • Ligands
  • Models, Molecular
  • Molecular Conformation
  • Multigene Family
  • Mutation
  • Protein Binding
  • Repressor Proteins / chemistry
  • Repressor Proteins / genetics
  • Repressor Proteins / metabolism
  • Sequence Homology, Amino Acid
  • Substrate Specificity
  • Transcription Factors / chemistry*
  • Transcription Factors / genetics
  • Transcription Factors / metabolism*

Substances

  • Bacterial Proteins
  • DNA-Binding Proteins
  • Escherichia coli Proteins
  • Lac Repressors
  • Ligands
  • PurR protein, Bacteria
  • PurR protein, E coli
  • Repressor Proteins
  • Transcription Factors
  • DNA