Expanding the nitrogen regulatory protein superfamily: Homology detection at below random sequence identity

Proteins. 2002 Jul 1;48(1):75-84. doi: 10.1002/prot.10110.

Abstract

Nitrogen regulatory (PII) proteins are signal transduction molecules involved in controlling nitrogen metabolism in prokaryotes. PII proteins integrate the signals of intracellular nitrogen and carbon status into the control of enzymes involved in nitrogen assimilation. Using elaborate sequence similarity detection schemes, we show that five clusters of orthologs (COGs) and several small divergent protein groups belong to the PII superfamily and predict their structure to be a (βαβ)₂ ferredoxin-like fold. Proteins from the newly emerged PII superfamily are present in all major phylogenetic lineages. The PII homologs are quite diverse, with below random (as low as 1%) pairwise sequence identities between some members of distant groups. Despite this sequence diversity, evidence suggests that the different subfamilies retain the PII trimeric structure important for ligand-binding site formation and maintain a conservation of conservations at residue positions important for PII function. Because most of the orthologous groups within the PII superfamily are composed entirely of hypothetical proteins, our remote homology-based structure prediction provides the only information about them. Analogous to structural genomics efforts, such prediction gives clues to the biological roles of these proteins and allows us to hypothesize about the locations of functional sites on model structures or to rationalize available experimental information. For instance, conserved residues in one of the families map in close proximity to each other on the PII structure, allowing for a possible metal-binding site in the proteins coded by a locus known to affect sensitivity to divalent metal ions. The presented analysis pushes the limits of sequence similarity searches and exemplifies one of the extreme cases of reliable sequence-based structure prediction.
In conjunction with structural genomics efforts to shed light on protein function, our strategies make it possible to detect homology between highly diverse sequences and are aimed at understanding the most remote evolutionary connections in the protein world.
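
The "below random" identity figure can be made concrete with a small calculation. The sketch below is not the authors' pipeline. It assumes two sequences that are already aligned with "-" as the gap character, and it assumes identity is counted over columns where neither sequence has a gap (other conventions exist). The function names and the toy sequences are hypothetical. For comparison, two unrelated sequences drawn from a uniform 20-letter alphabet are expected to match at 5% of positions, so an identity of 1% falls below that chance level.

```python
from collections import Counter


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: both inputs come from the same alignment and therefore
    have equal length. This is one of several possible conventions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def expected_random_identity(composition: dict) -> float:
    """Expected percent identity between two random sequences that share
    the given residue frequencies: 100 * sum of squared frequencies."""
    total = sum(composition.values())
    return 100.0 * sum((n / total) ** 2 for n in composition.values())


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not real PII sequences).
    a = "MKV-LA"
    b = "MRVQ-A"
    print(percent_identity(a, b))

    # Uniform composition over the 20 standard amino acids.
    uniform = Counter("ACDEFGHIKLMNPQRSTVWY")
    print(expected_random_identity(uniform))
```

With a real amino acid composition in place of the uniform one, the chance level shifts somewhat, so the threshold should be computed from the sequences under study rather than taken as fixed.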

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Bacterial Proteins*
  • Binding Sites
  • DNA-Binding Proteins / chemistry*
  • DNA-Binding Proteins / classification*
  • DNA-Binding Proteins / physiology
  • Evolution, Molecular
  • Hydrophobic and Hydrophilic Interactions
  • Ligands
  • Metals / chemistry
  • Models, Molecular
  • Molecular Sequence Data
  • Nitrogen / metabolism
  • PII Nitrogen Regulatory Proteins
  • Protein Folding
  • Protein Structure, Secondary
  • Sensitivity and Specificity
  • Sequence Alignment
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid
  • Trans-Activators*
  • Transcription Factors*

Substances

  • Bacterial Proteins
  • DNA-Binding Proteins
  • Ligands
  • Metals
  • PII Nitrogen Regulatory Proteins
  • Trans-Activators
  • Transcription Factors
  • Nitrogen