Sequence--and structure-based searching strategies have proven useful in the identification of remote homologs and have facilitated both structural and functional predictions of many uncharacterized protein families. We implement these strategies to predict the structure of and to classify a previously uncharacterized cluster of orthologs (COG3019) in the thioredoxin-like fold superfamily. The results of each searching method indicate that thioltransferases are the closest structural family to COG3019. We substantiate this conclusion using the ab initio structure prediction method rosetta, which generates a thioredoxin-like fold similar to that of the glutaredoxin-like thioltransferase (NrdH) for a COG3019 target sequence. This structural model contains the thiol-redox functional motif CYS-X-X-CYS in close proximity to other absolutely conserved COG3019 residues, defining a novel thioredoxin-like active site that potentially binds metal ions. Finally, the rosetta-derived model structure assists us in assembling a global multiple-sequence alignment of COG3019 with two other thioredoxin-like fold families, the thioltransferases and the bacterial arsenate reductases (ArsC).
Published 2003 Wiley-Liss, Inc.