Environmental sequence data from microbial communities now make up the majority of public genomic information. Assigning functions to sequences from these metagenomic sources is challenging because the organisms associated with the data are often uncharacterized and cannot be cultivated. To overcome these challenges, we created a rationally designed expression library of metagenomic proteins covering the sequence space of the thioredoxin superfamily. This library of 100 individual proteins represents more than 22,000 thioredoxins found in the Global Ocean Sampling data set. We screened this library for the functional rescue of Escherichia coli mutants lacking the thioredoxin-type reductase (ΔtrxA), isomerase (ΔdsbC), or oxidase (ΔdsbA). We were able to assign functions to more than a quarter of our representative proteins. The in vivo function of a given representative could not be predicted from its phylogenetic relationships but did correlate with the predicted isoelectric surface potential of the protein. Selected proteins were then purified; we determined their activity using a standard insulin reduction assay and measured their redox potential. An unexpected gel shift of protein E5 during the redox potential determination revealed a redox cycle distinct from that of typical thioredoxin-superfamily oxidoreductases. Instead of forming the intramolecular disulfide bond typical of thioredoxins, this protein forms an intermolecular disulfide between the attacking cysteines of two separate subunits during its catalytic cycle. Our functional metagenomic approach not only proved useful for assigning in vivo functions to representatives of thousands of proteins but also uncovered a novel reaction mechanism in a seemingly well-known protein superfamily.
Keywords: DsbA; DsbC; Escherichia coli (E. coli); TrxA; oxidase; protein disulfide isomerase; reductase; thiol; thiol-disulfide oxidoreductase; thioredoxin.
Copyright © 2021 The Authors. Published by Elsevier Inc. All rights reserved.