To find motifs that mediate helix-helix interactions in membrane proteins, we have analyzed frequently occurring combinations of residues in a database of transmembrane domains. Our analysis was performed with a novel formalism, which we call TMSTAT, for exactly calculating the expectancies of all pairs and triplets of residues in individual sequences, taking into account differential sequence composition and the substantial effect of finite length in short segments. We found that the number of significantly over- and under-represented pairs and triplets was much greater than the random expectation. Isoleucine, glycine and valine were the residues most frequently involved in these extreme cases. The main theme observed is a pattern of small residues (Gly, Ala and Ser) at positions i and i+4, found in association with large aliphatic residues (Ile, Val and Leu) at neighboring positions (i.e. i±1 and i±2). The most over-represented pair is formed by two glycine residues at i and i+4 (GxxxG, 31.6% above expectation, p < 1 × 10^-33), and it is strongly associated with the neighboring beta-branched residues Ile and Val. In fact, the GxxxG pair has been described as part of the strong interaction motif in the glycophorin A transmembrane dimer, in which the pair is associated with two Val residues (GVxxGV). GxxxG is also the major motif identified using TOXCAT, an in vivo selection system for transmembrane oligomerization motifs. In conjunction with these experimental observations, our results highlight the importance of the GxxxG+beta-branched motif in transmembrane helix-helix interactions. In addition, the special role for the beta-branched residues Ile and Val suggested here is consistent with the hypothesis that residues with constrained rotameric freedom in helical conformation might reduce the entropic cost of folding in transmembrane proteins. Additional material is available at http://engelman.csb.yale.edu/tmstat and http://bioinfo.mbb.yale.edu/tmstat.
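The abstract does not spell out the TMSTAT equations. The sketch below illustrates the general idea under one stated assumption: the null model is a random permutation of each segment's own residues. Under that model, two fixed positions in a segment of length L hold residue a and then residue b with probability n_a·n_b / (L(L−1)), or n_a(n_a−1) / (L(L−1)) when a = b. The "−1" reflects sampling without replacement, which is what makes finite length matter in short segments. Per-segment expectations are then summed over the database and compared with observed counts; GxxxG corresponds to a = b = G at separation k = 4. All function names are hypothetical. This is not the paper's implementation, and it omits triplets, variances and p-values.

```python
"""Sketch (hypothetical helpers): exact expected counts of residue pairs
(a at i, b at i+k) in short segments, assuming each segment's own residues
are randomly permuted (sampling without replacement)."""
from collections import Counter
from itertools import permutations
from typing import Iterable, Tuple


def expected_pair_count(seq: str, a: str, b: str, k: int) -> float:
    """Exact expected number of positions i with seq[i] == a and
    seq[i + k] == b, averaged over all permutations of seq."""
    L = len(seq)
    if k < 1 or k >= L:
        return 0.0
    counts = Counter(seq)
    n_a, n_b = counts[a], counts[b]
    # For two fixed, distinct positions: if a == b, only n_a - 1 copies
    # remain for the second position (the finite-length effect).
    second = n_a - 1 if a == b else n_b
    p_pair = n_a * second / (L * (L - 1))
    # A segment of length L has L - k position pairs separated by k.
    return (L - k) * p_pair


def observed_pair_count(seq: str, a: str, b: str, k: int) -> int:
    """Count positions i with seq[i] == a and seq[i + k] == b."""
    return sum(1 for i in range(len(seq) - k) if seq[i] == a and seq[i + k] == b)


def brute_force_expected(seq: str, a: str, b: str, k: int) -> float:
    """Check by enumerating every permutation (feasible only for very short seq)."""
    total, n = 0, 0
    for p in permutations(seq):
        total += observed_pair_count("".join(p), a, b, k)
        n += 1
    return total / n


def pair_deviation(seqs: Iterable[str], a: str, b: str, k: int) -> Tuple[int, float, float]:
    """Pool observed and expected counts over segments; return
    (observed, expected, percent above expectation)."""
    seqs = list(seqs)
    obs = sum(observed_pair_count(s, a, b, k) for s in seqs)
    exp = sum(expected_pair_count(s, a, b, k) for s in seqs)
    pct = 100.0 * (obs - exp) / exp if exp > 0 else float("nan")
    return obs, exp, pct


if __name__ == "__main__":
    # Toy segments for illustration only (not real transmembrane data).
    toy = ["LIVGVIAGVLLAIG", "GAAAGLLIVSLAF"]
    print(pair_deviation(toy, "G", "G", 4))
```

The brute-force helper is only a correctness check: averaging the observed count over every permutation of a short segment should reproduce the closed-form expectation exactly, because expectation is linear over the L − k position pairs.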
Copyright 2000 Academic Press.