Compared with the intense investigation of the structural determinants of protein folding rates, the sequence features that favor fast folding have received little attention. Here, we investigate this subject using simple models of protein folding and a statistical analysis of the Protein Data Bank (PDB). The mean-field model by Plotkin and coworkers predicts that folding is accelerated by stronger-than-average interactions between residues at short separation along the sequence. We confirmed this prediction using the Finkelstein model of protein folding, which accounts for realistic features of polymer entropy, and then tested it on the PDB. We found that native interactions are strongest at contact range l = 8. However, because short-range contacts tend to be exposed and are frequently formed in misfolded structures, selection for folding stability tends to make them less attractive; that is, stability and kinetics may have contrasting requirements. Using a recently proposed model, we predicted the relationship between contact range and contact energy based on buriedness and contact frequency. Deviations from this prediction induce a positive correlation between contact range and contact energy, that is, short-range contacts are stronger than expected, for 2/3 of the proteins. This correlation increases with the absolute contact order (ACO), as expected if proteins that tend to fold slowly because of their large ACO are subject to stronger selection for sequence features favoring fast folding. Our results suggest that the selective pressure for fast folding is detectable only for one third of the proteins in the PDB, in particular those with large contact order.
Copyright © 2012 Wiley Periodicals, Inc.
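
As an illustration of two quantities used above, the following minimal sketch computes the contact range |i - j| of each native contact, the absolute contact order (taken here in its usual sense, the mean contact range over all native contacts), and a Pearson correlation between contact range and contact energy. It is not the procedure used in this study: the function names, the toy contact list, and the energy values are hypothetical, and the sketch correlates raw energies rather than deviations from a predicted range-energy relationship. It assumes the convention that more negative energies denote stronger (more attractive) contacts, so a positive correlation means that short-range contacts are the stronger ones.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Contact = Tuple[int, int]  # residue indices (i, j) of a native contact


def contact_ranges(contacts: Sequence[Contact]) -> List[int]:
    """Sequence separation |i - j| for each contact."""
    return [abs(i - j) for i, j in contacts]


def absolute_contact_order(contacts: Sequence[Contact]) -> float:
    """Mean sequence separation over all native contacts."""
    ranges = contact_ranges(contacts)
    if not ranges:
        raise ValueError("at least one contact is required")
    return sum(ranges) / len(ranges)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: three contacts and made-up contact energies
# (negative = attractive). These are NOT values from the PDB analysis.
toy_contacts = [(1, 5), (2, 10), (3, 20)]
toy_energies = [-1.0, -0.5, -0.2]

aco = absolute_contact_order(toy_contacts)
r = pearson(contact_ranges(toy_contacts), toy_energies)
print(f"ACO = {aco:.2f}, range-energy correlation = {r:.2f}")
```

In this toy example the shortest-range contact has the most negative energy, so the correlation between range and energy is positive, which is the sign described above for proteins whose short-range contacts are stronger than expected.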