The nature of the nucleation-collapse mechanism in protein folding is probed using 27-mer and 36-mer lattice models. Three different forms of the interaction potential are used. Three of the four 27-mer sequences have identical, maximally compact native states, while the fourth has a non-compact native conformation. All the sequences fold thermodynamically and kinetically by a two-state process. Analysis of individual trajectories for each sequence using a self-organizing neural net algorithm shows that, upon formation of a critical set of contacts, the polypeptide chain rapidly reaches the native conformation, which is consistent with a nucleation-collapse mechanism. The algorithm, which reduces the identification of the folding nucleus for each trajectory to a problem of pattern recognition, is used to show that there are multiple folding nuclei. There is a distribution of nucleation contacts in the transition states, with some contacts occurring with higher probability (when averaged over the denatured ensemble) than others. We also show that there is a distribution in the size of the nuclei, with the average number of residues in the folding nuclei being less than about one-third of the chain size. The fluctuations in the sizes of the nuclei are large, suggestive of a broad transition region. The folding nuclei, whose structures are the corresponding transition states, have varying degrees of overlap with the native conformation. The distribution of the radius of gyration of the transition states shows that these structures are an expanded form (by about 25% in the radius of gyration) of the native conformation. Local contacts dominate the folding nuclei, while a certain fraction of non-local contacts is necessary to stabilize the transition states. The search for the critical nuclei initially involves the formation of local contacts, while non-local contacts are formed later.
The fractional values of PhiF for the two 27-mer mutants found by using the protein engineering protocol are consistent with the microscopic picture of partial formation of structures involving these residues in the transition state. These observations lead to a multiple folding nuclei (MFN) model for the nucleation-collapse mechanism in protein folding. The major implication of the MFN model is that mutating even those residues whose tertiary interactions are formed nearly completely in the transition state does not disrupt the nature of the nucleation-collapse mechanism. We analyze the experiments on chymotrypsin inhibitor 2 and alpha-spectrin SH3 domain and two circular permutants in light of the MFN model. It is shown that the PhiF-value analysis for these proteins gives considerable support to the MFN model. The theoretical and experimental studies give a coherent picture of the nucleation-collapse mechanism in which there is a distribution of folding nuclei, with some more probable than others. The formation of any specific nucleus is not necessary for efficient two-state folding.
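For readers unfamiliar with the protein engineering protocol, the following is a minimal sketch of how a PhiF value is conventionally computed from folding rates and stabilities. It assumes the standard definition, PhiF = ddG(TS-D) / ddG(N-D), with ddG(TS-D) = RT ln(kf_wt / kf_mut). The function name, argument names, units, and default temperature are illustrative assumptions and are not taken from the work summarized here.

```python
import math

# Gas constant in kcal/(mol K); units are an assumption of this sketch.
R_KCAL = 1.987204e-3


def phi_f(kf_wt, kf_mut, ddg_nd, temperature=298.0):
    """Return the PhiF value for a single mutation (hypothetical helper).

    kf_wt, kf_mut : folding rate constants of wild type and mutant
                    (same units, both positive).
    ddg_nd        : destabilization of the native state relative to the
                    denatured state caused by the mutation, in kcal/mol
                    (positive when the mutant is less stable).
    """
    if kf_wt <= 0 or kf_mut <= 0:
        raise ValueError("rate constants must be positive")
    if ddg_nd == 0:
        raise ValueError("PhiF is undefined when ddg_nd is zero")
    # Change in the transition-state free energy relative to the
    # denatured state, inferred from the change in folding rate.
    ddg_ts_d = R_KCAL * temperature * math.log(kf_wt / kf_mut)
    return ddg_ts_d / ddg_nd
```

Under this convention, a value near 0 indicates that the mutated residue's interactions are essentially unformed in the transition state, a value near 1 indicates that they are formed about as fully as in the native state, and fractional values correspond to partial formation of structure.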
Copyright 1998 Academic Press.