Globular proteins typically fold into tightly packed arrays of regular secondary structures. We developed a model to approximate the compact parallel and antiparallel arrangement of α-helices and β-strands, enumerated all possible topologies formed by up to five secondary structural elements (SSEs), searched for their occurrence in spatial structures of proteins, and documented their frequencies of occurrence in the PDB. The enumeration model grows larger super-secondary structure patterns (SSPs) by combining pairs of smaller patterns, a process that approximates a potential path of protein fold evolution. The most prevalent SSPs are typically present in superfolds such as the Rossmann-like fold, the ferredoxin-like fold, and the Greek key motif, whereas the less frequent SSPs often possess uncommon structure features such as split β-sheets, left-handed connections, and crossing loops. This complete SSP enumeration model, for the first time, allows us to investigate which theoretically possible SSPs are not observed in available protein structures. All SSPs with up to four SSEs occurred in proteins. However, among the SSPs with five SSEs, approximately 20% (218) are absent from existing folds. Of these unobserved SSPs, 80% contain two or more uncommon structure features. To facilitate future efforts in protein structure classification, engineering, and design, we provide the resulting patterns and their frequency of occurrence in proteins at: http://prodata.swmed.edu/ssps/.
Keywords: fold; helix; secondary structure elements; strand; super-secondary structure pattern.
Copyright © 2016. Published by Elsevier Ltd.