An extension of standard protein secondary structure prediction from optical spectra is demonstrated that encompasses the number and average length of segments of uniform secondary structure in the sequence. The connectivity and number of segments can be described by a matrix descriptor [sij], where i and j denote segment types such as helix and beta-sheet strand. Combined with independent knowledge of the fractional concentration of each secondary structure type and of the total number of residues in the protein, [sij] yields the average segment length of each type. The physical background for predicting this extended structural descriptor from spectral data is summarized, rules for generating it from reference X-ray structures are defined, and formal variants of its form are discussed. Using a novel neural network approach to analyze a training set of electronic circular dichroism (ECD) and vibrational circular dichroism (VCD) spectra for 23 proteins, matrix descriptors encompassing helix, sheet, and other forms are predicted. The results show that the matrix descriptor can be predicted with an accuracy comparable to that of conventionally predicted average fractional secondary structures. For [sij], the ECD predictions were significantly more accurate than the VCD ones, which may result from the longer-range length dependence of the ECD bandshape and intensity. Summary results for a parallel analysis using Fourier transform infrared spectra indicate somewhat lower reliability than those for VCD.
Copyright 1999 Academic Press.
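
The relation between segment counts and average segment lengths stated in the abstract can be illustrated with a minimal sketch. It assumes that the number of segments of each type has already been read from [sij]; the abstract does not specify how, so the counts are passed in directly. The function name, the dictionary layout, and all numerical values are hypothetical and are not taken from the paper.

```python
def average_segment_lengths(fractions, n_residues, segment_counts):
    """Average segment length per secondary structure type.

    For each type: (fractional content * total residues) / number of segments.
    Assumption: segment_counts holds the per-type segment numbers obtained
    from the matrix descriptor; how they are extracted is not shown here.
    """
    lengths = {}
    for ss_type, fraction in fractions.items():
        n_segments = segment_counts[ss_type]
        if n_segments == 0:
            # No segments of this type, so an average length is undefined.
            lengths[ss_type] = None
        else:
            lengths[ss_type] = fraction * n_residues / n_segments
    return lengths


# Hypothetical illustrative values, not data from the paper.
fractions = {"helix": 0.40, "sheet": 0.25}
segment_counts = {"helix": 5, "sheet": 5}
lengths = average_segment_lengths(fractions, 150, segment_counts)
```

With these made-up inputs, a 150-residue protein that is 40% helical in 5 helical segments has an average helix length of about 12 residues.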