We have expanded the reference set of proteins used in SELCON3 by including 11 additional proteins (selected from the reference sets of Yang and co-workers and of Keiderling and co-workers). Five reference sets were constructed, containing 29 to 48 reference proteins depending on the wavelength range and on whether denatured proteins were included. We examined the performance of three popular methods for estimating protein secondary structure fractions from CD spectra (implemented in the software packages CONTIN, SELCON3, and CDSSTR) and of CONTIN/LL, a variant of CONTIN that incorporates the variable selection method into the locally linearized model in CONTIN. These were tested using the five reference sets described here and a 22-protein reference set. Secondary structure assignments from DSSP were used in the analysis. All three methods performed comparably, despite the differences in the algorithms used in the three software packages. While CDSSTR performed best with a smaller reference set and a larger wavelength range, and CONTIN/LL performed best with a larger reference set and a smaller wavelength range, the performance for individual secondary structures was mixed. Analyzing protein CD spectra with all three methods should improve the reliability of the predicted secondary structure fractions. The three programs are provided in the CDPro software package and have been modified for easier use with the different reference sets described in this paper. CDPro software is available at http://lamar.colostate.edu/~sreeram/CDPro.
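As a hedged illustration of the idea shared by reference-set methods, the sketch below approximates the CD spectrum of an unknown protein as a weighted combination of reference-protein spectra and then applies the same weights to the reference proteins' secondary structure fractions. This is a simplification made under stated assumptions, not the algorithm of SELCON3, CDSSTR, or CONTIN. It uses plain unconstrained least squares. The real programs add constraints, regularization, or selection of reference proteins, all of which are omitted here. The function names (`fit_weights`, `predict_fractions`, `solve_linear`) and all spectra and fractions are hypothetical.

```python
"""Hypothetical sketch: estimate secondary structure fractions by fitting an
unknown CD spectrum as a linear combination of reference spectra (plain least
squares). NOT the SELCON3, CDSSTR, or CONTIN algorithm; all data are made up."""

from typing import List

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_weights(ref_spectra: Matrix, unknown: List[float]) -> List[float]:
    """Weights x minimizing ||sum_k x[k] * ref_spectra[k] - unknown||^2 (normal equations).

    ref_spectra[k] is the spectrum of reference protein k, one value per wavelength.
    """
    k = len(ref_spectra)
    ata = [[sum(p * q for p, q in zip(ref_spectra[i], ref_spectra[j])) for j in range(k)]
           for i in range(k)]
    atb = [sum(p * q for p, q in zip(ref_spectra[i], unknown)) for i in range(k)]
    return solve_linear(ata, atb)


def predict_fractions(ref_fractions: Matrix, weights: List[float]) -> List[float]:
    """Combine the reference proteins' structure fractions using the fitted weights."""
    n_struct = len(ref_fractions[0])
    return [sum(w * f[s] for w, f in zip(weights, ref_fractions)) for s in range(n_struct)]


# Made-up data: 3 reference proteins, 5 wavelengths, fractions (helix, strand, other).
ref_spectra = [
    [10.0, 5.0, -8.0, -10.0, -9.0],
    [2.0, 4.0, -3.0, -5.0, -1.0],
    [-4.0, -2.0, 1.0, 0.5, 0.0],
]
ref_fractions = [[0.7, 0.1, 0.2], [0.1, 0.5, 0.4], [0.0, 0.1, 0.9]]

# Synthetic "unknown" protein built as an exact mix of the references.
true_weights = [0.5, 0.3, 0.2]
unknown = [sum(w * s[i] for w, s in zip(true_weights, ref_spectra))
           for i in range(len(ref_spectra[0]))]

weights = fit_weights(ref_spectra, unknown)
fractions = predict_fractions(ref_fractions, weights)
print([round(f, 3) for f in fractions])  # [0.38, 0.22, 0.4]
```

Because the synthetic spectrum is an exact mix of linearly independent reference spectra, the fit recovers the mixing weights, and the estimated fractions follow directly. With measured spectra the fit is only approximate, which is one reason the programs discussed above add further constraints and reference-protein selection.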
Copyright 2000 Academic Press.