Circular Dichroism (CD) spectroscopy is a long-established technique for studying protein secondary structures in solution. Empirical analyses of CD data rely on the availability of reference datasets comprised of far-UV CD spectra of proteins whose crystal structures have been determined. This article reports on the creation of a new reference dataset which effectively covers both secondary structure and fold space, and uses the higher information content available in synchrotron radiation circular dichroism (SRCD) spectra to more accurately predict secondary structure than has been possible with existing reference datasets. It also examines the effects of wavelength range, structural redundancy and different means of categorizing secondary structures on the accuracy of the analyses. In addition, it describes a novel use of hierarchical cluster analyses to identify protein relatedness based on spectral properties alone. The databases are shown to be applicable in both conventional CD and SRCD spectroscopic analyses of proteins. Hence, by combining new bioinformatics and biophysical methods, a database has been produced that should have wide applicability as a tool for structural molecular biology.