Comparative Study
BMC Bioinformatics. 2005 Aug 11;6:202.
doi: 10.1186/1471-2105-6-202.

PALSSE: A Program to Delineate Linear Secondary Structural Elements From Protein Structures

Free PMC article
Indraneel Majumdar et al. BMC Bioinformatics.

Abstract

Background: The majority of residues in protein structures are involved in the formation of alpha-helices and beta-strands. These distinctive secondary structure patterns can be used to represent a protein for visual inspection and in vector-based protein structure comparison. Success of such structural comparison methods depends crucially on the accurate identification and delineation of secondary structure elements.

Results: We have developed a method, PALSSE (Predictive Assignment of Linear Secondary Structure Elements), that delineates secondary structure elements (SSEs) from protein Cα coordinates and specifically addresses the requirements of vector-based protein similarity searches. Our program identifies two types of secondary structure, helix and β-strand, typically those that can be well approximated by vectors. In contrast to traditional secondary structure algorithms, which identify a secondary structure state for every residue in a protein chain, our program attributes residues to linear SSEs. Consecutive elements may overlap, so residues in the overlapping region can have more than one secondary structure type.

Conclusion: PALSSE is predictive in nature and can assign about 80% of the protein chain to SSEs as compared to 53% by DSSP and 57% by P-SEA. Such a generous assignment ensures almost every residue is part of an element and is used in structural comparisons. Our results are in agreement with human judgment and DSSP. The method is robust to coordinate errors and can be used to define SSEs even in poorly refined and low-resolution structures. The program and results are available at http://prodata.swmed.edu/palsse/.

Figures

Figure 1
Ramachandran angles from helices and β-strands defined by DSSP and our program. The culled PDB set (described in methods) was used for this calculation. Figs. a and b show the Ramachandran angles obtained from helices and β-strands, respectively, defined by DSSP. Figs. c and d show the Ramachandran angles obtained from helices and β-strands, respectively, that were not defined by DSSP but were defined by our program. φ and ψ angles for figs. a and b were obtained from DSSP output. φ and ψ angles for figs. c and d were calculated from the output of our algorithm such that φ is the torsion angle between residues i-1 and i, and ψ is the torsion angle between residues i and i+1, where residues at positions i-1, i and i+1 are part of the same SSE. α, π and 3₁₀ helices were used for obtaining the data shown in fig. a (DSSP definition 'H', 'I', 'G' respectively). β-Strands were used for obtaining the data shown in fig. b (DSSP definition 'E'). Three regions of over-predicted points by our method are shown with an example from each region. Figs. e, f and g show stereo diagrams of parts of three helices from "1bx4" (chain A, residue 175 in red), "1iom" (chain A, residue 73 in cyan) and "1x7d" (chain A, residue 180 in magenta), respectively. φ and ψ angles from the residues under study are marked red, cyan and magenta in fig. c. The residues with φ and ψ points highlighted in fig. c are shown as spheres in the same colors in figs. e, f and g.
Figure 2
Secondary structure assignment reliability for DSSP, P-SEA and our program using randomly shifted PDB coordinates. The culled PDB set (described in methods) was used for this calculation. Gaussian random numbers were used to randomly shift coordinates of residues in the PDB files by 0.2 Å to 2 Å, in steps of 0.2 Å. 100 files were generated for every file for every data-point, leading to a total of 100,000 randomly shifted coordinate files. 2a: Mean and standard error of assignment consistency compared with assignment by the same program on the original coordinates. A percentage match was calculated by comparing definitions for the coordinate-shifted file with the program output from the actual file on a per-residue basis. Means for the percentage match are shown. Standard errors were about 1% in each case (not shown). 2b: Average secondary structure content defined by each program for PDB files at different levels of perturbation. The files used are the same as for fig. 2a. The number of residues assigned as helices or β-strands is shown as a percentage of total residues. Spaces and coils in the program output are counted for calculating percentages. 2c: Percentage of residues over-predicted by each program (DSSP, P-SEA, our method) with respect to the other two. 100 files from the culled PDB set were used for these calculations. 35,670 residues were considered. Results shown are for over-predictions by program names in the column heads when compared with program names in the row heads. The actual numbers of helices and β-strands assigned by each program are shown on the diagonal (bracketed values).
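The perturbation test in Figure 2a can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, and it assumes that each shift level is used as the standard deviation of a Gaussian displacement applied independently to every coordinate, which the caption does not state explicitly. The per-residue percentage match follows the description in the caption.

```python
import numpy as np

# Shift levels named in the caption: 0.2 Å to 2 Å in steps of 0.2 Å.
SHIFT_LEVELS = [round(0.2 * k, 1) for k in range(1, 11)]


def perturb_coordinates(coords, shift, rng):
    """Return a copy of `coords` with Gaussian noise added.

    Assumption: `shift` (in Å) is the standard deviation of the
    displacement along each axis.
    """
    coords = np.asarray(coords, dtype=float)
    return coords + rng.normal(0.0, shift, size=coords.shape)


def percent_match(reference, perturbed):
    """Per-residue agreement between two assignment strings, in percent.

    Each string holds one secondary structure character per residue,
    e.g. 'H' for helix, 'E' for strand, ' ' for unassigned.
    """
    if not reference or len(reference) != len(perturbed):
        raise ValueError("assignments must be non-empty and equally long")
    same = sum(a == b for a, b in zip(reference, perturbed))
    return 100.0 * same / len(reference)


rng = np.random.default_rng(0)
original = np.zeros((7, 3))
shifted = perturb_coordinates(original, SHIFT_LEVELS[0], rng)
```

In a full run, each perturbed coordinate set would be passed to the assignment program and its output compared with the assignment on the unperturbed file using `percent_match`.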
Figure 3
Two examples of secondary structure assignment by different programs. We chose an averaged NMR structure "1ahk" and a low-resolution X-ray structure (3.0 Å) "1fjg" as examples since over-prediction by our method is greatest for such structures. Only chain "J" of "1fjg" is shown. Figs. a, b, c show cartoon diagrams of "1ahk", prepared using MOLSCRIPT [46], for β-strand and α-helix definitions by our program, DSSP [8] and P-SEA [17] respectively. β-Strands are shown in yellow and helices in cyan. The N- and C-termini are marked for each structural diagram. The elements produced by our program are labeled. Fig. d shows the secondary structure assignment for "1ahk" by PALSSE (our program), DSSP, P-SEA, DEFINE_S, STRIDE, SSTRUC and PROSS. Our interpretation of β-strands and helices as defined by the different programs is colored in yellow and cyan respectively. The starting positions of each element labeled in fig. a are shown on the first line. The sequence is numbered on the second line, with black letters denoting the units, red the tens and blue the hundreds places. The protein sequence is shown on the third line. Figs. e, f, g show cartoon diagrams of chain "J" of "1fjg", prepared using MOLSCRIPT [46], highlighting β-strand and helix definitions by our program, DSSP, and P-SEA respectively. β-Strands are shown in yellow and helices in cyan. Elements are labeled in fig. e. Fig. h shows the secondary structure alignment for chain "J" of "1fjg". Definitions produced by the same programs as those used for fig. d are shown. Yellow is used for our interpretation of β-strands and cyan for our interpretation of helices. Green denotes overlaps between helix and β-strand elements defined by our program. The first, second, and third lines show the start of each element in fig. e, the residue number and the sequence respectively, as in fig. d. DSSP, P-SEA, DEFINE_S, SSTRUC, STRIDE and PROSS assignments were generated by obtaining the programs, and then compiling and running them with default parameters on the example PDB files.
Figure 4
Parameters for assignment of helix and β-strand property to individual residues. Cα distance and Cα torsion angle were calculated for defining helices (fig. a) and β-strands (fig. b). The distance between residues i, i+3 (shown joined by a blue line) and torsion angle between residues i, i+1, i+2, i+3 (shown as angle between two colored planes; yellow plane between residues i, i+1, i+2 and orange plane between residues i+1, i+2, i+3) are used to assign loose-helix, strict-helix and loose-strand secondary structure property to individual residues. 4c: Distance between i, i+3 Cα residues from helix and β-strand definitions obtained from DSSP [8] output. Distances were binned in 0.2 Å intervals. Cutoff distance c1 (8.1 Å) is the maximum distance allowed for assigning loose-helix property to a residue. Cutoff c1 is also the minimum distance allowed for assigning loose-strand property to a residue. Residues at i and i+3 positions are allowed in the same SSE template (SSET) only if the cutoff distance c1 passes. Cutoff c2 (6.4 Å) is the maximum i, i+3 Cα distance for strict helix definition. 4d: Torsion angle between i, i+1, i+2, i+3 Cα atoms for helix and β-strand definitions obtained from DSSP output. Angles are binned in 5° intervals. A loose-helix definition is assigned to a residue only if the torsion angle for the residue falls between c1 (-35°) and c2 (115°). A loose-strand is assigned only if the torsion angle is -180° to c1 or c2 to 180°. c3 is the optimal torsion angle for helices and is used to define strict-helix residues if the torsion angle is within a 2 sigma deviation from c3.
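The per-residue thresholds in Figure 4 can be sketched in Python as follows. This is an illustration, not the PALSSE implementation. The function names are hypothetical, the way the distance and torsion tests are combined (both must pass) is an assumption, and the values of c3 and its sigma are not given in the caption, so they are left as parameters.

```python
import numpy as np

DIST_C1 = 8.1       # Å: max i,i+3 distance for loose-helix, min for loose-strand
DIST_C2 = 6.4       # Å: max i,i+3 distance for strict-helix
TORSION_C1 = -35.0  # degrees
TORSION_C2 = 115.0  # degrees


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) defined by four points."""
    b0, b1, b2 = p1 - p0, p2 - p1, p3 - p2
    n1, n2 = np.cross(b0, b1), np.cross(b1, b2)
    y = np.dot(b1 / np.linalg.norm(b1), np.cross(n1, n2))
    x = np.dot(n1, n2)
    return float(np.degrees(np.arctan2(y, x)))


def residue_properties(ca, i, c3=None, sigma=None):
    """Properties for residue i from Cα atoms i .. i+3.

    Assumption: distance and torsion criteria must both hold.
    `c3` and `sigma` (strict-helix torsion optimum and spread) are
    caller-supplied because the caption does not state their values.
    """
    ca = np.asarray(ca, dtype=float)
    dist = float(np.linalg.norm(ca[i + 3] - ca[i]))
    tors = dihedral(ca[i], ca[i + 1], ca[i + 2], ca[i + 3])
    props = set()
    if dist <= DIST_C1 and TORSION_C1 <= tors <= TORSION_C2:
        props.add("loose-helix")
    if dist >= DIST_C1 and (tors <= TORSION_C1 or tors >= TORSION_C2):
        props.add("loose-strand")
    if c3 is not None and sigma is not None:
        if dist <= DIST_C2 and abs(tors - c3) <= 2.0 * sigma:
            props.add("strict-helix")
    return props


# Idealized test geometries (illustrative values, not from the paper).
angles = np.radians(100.0 * np.arange(4))
helix = np.column_stack([2.3 * np.cos(angles), 2.3 * np.sin(angles),
                         1.5 * np.arange(4)])
strand = np.array([[3.3 * k, 0.9 * (-1) ** k, 0.0] for k in range(4)])
```

With these idealized coordinates, the helix fragment has an i, i+3 distance of about 5.05 Å and a Cα torsion of about 50°, while the planar zigzag has a distance of about 10.1 Å and a torsion of ±180°.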
Figure 5
The three parameters on which quadruplet scoring is based. In figs. a, d, g, a quadruplet is formed from residues i, i+1, j-1 and j. The score is used to select the best quadruplets to join and form β-sheets. The scoring parameters were chosen so that they influence each other as little as possible. Residues i-1, i+2, j+1 and j-2 are required to calculate angles for quadruplet scoring. The first parameter is the Cα-Cα distance between paired residues (fig. a). Blue lines joining i, j and i+1, j-1 show the distance being scored. This parameter approximates the deviation of the triangle apex i with reference to the triangle apex j in fig. b due to rotation of the plane i-1, i, i+1 on the X axis. Fig. c shows the Cα-Cα distances for parallel and antiparallel β-strands obtained from DSSP [8] output. Data are binned at 0.1 Å intervals and fit to a normal distribution using "gnuplot" [40]. The distribution for parallel β-strands has a mean at c1 (4.81 Å) with a sigma of 0.22. The distance for antiparallel β-strands follows a bimodal distribution with means (μ) at c2 (4.46 Å) and c3 (5.24 Å) and a standard deviation (σ) of 0.26. These μ and σ values were used to calculate the probability of occurrence of Cα-Cα pairing distances while scoring quadruplets by our algorithm. A maximum Cα-Cα distance of 7.5 Å (not shown) was used to limit pairing between residues. The second parameter is the angle between lines (shown in blue) joining the vertices i, j and the base j-1, j+1 of the imaginary triangles j-1, j, j+1 and i+1, i, i-1 (fig. d). Only one of the four cases is shown. The other angles are between lines j, i and i+1, i-1; j-1, i+1 and i, i+2; i+1, j-1 and j, j-2. Deviation of this angle approximates the deviation of the triangle apex i-1 with reference to the triangle apex j+1 in fig. b due to rotation of the plane i-1, i, i+1 on the Y axis. Fig. e shows the distribution of angles, binned at 5° intervals, obtained from parallel and antiparallel β-strands defined by DSSP, where c1 (87°) and c2 (82.2°) are the respective means. Fig. f shows the probability of obtaining a parameter-2 angle at different multipliers of the standard deviation for the data shown in fig. e. The probability obtained is used for scoring quadruplets. The third parameter is a torsion angle (fig. g) between the points j, mj, mi, i. mj is the midpoint between j+1, j-1. mi is the midpoint between i-1, i+1. Lines joining residues and the midpoints are shown in blue. A similar torsion angle involving residues j-1, i+1 as end points and midpoints between j, j-2 and i, i+2 is computed (not shown). Deviation of the torsion angle approximates the deviation of vertex i in fig. b with respect to vertex j due to rotation of the plane i-1, i, i+1 on the Z axis. Fig. h shows the distribution of torsion angles (binned at 5° intervals) obtained from DSSP output, where c1 (-20.9°) and c2 (-27.9°) are the respective means for data from parallel and antiparallel β-strands. Fig. i shows the probability of obtaining a torsion angle at different multipliers of the standard deviation for the data in fig. h.
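As an illustration of the first scoring parameter, the Python sketch below turns a Cα-Cα pairing distance into a probability using the fitted means and standard deviations quoted in the caption. The caption does not give the exact probability formula, so two choices here are assumptions: the probability is taken as the two-sided Gaussian tail probability of the observed deviation, and for the bimodal antiparallel case the better of the two modes is used. The function names are hypothetical.

```python
import math

# (mean, sigma) in Å, from the Figure 5c fits quoted in the caption.
PARALLEL_MODES = [(4.81, 0.22)]
ANTIPARALLEL_MODES = [(4.46, 0.26), (5.24, 0.26)]
MAX_PAIR_DISTANCE = 7.5  # Å: pairs farther apart are not considered


def two_sided_tail(x, mu, sigma):
    """P(|X - mu| >= |x - mu|) for X ~ Normal(mu, sigma)."""
    z = abs(x - mu) / sigma
    return math.erfc(z / math.sqrt(2.0))


def distance_probability(distance, parallel):
    """Probability-like score for a paired Cα-Cα distance.

    Assumption: the best-matching mode is used when the fitted
    distribution is bimodal (antiparallel strands).
    """
    if distance > MAX_PAIR_DISTANCE:
        return 0.0
    modes = PARALLEL_MODES if parallel else ANTIPARALLEL_MODES
    return max(two_sided_tail(distance, mu, s) for mu, s in modes)
```

A distance exactly at a fitted mean scores 1.0 under this definition, and the score falls toward 0 as the distance moves away from the mean. The angle and torsion parameters could be scored in the same way from their own fitted distributions.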
Figure 6
True and false quadruplets generated from DSSP-defined β-strands. 6a, 6b: Residues i, j and i+1, j-1, shown paired with green broken lines, form the true quadruplet. For every such quadruplet, four false quadruplets (shown with red and blue broken lines) are possible: j, i+1, i+2, j-1 and j-1, i, i+1, j-2 in fig. a; i, j-1, j, i-1 and i, i+1, j, j+1 in fig. b. These quadruplets were scored to find the difference in scores between true and false quadruplets. True and false scores are shown for quadruplets generated from DSSP output for parallel β-strand coordinates (fig. c) and antiparallel β-strand coordinates (fig. d). Cutoffs c1 (45) and c2 (46) in fig. c are the scores of the best false quadruplet and the worst correct quadruplet respectively for parallel β-strand data. Cutoffs c1 (39) and c2 (44) in fig. d are the scores of the best false quadruplet and the worst correct quadruplet respectively for antiparallel β-strand data. Cutoffs c1 and c2 are used to differentiate between grade 1 and grade 2 quadruplets in our algorithm.
Figure 7
Initiation and extension of ladders of paired residues using quadruplets. 7a: Ladders of paired residues are initiated and extended using quadruplets. The initiation quadruplets i+2, i+3, j+5, j+6 and i+6, i+7, j+1, j+2 are shown with green pairing. Quadruplets are attached on either side to extend the arms of the ladder. Addition of quadruplet i+4, i+5, j+3, j+4 joins the two ladders i, i+4, j+4, j+8 and i+5, i+8, j, j+3 to form the complete unit. Depending on the position of the best quadruplets, any number of quadruplets might be responsible for seeding a ladder. Smaller ladder fragments get joined by worse-scoring quadruplets. 7b: Residue pairing angle between residues on three β-strands (residues i+1, j-1, k-1 in fig. c) from DSSP output. Cutoff c1 (70°) is close to the largest angle observed. This was used to check new residue pairings formed while adding quadruplets. 7c: Checks performed during quadruplet addition and ladder extension. Quadruplets k, k-1, j-1, j and i, i+1, j-1, j share the common residues j-1 and j. Pairing and angle between pairs are checked for residues j-1 and j when worse-scoring quadruplets are added. Quadruplet k, k-1, j-1, j scores better than i, i+1, j-1, j. While adding i, i+1, j-1, j it was found that the angle i, j, k fails the cutoff of 70° (fig. b), so quadruplet i, i+1, j-1, j is not added. Insertion of bulge residues is handled during joining of quadruplets. Quadruplets i+1, i+2, j-2, j-1 and i+2, i+3, j-3, j-2 share the common residues i+2, j-2. The quadruplets are simply added end to end. However, quadruplets j, j-1, k-1, k and j-2, j-3, k-2, k-1 (pairing between j-2, k-1 not shown) share only a single residue k-1. As j-2, j-3, k-2, k-1 scores worse than j, j-1, k-1, k, the pairing between j-1, k-1 is retained and residue j-2 becomes a bulge with respect to residue k-1.
Figure 8
Helix endpoints redefined based on RMSD and angle between their axial vectors. 8a: Vectors representing a short, opened-up helix by two different methods. The red arrow shows the axis obtained by using the largest spread of the Cα atoms (vector corresponding to the largest eigenvalue). The green arrow shows the rotational axis obtained when the helix shifted by one residue is aligned to the original helix. The first method is unsuitable for representing this helix and does not work for π and short helices. Our algorithm uses the rotational-fit method (described below, 8b) for all helices. RMSD of residues is calculated over this vector. Angles between vectors, calculated from residues of consecutive helices, are used to determine whether to break them so as to appropriately define the helices as linear elements. 8b: Helix RMSD data calculated using the rotational-fit vector. Average RMSD of unbroken helices from our algorithm varies widely. The helices were broken multiple times and the angle of break was analyzed (data not shown). The mode of the angle of break (22°) for long (>15 residue) helices was used to determine the break point of consecutive helices. Helices that break at >22° were chosen for the dataset for calculation of RMSD and angle of break (fig. c). Average RMSD of broken helices is shown in this figure. A line was fitted using "gnuplot" [40] to approximate the RMSD of broken helices. A Z-score of 2.5 is used to limit breaking helices that deviate less than 2.5 times sigma around the approximated RMSD value for broken helices at a particular helix length. 8c: Angle of helix break calculated from the dataset of helices used in fig. b. Data were collected from helices broken once, twice and thrice. The normalized data are shown. Helices that show an angle greater than c1 (20°) between broken parts are split. 8d: Helix split by our algorithm. All possibilities of broken pieces are assessed with respect to the RMSD of the pieces and angle of break. Helices i, i+5 and i+4, i+15 are finally chosen as correctly broken. Helices i, i+8 and i+15; and i, i+11 and i+10, i+15 are also possibilities that are analyzed but not chosen as the optimum break. Residues i+4, i+5 are shared by the two helix pieces (Cα shown as spheres).
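The rotational-fit axis described for Figure 8a can be sketched in Python with a standard Kabsch superposition: the helix without its last residue is fitted onto the helix without its first residue, and the axis of the resulting rotation is taken as the helix axis. This is a minimal sketch of that idea, not the authors' code. The function names and the idealized helix parameters (radius 2.3 Å, rise 1.5 Å, 100° per residue) are illustrative assumptions, and the RMSD and Z-score conditions from fig. 8b are omitted.

```python
import numpy as np

BREAK_ANGLE = 20.0  # degrees, cutoff c1 from fig. 8c


def rotational_fit_axis(ca):
    """Return (unit axis, rotation angle in degrees) for a helix.

    Residues 0..n-2 are superposed onto residues 1..n-1 (Kabsch),
    and the axis of the fitted rotation is returned. The axis is
    ill-defined if the rotation angle is near 0° or 180°.
    """
    ca = np.asarray(ca, dtype=float)
    if len(ca) < 5:
        raise ValueError("need at least 5 Cα atoms")
    p = ca[:-1] - ca[:-1].mean(axis=0)
    q = ca[1:] - ca[1:].mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # rot @ p_k ≈ q_k
    axis = np.array([rot[2, 1] - rot[1, 2],
                     rot[0, 2] - rot[2, 0],
                     rot[1, 0] - rot[0, 1]])
    cos_angle = np.clip((np.trace(rot) - 1.0) / 2.0, -1.0, 1.0)
    return axis / np.linalg.norm(axis), float(np.degrees(np.arccos(cos_angle)))


def angle_between(a, b):
    """Angle in degrees between two unit vectors."""
    return float(np.degrees(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))))


def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Idealized right-handed Cα helix along +z (illustrative only)."""
    t = np.radians(turn_deg) * np.arange(n)
    return np.column_stack([radius * np.cos(t), radius * np.sin(t),
                            rise * np.arange(n)])


axis, turn = rotational_fit_axis(ideal_helix(10))
```

For two candidate pieces of a helix, comparing `angle_between(axis_1, axis_2)` with `BREAK_ANGLE` gives the angle criterion from fig. 8c. Because the fit uses the rotation relating consecutive residues rather than the spread of the atoms, it stays well defined for short helices, which is the advantage the caption attributes to this method.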
Figure 9
β-strands redefined to obtain linear elements using different methods. 9a: Strands broken based on i, i+3 Cα distance (fig. 4a). The distances between residues i-1, i+2 and between residues i+13, i+16 fail the cutoff distance of 8.1 Å. The residues i, i+1 (shown in red) are shared by β-strands i-9, i+1 and i, i+7. Residues i+15, i+14 (shown in red) are shared by the β-strands i+24, i+14 and i+15, i+8. 9b: Angles for β-strand breaking while accounting for bulges. Angles were calculated from all β-strands defined by our algorithm before the β-strand-breaking step. The angle between i-2, i, i+2 Cα atoms is used to determine if the β-strand is bent. An average pseudo-point (pp) was generated from the j, j+1, j+2, j+3 atoms and the angle between j-1, pp, j+4 was found. β-strands were broken when the i-2, i, i+2 angle was greater than c1 (45°) and the j-1, pp, j+4 angle was greater than c2 (70°). j = i-1 showed the best correlation between the two angles (data not shown). 9c: Strand breaking using a pseudo-point to find distorted regions. Residues i-1, i, i+1, i+2 (shown in red) are used to generate an average point. The angle between i-2, the average point and i+3 locates a distorted region if the cutoff angle of 70° fails. The β-strand is broken if the i-2, i, i+2 angle also fails at the same location. The β-strand i-3, i+4 is split to generate two β-strands i-3, i+1 and i, i+4. 9d: Strand breaking using pairing information between neighboring β-strands. Residue i (shown in red) is paired to residue j on one side and to residue k on the other. Residue i+1 is paired to residue j-1; however, residue i-1 is not paired to β-strand j. Also, residue i-1 is paired to residue k+1 but residue i+1 is not paired to β-strand k. Lack of a pair of common residues pairing between β-strands j and k splits the sheet, with residue i shared between both sheets.
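The distance-based split in fig. 9a can be sketched in Python as follows. When the Cα distance between residues k and k+3 falls below 8.1 Å, the strand is cut so that the two middle residues k+1 and k+2 are shared by both pieces, which reproduces the example in the caption. The function names are hypothetical, and the sketch only handles isolated failing windows. How PALSSE treats several consecutive failing windows is not stated in the caption.

```python
import numpy as np

STRAND_MIN_I3 = 8.1  # Å, minimum i,i+3 Cα distance within one strand


def i3_distances(ca):
    """Distance between Cα atoms k and k+3 for every valid k."""
    ca = np.asarray(ca, dtype=float)
    return np.linalg.norm(ca[3:] - ca[:-3], axis=1)


def split_strand(i3, start, end, cutoff=STRAND_MIN_I3):
    """Split the strand start..end (inclusive residue indices).

    `i3[k]` is the distance between residues k and k+3. Each failing
    window (k, k+3) ends one piece at k+2 and starts the next at k+1,
    so the two middle residues belong to both pieces.
    """
    pieces = []
    piece_start = start
    for k in range(start, end - 2):
        if i3[k] < cutoff:
            pieces.append((piece_start, k + 2))
            piece_start = k + 1
    pieces.append((piece_start, end))
    return pieces


# Caption example with i = 9: strand i-9 .. i+7, window (i-1, i+2) fails.
example_i3 = [10.0] * 14
example_i3[8] = 7.0
example_pieces = split_strand(example_i3, 0, 16)
```

With i = 9, the result corresponds to the strands i-9, i+1 and i, i+7 named in the caption, sharing residues i and i+1.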

References

    1. Pauling L, Corey RB. Configurations of polypeptide chains with favoured orientations around single bonds: two new pleated sheets. Proc Natl Acad Sci U S A. 1951;37:729–740.
    2. Donohue J. Hydrogen bonded helical configurations of the polypeptide chain. Proc Natl Acad Sci U S A. 1953;39:470–478.
    3. Low BW, Grenville-Wells HJ. Generalized mathematical relationships for polypeptide chain helices. The coordinates of the pi helix. Proc Natl Acad Sci U S A. 1953;39:785–802.
    4. Venkatachalam CM. Stereochemical criteria for polypeptides and proteins. V. Conformation of a system of three linked peptide units. Biopolymers. 1968;6:1425–1436. doi: 10.1002/bip.1968.360061006.
    5. Rose GD, Gierasch LM, Smith JA. Turns in peptides and proteins. Adv Protein Chem. 1985;37:1–109.
