Sci Rep. 2018 Mar 12;8(1):4355.
doi: 10.1038/s41598-018-22592-3.

Proteome-wide Mapping of Immune Features Onto Plasmodium Protein Three-Dimensional Structures

Free PMC article

Andrew J Guy et al. Sci Rep. 2018.

Abstract

Humoral immune responses against the malaria parasite are an important component of a protective immune response. Antibodies are often directed towards conformational epitopes, and the native structure of the antigenic region is usually critical for antibody recognition. We examined the structural features of various Plasmodium antigens that may impact on epitope location, by performing a comprehensive analysis of known and modelled structures from P. falciparum. Examining the location of known polymorphisms over all available structures, we observed a strong propensity for polymorphic residues to be exposed on the surface and to occur in particular secondary structure segments such as hydrogen-bonded turns. We also utilised established prediction algorithms for B-cell epitopes and MHC class II binding peptides, examining predicted epitopes in relation to known polymorphic sites within structured regions. Finally, we used the available structures to examine polymorphic hotspots and Tajima's D values using a spatial averaging approach. We identified a region of PfAMA1 involving both domains II and III under a high degree of balancing selection relative to the rest of the protein. In summary, we developed general methods for examining how sequence-based features relate to one another in three-dimensional space and applied these methods to key P. falciparum antigens.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Overview of the structural mapping with spatial averaging approach used in the Python BioStructMap package. For any given residue within a PDB structure, all residues within a specified radius of that residue are identified. The positions of these residues within a given reference sequence are then found, on the assumption that user-provided data are aligned to this reference sequence. A function is then called on the selected residues and the corresponding subset of user-provided data, returning (usually) a numerical value. For example, this function may return the mean of the respective data. The provided data and the mapping function may take a variety of forms, including functions that apply a statistical test over a multiple-sequence alignment of genetic sequences (e.g. Tajima’s D). In this case, the function would apply the statistical test over the subset of codons that code for the selected residues. The returned value is then assigned to the original residue in the PDB structure. This process is repeated for all residues within the PDB structure. Results can be viewed as a heatmap displayed over the PDB structure.
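The spatial averaging step described in this caption can be illustrated with a minimal sketch. This is not the BioStructMap API: the function name `spatial_average`, the use of a single coordinate per residue, and the toy data are all assumptions made for illustration, and the package itself may measure distances between residues differently.

```python
import math
from statistics import mean


def spatial_average(coords, values, radius, func=mean):
    """For each residue, apply `func` to the data of all residues within `radius`.

    coords: one (x, y, z) point per residue (an assumption; e.g. a C-alpha atom).
    values: per-residue data aligned to `coords`; None marks missing data.
    """
    results = []
    for centre in coords:
        # Collect data for every residue (the central one included) inside the radius.
        nearby = [
            value
            for coord, value in zip(coords, values)
            if value is not None and math.dist(centre, coord) <= radius
        ]
        # Residues with no usable neighbouring data get no value.
        results.append(func(nearby) if nearby else None)
    return results


# Toy example: four hypothetical residues on a line, 10 Å apart.
# A value of 1 marks a polymorphic residue, so the mean is the local
# proportion of polymorphic residues.
coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
is_polymorphic = [1, 0, 0, 1]
hotspot_score = spatial_average(coords, is_polymorphic, radius=15.0)
```

Passing a different `func` (for example, one that runs a statistical test on the codons of the selected residues) gives the more general behaviour the caption describes.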
Figure 2
Comparison of unique protein structures which match to Plasmodium sequences with >90% identity. Euler diagrams show the number of PDB structures which match to single/multiple/all species. For clarity, only four representative combinations of species are shown. Matching PDB structures were identified using a BLAST search against the PDB database, with an e-value cutoff of 10.0. A BLAST identity score cutoff of 90% was used for included matches. Redundancy in PDB structures was removed using a Sequence Identity Cutoff of 90% to group similar structures using precomputed sequence identity clusters available on the RCSB PDB database (http://www.rcsb.org/pdb/). Only a single representative structure from each group of redundant structures was counted when generating Euler diagrams.
Figure 3
Polymorphic residues within known P. falciparum structures are predominantly surface exposed. Relative solvent accessibility (RSA) is shown for residues with and without identified polymorphisms. RSA represents the proportional surface area of a residue that is exposed to solvent, relative to the maximum possible exposure for that amino acid. RSA was calculated using the maximum accessible surface area (ASA) values from Rost & Sander. Box-and-whisker plots show the median (red line) and interquartile range (box) of residue RSA values for each group. Violin plots show the smoothed distribution of RSA values for each group (violin plots use a kernel density estimate to compute an empirical probability distribution for each group). Polymorphic residues shown are both those with underlying non-synonymous SNPs regardless of allele frequency (n = 204), and those with underlying non-synonymous SNPs with a minor allele frequency (MAF) ≥ 5% (n = 105). The majority of residues in the dataset did not have underlying polymorphisms (n = 28,869). Sequence polymorphisms were obtained from 65 Gambian isolates from Amambua-Ngwa et al., accessed via PlasmoDB. Polymorphic residues had significantly higher RSA values than the background RSA levels (p < 0.0001, Mann-Whitney U test), and polymorphic residues with a MAF ≥ 5% had significantly higher RSA than all polymorphic residues (p = 0.04, Mann-Whitney U test).
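As a small illustration of the RSA definition in this caption, the sketch below divides a residue's observed ASA by the maximum ASA for its amino acid type. The function name and the example ASA are hypothetical, and the three maximum ASA values are the figures commonly tabulated for the Rost & Sander scale; they should be checked against the original table before any real use.

```python
# Maximum ASA values in Å^2 for three residue types, as commonly tabulated
# for the Rost & Sander scale (assumption: verify against the source table).
MAX_ASA = {"ALA": 106.0, "GLY": 84.0, "LYS": 205.0}


def relative_solvent_accessibility(residue_name, asa):
    """Return observed ASA as a fraction of the maximum ASA for that residue type."""
    return asa / MAX_ASA[residue_name]


# Hypothetical lysine with 102.5 Å^2 of exposed surface: half of its maximum.
rsa = relative_solvent_accessibility("LYS", 102.5)
```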
Figure 4
Location of immunologically relevant features mapped onto a PfAMA1 structural model. Each panel shows the front, back and top view of the modelled PfAMA1 structure. (a) Polymorphic residues with an underlying minor allele frequency (MAF) greater than 5% are shown colored according to location within domain I (blue), domain II (magenta) or domain III (orange). Sequence polymorphisms were obtained from 65 Gambian isolates. (b) Spatial averaging of polymorphic residues highlights polymorphic hotspots. The proportion of polymorphic residues within 15 Å is shown for each central residue, with polymorphic residues defined as those with a MAF ≥ 5%. (c) Bepipred 2.0 predictions are shown over the PfAMA1 structure, with epitopes shown for two Bepipred thresholds: predicted epitopes are shown in yellow for a threshold of 0.5 (specificity = 0.57, sensitivity = 0.59) and in dark orange for a threshold of 0.55 (specificity = 0.81, sensitivity = 0.29). (d,e) The locations of predicted MHC class II binding peptides are shown for the HLA-DPA1*02:01-DPB1*01:01 (d) and HLA-DQA1*05:01-DQB1*03:01 (e) alleles. Residues involved in a low binding peptide (50 nM < IC50 < 500 nM) are shown in light blue, while residues involved in a high binding peptide (IC50 < 50 nM) are shown in orange. Only the core binding region of each peptide binder is indicated on each structure.
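The IC50 thresholds used for panels (d,e) amount to a simple classification, sketched below. The function name is hypothetical, and the treatment of a value of exactly 50 nM (which the caption leaves unspecified) is an assumption: here it is counted as low binding.

```python
def classify_binder(ic50_nm):
    """Classify a predicted peptide by IC50 (nM) using the caption's thresholds."""
    if ic50_nm < 50:
        return "high"   # IC50 < 50 nM, shown in orange
    if ic50_nm < 500:
        return "low"    # 50 nM < IC50 < 500 nM, shown in light blue
    return None         # not treated as a binder


# Hypothetical predicted IC50 values for three peptides.
labels = [classify_binder(v) for v in (12.0, 180.0, 2500.0)]
```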
Figure 5
Location of immunologically relevant features mapped onto an EBA-175 RII homology model. Each panel shows the front and back view of the modelled EBA-175 structure. The homology model was built from the 1ZRO PDB structure using ModPipe. (a) Polymorphic residues with an underlying minor allele frequency (MAF) greater than 5% are shown colored according to location within Region I (blue) and Region II (magenta). Sequence polymorphisms were obtained from 65 Gambian isolates. (b) Spatial averaging of polymorphic residues highlights polymorphic hotspots. The proportion of polymorphic residues within 15 Å is shown for each central residue, with polymorphic residues defined as those with a MAF ≥ 5%. (c) Bepipred 2.0 predictions are shown over the EBA-175 structure, with epitopes shown for two Bepipred thresholds: predicted epitopes are shown in yellow for a threshold of 0.5 (specificity = 0.57, sensitivity = 0.59) and in dark orange for a threshold of 0.55 (specificity = 0.81, sensitivity = 0.29). (d,e) The locations of predicted MHC class II binding peptides are shown for the HLA-DPA1*02:01-DPB1*01:01 (d) and HLA-DQA1*05:01-DQB1*03:01 (e) alleles. Residues involved in a low binding peptide (50 nM < IC50 < 500 nM) are shown in light blue, while residues involved in a high binding peptide (IC50 < 50 nM) are shown in orange. Only the core binding region of each peptide binder is indicated on each structure.
Figure 6
Calculation of Tajima’s D for PfAMA1 and EBA-175, both with and without incorporation of protein structural information. (a,b) Spatial information incorporated into a calculation of Tajima’s D using modelled protein structures for PfAMA1 (a) and EBA-175 RII (b). Tajima’s D values for each residue were calculated using only those codons that mapped to residues within a 15 Å radius of the central residue. (c,d) Tajima’s D was calculated over a sliding window of 102 bp with a step size of 3 bp, without incorporation of protein structural information. Tajima’s D values for central codons are displayed on the modelled protein structures for PfAMA1 (c) and EBA-175 RII (d). Data on sequence polymorphisms were obtained from PlasmoDB using sequences from 65 Gambian isolates. The structural model for PfAMA1 was manually modelled and previously published in Arnott et al.; it covers domains I-III of PfAMA1. The structural model for EBA-175 RII was created using ModPipe, with the PDB structure 1ZRO used as a template. Structures are colored according to the calculated value of Tajima’s D mapped to each residue, with residues without a defined Tajima’s D value shown in white.
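For readers unfamiliar with the statistic, the following is a minimal pure-Python sketch of Tajima's D using the standard formula, together with the sequence-only sliding window described for panels (c,d). It is not the authors' code: the function names are hypothetical, the input is assumed to be a gap-free alignment of equal-length DNA strings, and the toy sequences are invented. The spatial variant in panels (a,b) would instead pass in only the codons mapped to residues within the chosen radius.

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Tajima's D for aligned sequences; None where it is undefined."""
    n = len(sequences)
    length = len(sequences[0])
    # Segregating sites: alignment columns with more than one allele.
    seg = sum(1 for i in range(length) if len({s[i] for s in sequences}) > 1)
    # The variance term is zero for n < 4, and D is undefined with no variation.
    if n < 4 or seg == 0:
        return None
    # Mean number of pairwise differences (pi).
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - seg / a1) / math.sqrt(e1 * seg + e2 * seg * (seg - 1))


def sliding_window_d(sequences, window=102, step=3):
    """Tajima's D per window as (start, D) pairs, ignoring protein structure."""
    length = len(sequences[0])
    return [
        (start, tajimas_d([s[start:start + window] for s in sequences]))
        for start in range(0, length - window + 1, step)
    ]


# Invented toy alignments: rare variants only, then intermediate-frequency variants.
rare = tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"])
balanced = tajimas_d(["AAA", "AAA", "TTT", "TTT"])
```

In these toy cases `rare` is negative (an excess of singletons) and `balanced` is positive (alleles at intermediate frequency), which is the direction associated with balancing selection in the abstract.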
Figure 7
Four discontinuous stretches of sequence make up a region of PfAMA1 with high Tajima’s D values as calculated using spatial averaging. (a) Detailed view of this region. The protein structure is colored according to the color scale presented in Fig. 6, with red residues corresponding to the highest Tajima’s D values. Without spatial averaging, the maximum Tajima’s D value within this region is 1.84, whereas a maximum value of 2.39 is observed when spatial information is incorporated. (b) Four discontinuous regions of sequence contribute to the set of surface-exposed residues with the highest Tajima’s D values. These four regions are shown in yellow (P303 - G313; DII), green (L419 - I426; DII), blue (V437 - I454; DII/III) and orange (D483 - F505; DIII).


