J Immunol. 2016 Feb 15;196(4):1480-7.
doi: 10.4049/jimmunol.1501721. Epub 2016 Jan 18.

The Length Distribution of Class I-Restricted T Cell Epitopes Is Determined by Both Peptide Supply and MHC Allele-Specific Binding Preference

Free PMC article


Thomas Trolle et al. J Immunol. 2016.


HLA class I-binding predictions are widely used to identify candidate peptide targets of human CD8(+) T cell responses. Many such approaches focus exclusively on a limited range of peptide lengths, typically 9 aa and sometimes 9-10 aa, despite multiple examples of dominant epitopes of other lengths. In this study, we examined whether epitope predictions can be improved by incorporating the natural length distribution of HLA class I ligands. We found that, although different HLA alleles have diverse length-binding preferences, the length profiles of ligands that are naturally presented by these alleles are much more homogeneous. We hypothesized that this is due to a defined length profile of peptides available for HLA binding in the endoplasmic reticulum. Based on this, we created a model of HLA allele-specific ligand length profiles and demonstrated how this model, in combination with HLA-binding predictions, greatly improves comprehensive identification of CD8(+) T cell epitopes.


Figure 1. Peptide binding length preference for five common HLA alleles
The length preference for each HLA was determined by measuring the binding affinity of a series of fixed C-terminal combinatorial libraries of different lengths. Three series were tested, with either I, K or F at the C-terminus. The series with the strongest binding affinity was selected to represent the HLA allele; the selected series is denoted in parentheses in the legend. IC50 values for each length were calculated as geometric means of 3-6 experiments. The relative binding affinities plotted were calculated as IC50(9)/IC50(L), where L is the peptide length, so values above 1 indicate stronger binding than the 9mer. Error bars indicate standard errors of the geometric means.
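The arithmetic behind this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and expressing the error as the standard error of the mean log IC50 is an assumption about how the error bars on a geometric mean were computed.

```python
import math
from statistics import mean, stdev


def geometric_mean(values):
    """Geometric mean of positive IC50 values: exp of the mean log value."""
    return math.exp(mean(math.log(v) for v in values))


def log_standard_error(values):
    """Standard error of the mean log IC50 (assumed error model for a geometric mean)."""
    logs = [math.log(v) for v in values]
    return stdev(logs) / math.sqrt(len(logs))


def relative_binding(ic50_by_length):
    """Relative affinity per length L, IC50(9) / IC50(L), from replicate IC50s per length.

    ic50_by_length: hypothetical dict mapping peptide length -> list of replicate IC50s.
    Values above 1 mean the length-L library bound more strongly than the 9mer library.
    """
    gm = {length: geometric_mean(reps) for length, reps in ic50_by_length.items()}
    return {length: gm[9] / gm[length] for length in sorted(gm)}
```

For example, `relative_binding({8: [400, 500, 450], 9: [50, 60, 55], 10: [100, 90, 120]})` (made-up replicate values) returns 1.0 at length 9 and values below 1 for the weaker 8mer and 10mer libraries.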
Figure 2. Length profiles of naturally presented peptides for five HLA molecules
Large datasets of HLA-I ligands were obtained by eluting ligands from secreted HLA molecules and identifying the peptide sequences by mass spectrometry. From these ligand datasets, the number of ligands of each length was counted. The y-axis indicates the number of ligands identified at a given length, normalized by the number of ligands identified for that HLA at length 9.
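The normalization on the y-axis amounts to counting eluted ligands per length and dividing by the 9mer count. A minimal sketch, assuming the ligands for one HLA are available as a plain list of sequences (the function name is hypothetical):

```python
from collections import Counter


def ligand_length_profile(peptides):
    """Count ligands per length and normalize each count by the number of 9mers."""
    counts = Counter(len(p) for p in peptides)
    n9 = counts.get(9, 0)
    if n9 == 0:
        raise ValueError("no 9mer ligands; cannot normalize to length 9")
    return {length: counts[length] / n9 for length in sorted(counts)}
```

With this normalization every allele's profile equals 1.0 at length 9, which puts alleles with very different dataset sizes on the same scale.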
Figure 3. Model fit of the available peptide length profile
The available peptide length profile was fitted using MHC ligand length profiles and HLA binding length preferences for HLA-A*01:01, HLA-A*02:01, HLA-A*24:02, HLA-B*07:02 and HLA-B*51:01, as described in the Materials and Methods. The optimal value of β for this fit was 0.30.
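The exact model is given in the paper's Materials and Methods and is not reproduced here. The sketch below is a hypothetical stand-in that shows one way such a fit could work: it *assumes* a multiplicative model, ligand(a, L) = available(L) × binding(a, L)^β, with all profiles normalized to 1 at length 9, and fits it by least squares in log space with a grid search over β. The model form, the loss, and all function names are assumptions.

```python
import math


def fit_available_profile(ligand, binding, beta):
    """Least-squares estimate (in log space) of the available-peptide profile for a fixed beta.

    ligand, binding: dict allele -> dict length -> value, normalized to 1 at length 9.
    Under the assumed model, the estimate is the geometric mean over alleles of
    ligand(a, L) / binding(a, L) ** beta.
    """
    shared = set.intersection(*(set(ligand[a]) & set(binding[a]) for a in ligand))
    avail = {}
    for L in sorted(shared):
        logs = [math.log(ligand[a][L]) - beta * math.log(binding[a][L]) for a in ligand]
        avail[L] = math.exp(sum(logs) / len(logs))
    return avail


def squared_log_error(ligand, binding, avail, beta):
    """Sum of squared log residuals between measured and modeled ligand profiles."""
    err = 0.0
    for a in ligand:
        for L, v in avail.items():
            predicted = v * binding[a][L] ** beta
            err += (math.log(ligand[a][L]) - math.log(predicted)) ** 2
    return err


def fit_beta(ligand, binding, grid=None):
    """Grid-search beta in [0, 1]; return (best beta, fitted available profile)."""
    grid = grid if grid is not None else [i / 100 for i in range(101)]
    best = None
    for beta in grid:
        avail = fit_available_profile(ligand, binding, beta)
        err = squared_log_error(ligand, binding, avail, beta)
        if best is None or err < best[0]:
            best = (err, beta, avail)
    return best[1], best[2]
```

Working in log space makes the available profile a simple geometric mean for each β. The grid search can identify β only because the alleles' binding preferences differ; if every allele had the same binding profile, β and the available profile could not be separated.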
Figure 4. Predicted vs. measured ligand length profiles for five HLA molecules
Leave-one-out cross-validation was carried out by removing one HLA from the training dataset and fitting the available peptide length profile with the remaining four HLAs. The resulting available peptide length profile was combined with the removed HLA's binding length preference (Fig. 1) to predict the removed HLA's ligand length profile, which was then compared to that HLA's measured ligand length profile. For example, in the HLA-A*01:01 plot, HLA-A*01:01 data were not used to fit the available peptide length profile (not shown). This available peptide length profile was then combined with the HLA-A*01:01 binding length preference to obtain the predicted ligand length profile (blue line), which was compared to the measured HLA-A*01:01 ligand length profile (red line).
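The leave-one-out procedure can be sketched as below. This uses the same *assumed* multiplicative model as a hypothetical stand-in for the paper's model (ligand(a, L) = available(L) × binding(a, L)^β, normalized to 1 at length 9). It keeps β fixed at the reported 0.30 in every fold, which is also an assumption, because the caption does not say whether β was refit. Function names are hypothetical.

```python
import math

# Assumed fixed for every fold; the caption does not state whether beta was refit.
BETA = 0.30


def available_profile(ligand, binding, alleles, beta=BETA):
    """Estimate the available-peptide profile from the listed alleles only.

    Assumed model: ligand(a, L) = available(L) * binding(a, L) ** beta, so the estimate
    is the geometric mean over alleles of ligand(a, L) / binding(a, L) ** beta.
    """
    shared = set.intersection(*(set(ligand[a]) & set(binding[a]) for a in alleles))
    profile = {}
    for L in sorted(shared):
        logs = [math.log(ligand[a][L]) - beta * math.log(binding[a][L]) for a in alleles]
        profile[L] = math.exp(sum(logs) / len(logs))
    return profile


def leave_one_out(ligand, binding, beta=BETA):
    """For each allele, predict its ligand length profile without using its own ligand data."""
    predictions = {}
    for held_out in ligand:
        others = [a for a in ligand if a != held_out]
        avail = available_profile(ligand, binding, others, beta)
        # Only the held-out allele's binding preference enters the prediction.
        predictions[held_out] = {
            L: avail[L] * binding[held_out][L] ** beta
            for L in avail
            if L in binding[held_out]
        }
    return predictions
```

Each prediction, such as `leave_one_out(ligand, binding)["HLA-A*01:01"]`, can then be compared length by length with the measured profile of the held-out allele.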
Figure 5. Benchmarks of T cell epitope and MHC-I ligand predictions
For each benchmark dataset, the source proteins of the epitopes/ligands were downloaded and split into overlapping peptides of various lengths. These lengths were determined by the lengths of the epitopes/ligands in each benchmark dataset: 8-13mer overlapping peptides for the IEDB, Marcilla and Thommen datasets, and 8-11mers for the Granados dataset. For each dataset, three sorted peptide lists were created using the following approaches: 1) predict affinities for all overlapping peptides and rank them by predicted IC50 without taking length into account; 2) predict affinities for all 9mer peptides and rank them by predicted IC50 (peptides of other lengths are considered non-candidates); 3) predict length-corrected binding affinities for all overlapping peptides using the novel method described here and rank the peptides by these length-corrected predictions. The plots show the number of epitopes/ligands that would have been identified by each approach as a function of the number of peptides tested, had the peptides been selected from the sorted lists described above.
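The benchmark bookkeeping can be sketched as follows. The binding predictor and the length-corrected predictor are not reproduced here, so `rank` takes any scoring callable (lower score = stronger predicted binder, as with IC50). All names are hypothetical.

```python
from typing import Callable, Iterable, Optional


def overlapping_peptides(protein: str, lengths: Iterable[int]) -> list[str]:
    """All overlapping windows of the given lengths, e.g. range(8, 14) for 8-13mers.

    Duplicate sequences (from repeats within the protein) are kept only once.
    """
    windows = (protein[i:i + k] for k in lengths for i in range(len(protein) - k + 1))
    return list(dict.fromkeys(windows))


def rank(peptides: list[str], score: Callable[[str], float],
         allowed_lengths: Optional[set[int]] = None) -> list[str]:
    """Sort peptides by ascending score; peptides of other lengths are dropped if restricted."""
    pool = [p for p in peptides if allowed_lengths is None or len(p) in allowed_lengths]
    return sorted(pool, key=score)


def recovery_curve(ranked: list[str], epitopes: Iterable[str]) -> list[int]:
    """Number of distinct known epitopes found among the top-n peptides, for n = 1..len(ranked)."""
    targets, seen, curve = set(epitopes), set(), []
    for p in ranked:
        if p in targets:
            seen.add(p)
        curve.append(len(seen))
    return curve
```

With hypothetical callables `predicted_ic50` and `length_corrected_ic50`, the three approaches map to `rank(peps, predicted_ic50)`, `rank(peps, predicted_ic50, {9})` and `rank(peps, length_corrected_ic50)`. Passing each ranked list to `recovery_curve` gives the curves plotted for one dataset.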
