Background: Antigen presenting cells (APCs) sample the extra cellular space and present peptides from here to T helper cells, which can be activated if the peptides are of foreign origin. The peptides are presented on the surface of the cells in complex with major histocompatibility class II (MHC II) molecules. Identification of peptides that bind MHC II molecules is thus a key step in rational vaccine design and developing methods for accurate prediction of the peptide:MHC interactions play a central role in epitope discovery. The MHC class II binding groove is open at both ends making the correct alignment of a peptide in the binding groove a crucial part of identifying the core of an MHC class II binding motif. Here, we present a novel stabilization matrix alignment method, SMM-align, that allows for direct prediction of peptide:MHC binding affinities. The predictive performance of the method is validated on a large MHC class II benchmark data set covering 14 HLA-DR (human MHC) and three mouse H2-IA alleles.
Results: The predictive performance of the SMM-align method was demonstrated to be superior to that of the Gibbs sampler, TEPITOPE, SVRMHC, and MHCpred methods. Cross validation between peptide data set obtained from different sources demonstrated that direct incorporation of peptide length potentially results in over-fitting of the binding prediction method. Focusing on amino terminal peptide flanking residues (PFR), we demonstrate a consistent gain in predictive performance by favoring binding registers with a minimum PFR length of two amino acids. Visualizing the binding motif as obtained by the SMM-align and TEPITOPE methods highlights a series of fundamental discrepancies between the two predicted motifs. For the DRB1*1302 allele for instance, the TEPITOPE method favors basic amino acids at most anchor positions, whereas the SMM-align method identifies a preference for hydrophobic or neutral amino acids at the anchors.
Conclusion: The SMM-align method was shown to outperform other state of the art MHC class II prediction methods. The method predicts quantitative peptide:MHC binding affinity values, making it ideally suited for rational epitope discovery. The method has been trained and evaluated on the, to our knowledge, largest benchmark data set publicly available and covers the nine HLA-DR supertypes suggested as well as three mouse H2-IA allele. Both the peptide benchmark data set, and SMM-align prediction method (NetMHCII) are made publicly available.