Predicting HLA class II antigen presentation through integrated deep learning

Nat Biotechnol. 2019 Nov;37(11):1332-1343. doi: 10.1038/s41587-019-0280-2. Epub 2019 Oct 14.

Abstract

Accurate prediction of antigen presentation by human leukocyte antigen (HLA) class II molecules would be valuable for vaccine development and cancer immunotherapies. Current computational methods trained on in vitro binding data are limited by insufficient training data and algorithmic constraints. Here we describe MARIA (major histocompatibility complex analysis with recurrent integrated architecture; https://maria.stanford.edu/), a multimodal recurrent neural network for predicting the likelihood of antigen presentation from a gene of interest in the context of specific HLA class II alleles. In addition to in vitro binding measurements, MARIA is trained on peptide HLA ligand sequences identified by mass spectrometry, expression levels of antigen genes and protease cleavage signatures. Because it leverages these diverse training data and our improved machine learning framework, MARIA (area under the curve = 0.89-0.92) outperformed existing methods in validation datasets. Across independent cancer neoantigen studies, peptides with high MARIA scores are more likely to elicit strong CD4+ T cell responses. MARIA allows identification of immunogenic epitopes in diverse cancers and autoimmune disease.
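The abstract reports MARIA's performance as a receiver operating characteristic area under the curve (AUC) of 0.89-0.92. As a rough illustration of what that metric measures, the sketch below computes ROC AUC as the probability that a randomly chosen presented peptide receives a higher score than a randomly chosen non-presented one, with ties counted as one half. The function name `roc_auc` and the example scores are hypothetical; this is not MARIA's code, and the numbers are not from its validation datasets.

```python
from itertools import product


def roc_auc(positive_scores, negative_scores):
    """Rank-based ROC AUC (equivalent to the Mann-Whitney U statistic / (n_pos * n_neg)).

    Returns the fraction of (positive, negative) pairs in which the positive
    example scores higher; tied pairs contribute 0.5.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(positive_scores, negative_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical presentation scores for illustration only (not data from the paper).
presented = [0.95, 0.88, 0.72, 0.60]
not_presented = [0.80, 0.40, 0.35, 0.10]
print(roc_auc(presented, not_presented))  # 0.875 (14 of 16 pairs ranked correctly)
```

An AUC of 1.0 means every presented peptide outscores every non-presented one, and 0.5 corresponds to random ranking. The pairwise loop is quadratic, which keeps the definition transparent; for large validation sets a sort-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.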

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Antigen Presentation
  • CD4-Positive T-Lymphocytes / immunology*
  • Computational Biology / methods*
  • Deep Learning
  • Histocompatibility Antigens Class II / chemistry
  • Histocompatibility Antigens Class II / genetics*
  • Humans
  • K562 Cells
  • Mass Spectrometry
  • Neural Networks, Computer
  • Peptides / metabolism
  • Sequence Analysis, RNA

Substances

  • Histocompatibility Antigens Class II
  • Peptides