Machine learning methods and harmonized datasets improve immunogenic neoantigen prediction

Immunity. 2023 Nov 14;56(11):2650-2663.e6. doi: 10.1016/j.immuni.2023.09.002. Epub 2023 Oct 9.


The accurate selection of neoantigens that bind to class I human leukocyte antigen (HLA) and are recognized by autologous T cells is a crucial step in many cancer immunotherapy pipelines. We reprocessed whole-exome sequencing and RNA sequencing (RNA-seq) data from 120 cancer patients from two external large-scale neoantigen immunogenicity screening assays combined with an in-house dataset of 11 patients and identified 46,017 somatic single-nucleotide variant mutations and 1,781,445 neo-peptides, of which 212 mutations and 178 neo-peptides were immunogenic. Beyond features commonly used for neoantigen prioritization, factors such as the location of neo-peptides within protein HLA presentation hotspots, binding promiscuity, and the role of the mutated gene in oncogenicity were predictive for immunogenicity. The classifiers accurately predicted neoantigen immunogenicity across datasets and improved their ranking by up to 30%. Besides insights into machine learning methods for neoantigen ranking, we have provided homogenized datasets valuable for developing and benchmarking companion algorithms for neoantigen-based immunotherapies.
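To make the ranking idea in the abstract concrete, here is a minimal sketch of scoring candidate neo-peptides and counting how many immunogenic ones land in the top k. It is not the authors' classifier. The linear scoring rule, the feature names (`binding`, `expression`, `hotspot`), the weights, and the peptide labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str               # placeholder label, not a real sequence
    features: dict             # hypothetical feature name -> value
    immunogenic: bool = False  # assay label, when one is available


def score(candidate, weights):
    """Linear score: weighted sum of the candidate's features."""
    return sum(w * candidate.features.get(name, 0.0)
               for name, w in weights.items())


def rank_candidates(candidates, weights):
    """Return candidates sorted from highest to lowest score."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)


def hits_in_top_k(ranked, k):
    """Count immunogenic candidates among the first k ranked entries."""
    return sum(c.immunogenic for c in ranked[:k])


# Toy data: every value and weight below is invented for illustration.
candidates = [
    Candidate("PEP_A", {"binding": 0.9, "expression": 0.2, "hotspot": 0.0}),
    Candidate("PEP_B", {"binding": 0.6, "expression": 0.8, "hotspot": 1.0},
              immunogenic=True),
    Candidate("PEP_C", {"binding": 0.3, "expression": 0.1, "hotspot": 0.0}),
]

baseline = {"binding": 1.0}  # binding-only ranking
extended = {"binding": 1.0, "expression": 0.5, "hotspot": 0.5}  # extra features

baseline_hits = hits_in_top_k(rank_candidates(candidates, baseline), k=1)
extended_hits = hits_in_top_k(rank_candidates(candidates, extended), k=1)
print(baseline_hits, extended_hits)
```

In this toy setting, adding the extra features moves the immunogenic candidate to the top of the list. A real pipeline would learn the weights from labeled screening data rather than set them by hand.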

Keywords: cancer immunotherapy; machine learning; neoantigen prioritization; personalized cancer vaccine.

MeSH terms

  • Antigens, Neoplasm* / genetics
  • Histocompatibility Antigens Class I
  • Humans
  • Immunotherapy / methods
  • Machine Learning
  • Neoplasms* / genetics
  • Neoplasms* / therapy
  • Peptides


Substances

  • Antigens, Neoplasm
  • Histocompatibility Antigens Class I
  • Peptides