Sequence-based prediction of SARS-CoV-2 vaccine targets using a mass spectrometry-based bioinformatics predictor identifies immunogenic T cell epitopes

Genome Med. 2020 Aug 13;12(1):70. doi: 10.1186/s13073-020-00767-w.


Background: The ongoing COVID-19 pandemic has created an urgency to identify novel vaccine targets for protective immunity against SARS-CoV-2. Early reports identify protective roles for both humoral and cell-mediated immunity against SARS-CoV-2.

Methods: We used our binding predictors for human leukocyte antigen (HLA)-I and HLA-II, which were developed from mass spectrometry-based profiling of individual HLA-I and HLA-II alleles, to predict peptide binding across diverse allele sets. We applied these predictors to viral genomes from the Coronaviridae family, focusing specifically on T cell epitopes from SARS-CoV-2 proteins. We then tested a subset of these epitopes in a T cell induction assay for their ability to elicit CD8+ T cell responses.
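
As an illustration of the first step in such a pipeline, the sketch below enumerates candidate peptides from protein sequences with a sliding window. It is not the authors' implementation: the function name `enumerate_peptides`, the 8 to 11 residue length range (a range commonly used for HLA-I ligands), and the toy sequence are all assumptions made for the example, and no binding predictor is included.

```python
from typing import Dict, Iterable, Iterator, Tuple


def enumerate_peptides(
    orfs: Dict[str, str],
    lengths: Iterable[int] = range(8, 12),  # assumed HLA-I lengths: 8, 9, 10, 11
) -> Iterator[Tuple[str, int, str]]:
    """Yield (orf_name, start_index, peptide) for every window of each length."""
    lengths = list(lengths)
    for name, seq in orfs.items():
        for k in lengths:
            # range() is empty when the sequence is shorter than k
            for start in range(len(seq) - k + 1):
                yield name, start, seq[start:start + k]


# Toy 16-residue sequence, purely hypothetical (not a SARS-CoV-2 protein).
toy_orfs = {"toy_orf": "MKTAYIAKQRQISFVK"}
candidates = list(enumerate_peptides(toy_orfs))
print(len(candidates), candidates[0])
```

Each enumerated peptide would then be scored against every allele of interest by whichever binding predictor is available, and only the highly ranked peptide-allele pairs retained.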

Results: We first validated HLA-I and HLA-II predictions on Coronaviridae family epitopes deposited in the Virus Pathogen Database and Analysis Resource (ViPR). We then used our HLA-I and HLA-II predictors to identify 11,897 HLA-I and 8,046 HLA-II candidate peptides that were highly ranked for binding across 13 open reading frames (ORFs) of SARS-CoV-2. These peptides are predicted to provide over 99% allele coverage for the US, European, and Asian populations. Of our SARS-CoV-2-predicted peptide-HLA-I allele pairs, 374 exactly matched pairs previously reported in ViPR for other coronaviruses with identical peptide sequences. Of these pairs, 333 (89%) had a positive HLA binding assay result, supporting the validity of our predictions. We then demonstrated that a subset of these highly predicted epitopes was immunogenic, based on their recognition by specific CD8+ T cells in healthy human donor peripheral blood mononuclear cells (PBMCs). Finally, we characterized the expression of SARS-CoV-2 proteins in virally infected cells to prioritize those that could be potential targets for T cell immunity.
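
The reported concordance can be checked with a short calculation. The sketch below uses only the two counts stated above (333 positive results among 374 matched pairs); the helper name `percent_confirmed` is hypothetical.

```python
def percent_confirmed(confirmed: int, total: int) -> float:
    """Return the percentage of matched pairs with a positive binding assay."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= confirmed <= total:
        raise ValueError("confirmed must lie between 0 and total")
    return 100.0 * confirmed / total


# Counts taken from the abstract: 333 of 374 matched peptide-HLA-I pairs.
rate = percent_confirmed(333, 374)
print(f"{rate:.1f}%")  # prints "89.0%"
```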

Conclusions: Using our bioinformatics platform, we identify multiple putative epitopes that are potential targets for CD4+ and CD8+ T cells and whose HLA binding properties cover nearly the entire population. We also confirm that our binding predictors can identify epitopes that elicit CD8+ T cell responses from multiple SARS-CoV-2 proteins. Protein expression and population HLA allele coverage, combined with the ability to identify T cell epitopes, should be considered in SARS-CoV-2 vaccine design strategies and immune monitoring.
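
One common way to estimate population HLA coverage from allele frequencies is sketched below. It is an assumption for illustration, not necessarily the calculation used in this study: it assumes Hardy-Weinberg proportions within each locus and independence between loci (ignoring linkage disequilibrium), and the function names and frequencies are hypothetical.

```python
from typing import Iterable, Sequence


def locus_coverage(covered_freqs: Sequence[float]) -> float:
    """Fraction of individuals carrying at least one covered allele at a locus.

    Under Hardy-Weinberg proportions, an individual lacks every covered
    allele with probability (1 - F)^2, where F is the summed frequency
    of the covered alleles.
    """
    total = sum(covered_freqs)
    if total < 0 or total > 1 + 1e-9:
        raise ValueError("summed allele frequency must lie in [0, 1]")
    total = min(total, 1.0)
    return 1.0 - (1.0 - total) ** 2


def combined_coverage(loci: Iterable[Sequence[float]]) -> float:
    """Coverage across loci, assuming the loci are independent."""
    uncovered = 1.0
    for freqs in loci:
        uncovered *= 1.0 - locus_coverage(freqs)
    return 1.0 - uncovered


# Hypothetical frequencies of covered alleles at two loci.
example = combined_coverage([[0.3, 0.2], [0.5]])
print(f"{example:.4f}")
```

Because coverage compounds across alleles and loci, a panel covering a moderate share of alleles at each locus can still reach a high fraction of individuals.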

Keywords: COVID-19; Computational biology; HLA-I binding prediction; HLA-II binding prediction; SARS-CoV-2 T cell epitopes; T cell assay; Vaccine design.

MeSH terms

  • Alleles
  • Antibody Affinity
  • COVID-19
  • COVID-19 Vaccines
  • Computational Biology
  • Coronavirus Infections / genetics
  • Coronavirus Infections / immunology*
  • Coronavirus Infections / prevention & control
  • Epitopes / chemistry
  • Epitopes / genetics
  • Epitopes / immunology*
  • Genome, Viral
  • HLA Antigens / chemistry
  • HLA Antigens / genetics
  • HLA Antigens / immunology*
  • Humans
  • Immunogenicity, Vaccine
  • Mass Spectrometry
  • Pandemics
  • Pneumonia, Viral / immunology*
  • T-Lymphocytes / immunology*
  • Viral Vaccines / chemistry
  • Viral Vaccines / genetics
  • Viral Vaccines / immunology*

Substances


  • COVID-19 Vaccines
  • Epitopes
  • HLA Antigens
  • Viral Vaccines