Cancer cell lines are a tremendous resource for cancer biology and therapy development. These multipurpose tools are commonly used to examine the genetic origin of cancers, to identify potential novel tumor targets, such as tumor antigens for vaccine development, and to screen potential therapies in preclinical studies. Mutations, gene expression, and drug sensitivity have been determined for many cell lines using next-generation sequencing (NGS). However, the human leukocyte antigen (HLA) type and HLA expression of tumor cell lines, characterizations necessary for the development of cancer vaccines, have remained largely incomplete, and such information, when available, has been scattered across many publications. Here, we determine the 4-digit HLA type and HLA expression of 167 cancer and 10 non-cancer cell lines from publicly available RNA-Seq data. We use standard NGS RNA-Seq short reads from "whole transcriptome" sequencing, map reads to known HLA types, and statistically determine HLA type, heterozygosity, and expression. First, we present previously unreported HLA Class I and II genotypes. Second, we determine HLA expression levels in each cancer cell line, providing insights into HLA downregulation and loss in cancer. Third, using these results, we provide a fundamental cell line "barcode" to track samples and prevent sample annotation swaps and contamination. Fourth, we integrate the cancer cell line-specific HLA types and HLA expression with available cell line-specific mutation information and existing HLA binding prediction algorithms to make a catalog of predicted antigenic mutations in each cell line. The compilation of our results is a fundamental resource for all researchers selecting specific cancer cell lines based on HLA type and HLA expression, as well as for the development of immunotherapeutic tools for novel cancer treatment modalities.
Keywords: BRENDA, BRaunschweig ENzyme Database; CCLE, Cancer Cell Line Encyclopedia; COSMIC, Catalog of Somatic Mutations in Cancer; DLBCL, diffuse large B-cell lymphoma; HLA expression; HLA type; HLA, Human Leukocyte Antigen; IEDB, Immune Epitope Database; NGS, Next Generation Sequencing; RNA-Seq, RNA Sequencing; RPKM, reads per kilobase of exon model per million mapped reads; SNV, single nucleotide variation; SRA, Sequence Read Archive; cancer cell lines; immunotherapy; neoepitopes; nsSNV, non-synonymous SNV; somatic mutations.
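As an illustration of the RPKM unit listed above, the following minimal sketch computes reads per kilobase of exon model per million mapped reads from three counts. The function name `rpkm`, its parameter names, and the example numbers are assumptions made for illustration; they are not taken from the study's pipeline or data.

```python
def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon model per million mapped reads.

    RPKM = reads * 1e9 / (exon length in bp * total mapped reads),
    i.e. reads divided by (length in kb) and by (mapped reads in millions).
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical numbers for illustration only (not from the study):
# 500 reads on a 1,000 bp exon model in a library of 20 million mapped reads.
example = rpkm(read_count=500, exon_length_bp=1_000, total_mapped_reads=20_000_000)
print(example)  # 25.0
```

Normalizing by both transcript length and library size makes expression values roughly comparable across genes and across samples sequenced to different depths.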