High-throughput sequencing of the T-cell receptor repertoire: pitfalls and opportunities

Brief Bioinform. 2018 Jul 20;19(4):554-565. doi: 10.1093/bib/bbw138.


T-cell specificity is determined by the T-cell receptor, a heterodimeric protein encoded by an extremely diverse set of genes produced by imprecise somatic gene recombination. Massively parallel high-throughput sequencing allows millions of different T-cell receptor genes to be characterized from a single sample of blood or tissue. However, the extraordinary heterogeneity of the immune repertoire poses significant challenges for subsequent analysis of the data. We outline the major steps in processing repertoire data, considering both low-level processing of raw sequence files and high-level algorithms that seek to extract biological or pathological information. The latest generation of bioinformatics tools allows millions of DNA sequences to be accurately and rapidly assigned to their respective variable (V) and joining (J) gene segments. These tools can also reconstruct an almost error-free representation of the non-templated additions and deletions that occur. High-level processing can measure the diversity of the repertoire in different samples, quantify V and J usage and identify private and public T-cell receptors. Finally, we discuss the major challenge of linking T-cell receptor sequence to function, and specifically to antigen recognition. Sophisticated machine learning algorithms are being developed that can reconcile the paradoxical degeneracy and cross-reactivity of individual T-cell receptors with the specificity of the overall T-cell immune response. Computational analysis will provide the key to unlocking the potential of the T-cell receptor repertoire, both to give insight into the fundamental biology of the adaptive immune system and to provide powerful biomarkers of disease.
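The high-level measures named in the abstract (repertoire diversity, V gene usage, and public versus private receptors) can be illustrated with a minimal Python sketch. It is not the authors' method or any published tool. It assumes each repertoire has already been reduced to a table of clonotypes with read counts, uses Shannon entropy as one of several possible diversity measures, and treats a clonotype as public if it appears in every sample supplied. The function names, the sample data and the CDR3 sequences are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy repertoires (illustrative data only):
# clonotype (V gene, CDR3 amino acids, J gene) -> read count.
sample_a = {
    ("TRBV20-1", "CSARDLGQGNTEAFF", "TRBJ1-1"): 50,
    ("TRBV7-9", "CASSLGQAYEQYF", "TRBJ2-7"): 30,
    ("TRBV5-1", "CASSLEGDTQYF", "TRBJ2-3"): 20,
}
sample_b = {
    ("TRBV7-9", "CASSLGQAYEQYF", "TRBJ2-7"): 10,
    ("TRBV28", "CASSFSTDTQYF", "TRBJ2-3"): 10,
}


def shannon_entropy(repertoire):
    """Shannon entropy (in nats) of the clonotype frequency distribution."""
    total = sum(repertoire.values())
    return -sum(
        (n / total) * math.log(n / total)
        for n in repertoire.values()
        if n > 0
    )


def v_usage(repertoire):
    """Fraction of reads assigned to each V gene."""
    total = sum(repertoire.values())
    counts = Counter()
    for (v_gene, _cdr3, _j_gene), n in repertoire.items():
        counts[v_gene] += n
    return {v_gene: n / total for v_gene, n in counts.items()}


def public_clonotypes(*repertoires):
    """Clonotypes present in every repertoire given (assumed definition of 'public')."""
    shared = set(repertoires[0])
    for repertoire in repertoires[1:]:
        shared &= set(repertoire)
    return shared


if __name__ == "__main__":
    print(shannon_entropy(sample_a))
    print(v_usage(sample_a))
    print(public_clonotypes(sample_a, sample_b))
```

Real repertoires differ greatly in sequencing depth, so a practical analysis would normally subsample or otherwise normalize read counts before comparing diversity between samples; the sketch omits that step.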

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Antibody Diversity
  • Genetic Variation*
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Immunoglobulin Variable Region / genetics
  • Immunoglobulin Variable Region / immunology
  • Receptors, Antigen, T-Cell / genetics*
  • Receptors, Antigen, T-Cell / immunology
  • Sequence Analysis, DNA / methods*
  • Software
  • T-Lymphocytes / immunology


Substances

  • Immunoglobulin Variable Region
  • Receptors, Antigen, T-Cell