Tracking global changes induced in the CD4 T-cell receptor repertoire by immunization with a complex antigen using short stretches of CDR3 protein sequence

Bioinformatics. 2014 Nov 15;30(22):3181-8. doi: 10.1093/bioinformatics/btu523. Epub 2014 Aug 5.


Motivation: The clonal theory of adaptive immunity proposes that immunological responses are encoded by increases in the frequency of lymphocytes carrying antigen-specific receptors. In this study, we measure the frequency of different T-cell receptors (TcR) in CD4+ T-cell populations of mice immunized with a complex antigen, killed Mycobacterium tuberculosis, using high-throughput parallel sequencing of the TcRβ chain. Our initial hypothesis that immunization would induce repertoire convergence proved to be incorrect, and therefore an alternative approach was developed that allows accurate stratification of TcR repertoires and provides novel insights into the nature of CD4+ T-cell receptor recognition.

Results: To track the changes induced by immunization within this heterogeneous repertoire, the sequence data were classified by counting the frequency of different clusters of short (3 or 4) continuous stretches of amino acids within the antigen-binding complementarity determining region 3 (CDR3) repertoire of different mice. Both unsupervised (hierarchical clustering) and supervised (support vector machine) analyses of these different distributions of sequence clusters differentiated between immunized and unimmunized mice with 100% efficiency. The CD4+ TcR repertoires of mice 5 and 14 days postimmunization were clearly different from those of unimmunized mice but were not distinguishable from each other. However, the repertoires of mice 60 days postimmunization were distinct both from naive mice and the day 5/14 animals. Our results reinforce the remarkable diversity of the TcR repertoire, resulting in many diverse private TcRs contributing to the T-cell response even in genetically identical mice responding to the same antigen. However, specific motifs defined by short stretches of amino acids within the CDR3 region may determine TcR specificity and define a new approach to TcR sequence classification.
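
The first step described above, counting short continuous stretches of amino acids within CDR3 sequences, can be illustrated with a minimal Python sketch. This is not the authors' implementation (which is in the Supplementary Data): the function names `kmer_counts` and `kmer_frequencies` and the toy sequences are hypothetical, and the sketch stops at raw k-mer frequencies. Grouping the stretches into clusters and the subsequent hierarchical clustering or support vector machine analysis are not shown.

```python
from collections import Counter


def kmer_counts(cdr3_sequences, k=3):
    """Count every continuous stretch of k residues across a list of CDR3 sequences."""
    counts = Counter()
    for seq in cdr3_sequences:
        # Slide a window of width k along the sequence; sequences shorter
        # than k contribute nothing because the range is empty.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_frequencies(cdr3_sequences, k=3):
    """Convert k-mer counts into relative frequencies for one repertoire."""
    counts = kmer_counts(cdr3_sequences, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Hypothetical toy repertoire (illustrative sequences, not data from the study).
repertoire = ["CASSLGQNTEVFF", "CASSLGGYEQYF"]
freqs = kmer_frequencies(repertoire, k=3)
```

Computing one such frequency vector per mouse would give a feature table on which repertoires could be compared. The choice of k (3 or 4) follows the abstract; everything else here is an assumption made for illustration.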

Availability and implementation: The analysis was implemented in R and Python, and source code can be found in Supplementary Data.


Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • CD4-Positive T-Lymphocytes / immunology*
  • Cluster Analysis
  • Complementarity Determining Regions / chemistry*
  • Immunization
  • Mice
  • Mycobacterium tuberculosis / immunology
  • Receptors, Antigen, T-Cell / chemistry*
  • Receptors, Antigen, T-Cell / immunology
  • Receptors, Antigen, T-Cell, alpha-beta / chemistry
  • Sequence Analysis, Protein
  • Support Vector Machine


Substances

  • Complementarity Determining Regions
  • Receptors, Antigen, T-Cell
  • Receptors, Antigen, T-Cell, alpha-beta