Bioinformatics. 2014 Nov 15;30(22):3181-8.
doi: 10.1093/bioinformatics/btu523. Epub 2014 Aug 5.

Tracking global changes induced in the CD4 T-cell receptor repertoire by immunization with a complex antigen using short stretches of CDR3 protein sequence

Niclas Thomas et al. Bioinformatics.

Abstract

Motivation: The clonal theory of adaptive immunity proposes that immunological responses are encoded by increases in the frequency of lymphocytes carrying antigen-specific receptors. In this study, we measure the frequency of different T-cell receptors (TcR) in CD4+ T-cell populations of mice immunized with a complex antigen, killed Mycobacterium tuberculosis, using high-throughput parallel sequencing of the TcRβ chain. Our initial hypothesis that immunization would induce repertoire convergence proved to be incorrect, and therefore an alternative approach was developed that allows accurate stratification of TcR repertoires and provides novel insights into the nature of CD4+ T-cell receptor recognition.

Results: To track the changes induced by immunization within this heterogeneous repertoire, the sequence data were classified by counting the frequency of different clusters of short (3 or 4) continuous stretches of amino acids within the antigen-binding complementarity determining region 3 (CDR3) repertoire of different mice. Both unsupervised (hierarchical clustering) and supervised (support vector machine) analyses of these different distributions of sequence clusters differentiated between immunized and unimmunized mice with 100% efficiency. The CD4+ TcR repertoires of mice 5 and 14 days postimmunization were clearly different from those of unimmunized mice but were not distinguishable from each other. However, the repertoires of mice 60 days postimmunization were distinct from both naive mice and the day 5/14 animals. Our results reinforce the remarkable diversity of the TcR repertoire, resulting in many diverse private TcRs contributing to the T-cell response even in genetically identical mice responding to the same antigen. However, specific motifs defined by short stretches of amino acids within the CDR3 region may determine TcR specificity and define a new approach to TcR sequence classification.

Availability and implementation: The analysis was implemented in R and Python, and source code can be found in Supplementary Data.

Contact: b.chain@ucl.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
The computational pipeline for classifying TcR repertoires. A schematic of the computational pipeline is shown on the left, and a specific example for two arbitrary TcR β sequences is shown on the right (with p = 3). CDR3 sequences are preprocessed and represented as a series of p-tuples (contiguous sequences of amino acids of length p). The p-tuples are then converted into numeric vectors of length 5p by representing each amino acid by its five Atchley factors. The codebook is then generated: a sample of these vectors, pooled from all experimental groups, is clustered via k-means to build a codebook of k code words. A new sample of q p-tuples from each mouse is then selected, and each p-tuple is mapped to the nearest code word. The number of p-tuples within each code word for that mouse is counted. The sequence data from each mouse are therefore represented by a feature vector of length k, containing the frequency of each code word within the sample. These vectors of length k are then analysed by hierarchical clustering or SVM.
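The steps in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' supplementary code: the function names are hypothetical, the two-number encoding per amino acid is a placeholder standing in for the five Atchley factors (whose published values are not reproduced here), the code words are supplied by hand rather than learned by k-means, and every p-tuple is used instead of a sample of q.

```python
from collections import Counter
from math import dist


def p_tuples(cdr3, p=3):
    """Return every contiguous length-p substring of a CDR3 sequence."""
    return [cdr3[i:i + p] for i in range(len(cdr3) - p + 1)]


# Placeholder encoding: two made-up numbers per amino acid.
# These are NOT the Atchley factors; they only make the sketch runnable.
TOY_FACTORS = {"A": (0.0, 1.0), "S": (1.0, 0.0), "G": (1.0, 1.0), "L": (0.0, 0.0)}


def encode(p_tuple, factors=TOY_FACTORS):
    """Concatenate the per-residue numbers into one flat vector."""
    vec = []
    for aa in p_tuple:
        vec.extend(factors[aa])
    return tuple(vec)


def nearest_code_word(vec, codebook):
    """Index of the code word closest to vec (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda j: dist(vec, codebook[j]))


def feature_vector(cdr3s, codebook, p=3):
    """Relative frequency of each code word among all p-tuples of one mouse."""
    counts = Counter()
    for seq in cdr3s:
        for t in p_tuples(seq, p):
            counts[nearest_code_word(encode(t), codebook)] += 1
    total = sum(counts.values())
    return [counts[j] / total for j in range(len(codebook))]


# Hand-picked code words; in the described pipeline these come from k-means.
codebook = [encode("ASS"), encode("SGL")]
print(p_tuples("ASSL"))                      # the two triplets of a toy sequence
print(feature_vector(["ASSL"], codebook))    # one frequency per code word
```

With real data, the codebook would hold k cluster centres and each vector would have length 5p, so that every mouse is summarized by k frequencies regardless of how many sequences it contributed.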
Fig. 2.
The similarity (Jaccard) index comparing all pairs of mice. Each dot represents the Jaccard index comparing all CDR3 sequences from two mice. CDR3 repertoires from pairs of untreated (U) mice, or pairs of immunized (I) mice, display greater similarity (i.e. have a larger Jaccard index) than repertoires from pairs of mice where one mouse is immunized and one is not. Horizontal black lines indicate the mean of each population.
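The Jaccard index of two sets is the size of their intersection divided by the size of their union. A minimal sketch follows, assuming each repertoire is treated as a set of distinct CDR3 strings (abundance ignored); the function name and the example sequences are made up for illustration.

```python
def jaccard_index(repertoire_a, repertoire_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of CDR3 strings."""
    set_a, set_b = set(repertoire_a), set(repertoire_b)
    union = set_a | set_b
    if not union:
        # Two empty repertoires: define the similarity as 0 to avoid dividing by zero.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical toy repertoires sharing one sequence out of four distinct ones.
mouse_1 = ["CASSL", "CASSG", "CAWSL"]
mouse_2 = ["CASSL", "CASSQ"]
print(jaccard_index(mouse_1, mouse_2))
```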
Fig. 3.
CDR3 sequences shared between immunized mice. (a) The frequency (counts per million) of 57 CDR3s that are present in 75% of the immunized mice, but absent from all unimmunized mice (not shown). Each column represents one mouse, grouped according to time after immunization as shown below the x axis. (b) The amino acid sequences of all 57 CDR3s, clustered according to Levenshtein distance. (c) A plot of the frequency of each individual amino acid triplet (i.e. sequence of three consecutive amino acids, see Fig. 1) encoded by the 57 CDR3s, measured within the 57 CDR3s themselves (x axis) versus the frequency of the same triplets within a random sample of 1000 sets of 57 CDR3s selected from the set of CDR3s from all immunized mice (y axis). The diagonal line designates an equal frequency in the shared CDR3s and in the random set. Those triplets that are overrepresented in the shared CDR3s are found in the lower right area of the plot.
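Levenshtein distance, used in panel (b), is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below uses the standard dynamic-programming recurrence; the function name and the example sequences are illustrative assumptions, not taken from the paper.

```python
def levenshtein(s, t):
    """Edit distance between strings s and t (insert, delete, substitute)."""
    # prev[j] holds the distance between the first i-1 characters of s
    # and the first j characters of t.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from the first i characters of s to the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion from s
                            curr[j - 1] + 1,      # insertion into s
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


# Hypothetical CDR3-like strings differing by one deleted residue.
print(levenshtein("CASSL", "CASL"))
```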
Fig. 4.
Hierarchical clustering distinguishes between the code word (clusters of triplets) distribution profiles of unimmunized and immunized mice. Each mouse was categorized as described in the text, using k = 100, p = 3 (triplets), q = 10 000. The heatmap shows the relative proportion of sequences within each code word (rows) for each mouse (columns). A small group of code words appeared more frequently in untreated mice than in immunized mice (bottom left corner of heatmap), while conversely a larger group of code words appeared more frequently in immunized mice (top right). The data are clustered along both axes using Euclidean distance and the complete linkage method in the R function ‘hclust’.
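Complete linkage merges, at each step, the two clusters whose most distant pair of members is closest. The following is a small pure-Python stand-in for that idea, not a reimplementation of R's hclust: it stops at a requested number of clusters instead of returning a full dendrogram, and the function name and toy points are assumptions.

```python
from math import dist


def complete_linkage(points, n_clusters):
    """Agglomerative clustering with Euclidean distance and complete linkage.

    Returns a list of clusters, each a sorted list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Four toy two-dimensional profiles forming two obvious groups.
profiles = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]
print(complete_linkage(profiles, 2))
```

This brute-force version recomputes every pairwise distance at each merge, which is fine for a few dozen mice but would be too slow for large inputs.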
Fig. 5.
Differences between code word (clusters of triplets) distribution profiles of unimmunized and immunized mice. Each mouse was categorized as described in the text, using k = 100, p = 3 (triplets), q = 10 000. The relative frequency of each code word cluster for each group of six mice is averaged and shown relative to the average in the unimmunized group for the corresponding code word. For clarity, the data for only the first 34 code words are shown.
Fig. 6.
SVM can efficiently classify time-dependent changes in CDR3 repertoire following immunization. A subsample of q = 10 000 triplets (p = 3) was taken from each mouse to generate a frequency distribution over the code words, which was used to train and test a linear SVM with leave-one-out cross-validation. This was repeated 100 times, and the proportion of these repetitions classified as each of the four classes is shown.
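Leave-one-out evaluation holds out one mouse at a time, trains on the rest, and checks the prediction for the held-out mouse. The sketch below shows that loop only. To stay dependency-free it swaps the linear SVM for a nearest-centroid classifier, so it illustrates the evaluation scheme rather than the classifier used in the paper; all function names and the toy frequency vectors are assumptions.

```python
from math import dist


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x.

    Stand-in for the linear SVM: a much simpler classifier, used here
    only so the example runs with the standard library.
    """
    by_label = {}
    for vec, label in zip(train_x, train_y):
        by_label.setdefault(label, []).append(vec)
    return min(by_label, key=lambda lab: dist(x, centroid(by_label[lab])))


def leave_one_out_accuracy(xs, ys):
    """Fraction of samples classified correctly when each is held out in turn."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Made-up code word frequency vectors for two unimmunized (U) and two immunized (I) mice.
xs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
ys = ["U", "U", "I", "I"]
print(leave_one_out_accuracy(xs, ys))
```

Each class needs at least two samples here; otherwise holding one out leaves no training example for its label.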
