PLoS Comput Biol. 2015 Nov 25;11(11):e1004503.
doi: 10.1371/journal.pcbi.1004503. eCollection 2015 Nov.

VDJtools: Unifying Post-analysis of T Cell Receptor Repertoires


Mikhail Shugay et al. PLoS Comput Biol. 2015.

Abstract

Despite the growing number of immune repertoire sequencing studies, the field still lacks software for the analysis and comprehension of this high-dimensional data. Here we report VDJtools, a complementary software suite that solves a wide range of T cell receptor (TCR) repertoire post-analysis tasks, provides detailed tabular output and publication-ready graphics, and is built on top of a flexible API. Using TCR datasets from a large cohort of unrelated healthy donors, twins, and multiple sclerosis patients, we demonstrate that VDJtools greatly facilitates the analysis and leads to sound biological conclusions. VDJtools software and documentation are available at https://github.com/mikessh/vdjtools.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Overview of VDJtools software package.
VDJtools analysis routines can be grouped into 6 modules and are applicable to the output of commonly used immune repertoire sequencing data processing software. The basic statistics and segment usage module includes general statistics (clonotype and read counts, number and frequency of non-coding clonotypes, convergent recombination of CDR3 amino acid sequences, insert size statistics, etc.), spectratyping (distribution of clonotype frequency by CDR3 length), Variable and Joining segment usage profiles, and their pairing frequencies in rearranged receptor junction sequences. The repertoire overlap module includes routines for computing sets of overlapping clonotypes and their characteristics, as well as scatter plots of clonotype frequencies. The diversity analysis module includes routines for visualizing the clonotype frequency distribution, computing repertoire diversity estimates and drawing rarefaction plots. The fourth set of routines can be used to create clonotype abundance profiles and to track clonotypes over the time course of vaccination, myeloablation and blood cell transplant. Sample clustering is based on the computed repertoire similarity measures and can be used to distinguish biological conditions, cell subsets and tissues. Auxiliary routines provide clonotype table filtering (e.g. by segment usage or non-coding CDR3 sequence) as well as annotation with custom or pre-built pathogen-specific clonotype databases. VDJtools can be incorporated into Java-based pipelines, as demonstrated by the VDJviz clonotype browser.
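As a minimal sketch of the spectratyping idea described above (clonotype frequency summed by CDR3 length), the Python function below assumes a hypothetical input of (CDR3 nucleotide sequence, frequency) pairs and measures length in nucleotides. It illustrates the concept only and is not the VDJtools implementation or API.

```python
from collections import defaultdict


def spectratype(clonotypes):
    """Sum clonotype frequencies by CDR3 length (a spectratype).

    `clonotypes` is an iterable of (cdr3_nt, frequency) pairs; this input
    format is a hypothetical simplification, not a VDJtools data structure.
    Returns a dict mapping CDR3 nucleotide length to total frequency,
    ordered by length.
    """
    hist = defaultdict(float)
    for cdr3_nt, freq in clonotypes:
        hist[len(cdr3_nt)] += freq
    return dict(sorted(hist.items()))


sample = [("TGTGCCAGCAGT", 0.5), ("TGTGCCTGGAGT", 0.25), ("TGTGCCAGCAGTTTC", 0.25)]
print(spectratype(sample))  # {12: 0.75, 15: 0.25}
```

Summing frequencies (rather than counting clonotypes) makes expanded clonotypes dominate their length bin, which is what a frequency-weighted spectratype is meant to show.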
Fig 2. Estimation of repertoire diversity using multinomial model.
A. Rarefaction analysis of repertoire samples from healthy donors and multiple sclerosis patients. The number of unique clonotypes in a sub-sample is plotted against the sub-sample size (number of T-cell receptor cDNA molecules, TRBM). Solid and dashed lines show diversity estimates obtained by interpolation and extrapolation, respectively, using a multinomial model [29]. Note that rarefaction curves for MS samples generally lie below those of control donors. The post-HSCT sample (MS8-HSCT) displays the lowest diversity. B. Comparison of repertoire diversity using the normalized Chao1 estimate. Normalization is performed by down-sampling all datasets to the size of the smallest dataset and computing the estimate for the resulting datasets (the mean over n = 3 re-samples is used). The MS8-HSCT sample is excluded from these calculations. *: P = 0.022, two-tailed t-test; effect size estimated by Cohen’s d is 0.98.
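The two diversity procedures in this caption can be sketched in Python as follows. This is an illustration under stated assumptions, not the VDJtools code: rarefaction uses the classical individual-based interpolation formula (interpolation only, no extrapolation), Chao1 uses its bias-corrected form, and the down-sampling and averaging over 3 re-samples follow the caption. All function names are hypothetical; each sample is a list of per-clonotype molecule counts.

```python
import math
import random
from statistics import mean


def _log_comb(a, b):
    """log of the binomial coefficient C(a, b) via log-gamma, for 0 <= b <= a."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def rarefied_richness(counts, m):
    """Expected number of distinct clonotypes in a sub-sample of m molecules.

    Uses E[S_m] = sum_i [1 - C(n - X_i, m) / C(n, m)], where X_i is the count
    of clonotype i and n the sample size (interpolation only, m <= n).
    """
    n = sum(counts)
    if not 0 <= m <= n:
        raise ValueError("m must lie between 0 and the sample size")
    expected = 0.0
    for x in counts:
        if n - x < m:
            expected += 1.0  # every sub-sample of size m contains this clonotype
        else:
            expected += 1.0 - math.exp(_log_comb(n - x, m) - _log_comb(n, m))
    return expected


def chao1(counts):
    """Bias-corrected Chao1: S_obs + f1 (f1 - 1) / (2 (f2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def downsample(counts, size, rng):
    """Draw `size` molecules without replacement; return the non-zero counts."""
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    new_counts = [0] * len(counts)
    for i in rng.sample(pool, size):
        new_counts[i] += 1
    return [c for c in new_counts if c > 0]


def normalized_chao1(samples, n_resamples=3, seed=0):
    """Chao1 after down-sampling every sample to the smallest sample size,
    averaged over `n_resamples` random down-samplings."""
    rng = random.Random(seed)
    size = min(sum(c) for c in samples)
    return [mean(chao1(downsample(c, size, rng)) for _ in range(n_resamples))
            for c in samples]
```

Down-sampling before comparing Chao1 values matters because the estimate grows with sequencing depth; equalizing sample sizes is what makes panel B comparable across donors. Expanding every molecule into `pool` is memory-hungry for deep repertoires and is chosen here only for clarity.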
Fig 3. Overlap and clustering of TCR repertoires.
A. Hierarchical clustering of healthy donor and multiple sclerosis (MS) patient samples using the F pairwise similarity metric (the geometric mean of the total frequencies of overlapping clonotypes in the first and second samples of a pair). B. Multidimensional scaling (MDS) plot. Samples were projected onto a two-dimensional plane based on pairwise similarities (F metric). C. Permutation test for the closeness of samples from the same group, based on the MDS plot. The plot shows observed (dashed red lines) and permuted (histograms) average within-group sample distances. In contrast to the control group, the MS group displays highly dissimilar T-cell repertoires. N = 10,000 permutations of group labels were performed. D. Hierarchical clustering of samples based on the Euclidean distance between Variable segment frequency vectors. Note that this clustering separates the sample groups well (Control and MS, P = 0.013, Fisher’s exact test).
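The F metric and the label-permutation test described in this caption can be sketched in Python as below. Assumptions: clonotypes are matched by a hypothetical string key (for example a CDR3 amino acid sequence), the distance matrix fed to the test is any symmetric matrix (such as Euclidean distances between MDS coordinates), and the one-sided p-value uses the common (1 + hits) / (1 + N) convention. None of these names come from VDJtools.

```python
import itertools
import math
import random


def f_metric(freq_a, freq_b):
    """Geometric mean of the total frequency of shared clonotypes in A and in B.

    freq_a, freq_b: dicts mapping a clonotype key (hypothetical, e.g. a CDR3
    amino acid sequence) to its frequency in that sample.
    """
    shared = freq_a.keys() & freq_b.keys()
    return math.sqrt(sum(freq_a[k] for k in shared) * sum(freq_b[k] for k in shared))


def mean_within_distance(dist, labels, group):
    """Average pairwise distance among samples labelled `group`."""
    idx = [i for i, lab in enumerate(labels) if lab == group]
    pairs = list(itertools.combinations(idx, 2))
    if not pairs:
        raise ValueError("group needs at least two samples")
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


def permutation_test(dist, labels, group, n_perm=10_000, seed=0):
    """One-sided test: are samples of `group` closer than under random labels?

    Returns the observed within-group distance and the permutation p-value
    (1 + #permutations with distance <= observed) / (1 + n_perm).
    """
    rng = random.Random(seed)
    observed = mean_within_distance(dist, labels, group)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign group labels at random
        if mean_within_distance(dist, shuffled, group) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Because F depends only on the summed frequency of shared clonotypes, a few highly expanded shared clonotypes can dominate it; permuting labels keeps the distance matrix fixed and tests only whether group membership explains within-group closeness.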
Fig 4. Analysis of autologous HSCT-driven changes in T-cell repertoire.
A. Stacked clonotype frequency plot highlighting the details of overlap between sample MS8 (before autologous HSCT) and MS8-HSCT (after HSCT). The top 100 clonotypes by average frequency across the two samples are shown; other clonotypes observed in both samples are marked as “Not shown”, and the frequency of the remaining clonotypes is marked as “Not in overlap”. B. HSCT-induced changes in Variable-Joining segment pairing in CDR3 junctions. A chord diagram is used for visualization; ribbons connecting segment pairs are scaled by the corresponding V-J pair frequency. The “TRB” prefix is stripped from segment names for simplicity.
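The two quantities behind this figure, the top shared clonotypes ranked by average frequency (panel A) and the V-J pair frequencies that scale the chord ribbons (panel B), can be sketched in Python as follows. The tuple and dict input formats and the function names are hypothetical simplifications, not VDJtools structures; `str.removeprefix` requires Python 3.9 or later.

```python
from collections import Counter


def vj_pair_frequencies(clonotypes):
    """Sum clonotype frequencies per (V, J) pair, stripping the "TRB" prefix.

    clonotypes: iterable of (v_segment, j_segment, frequency) tuples
    (a hypothetical input format).
    """
    pairs = Counter()
    for v, j, freq in clonotypes:
        pairs[(v.removeprefix("TRB"), j.removeprefix("TRB"))] += freq
    return dict(pairs)


def top_shared_by_mean(freq_a, freq_b, n=100):
    """Keys of the n shared clonotypes with the highest average frequency."""
    shared = freq_a.keys() & freq_b.keys()
    return sorted(shared, key=lambda k: (freq_a[k] + freq_b[k]) / 2, reverse=True)[:n]


print(vj_pair_frequencies([("TRBV5-1", "TRBJ2-7", 0.25),
                           ("TRBV5-1", "TRBJ2-7", 0.25),
                           ("TRBV20-1", "TRBJ1-1", 0.5)]))
```

Ranking only clonotypes present in both samples mirrors the caption: shared clonotypes outside the top 100 would fall into “Not shown”, and everything else into “Not in overlap”.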
Fig 5. CDR3 junction features.
MS patient-derived repertoires are enriched for TCR sequences with long VJ inserts, partially due to the high abundance of specific Variable segments. A. Lengths of the Variable and Joining segment germline parts within CDR3 (V-germ and J-germ) and of the VJ insert (VJ-junc), compared between MS patients and healthy controls. B. Average VJ junction length among all V segments and among selected V segments (TRBV5-6, 5-1, 5-8, 7-6 and 20-1, shown to be over-expressed in MS patients compared to controls, see main text), based on TCR sequences from healthy donor repertoires. C. Comparison of VJ insert lengths between control and MS donors for clonotypes carrying the TRBV5-6, 5-1, 5-8, 7-6 and 20-1 segments. P-values were computed using a two-tailed unpaired t-test (A, C) and a paired t-test (B).
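A minimal Python sketch of the junction measurements in this caption follows. It assumes a hypothetical coordinate convention in which `v_end` is the exclusive end of the V germline part and `j_start` the 0-based start of the J germline part within the CDR3 nucleotide sequence; the VJ insert is then the stretch between them. Averages count each clonotype once. The caption does not say whether the unpaired test assumed equal variances, so Welch's statistic is shown only as one option; a p-value would additionally need a t-distribution (for example via `scipy.stats.ttest_ind`).

```python
import math
from collections import defaultdict
from statistics import mean, variance


def junction_parts(cdr3_nt, v_end, j_start):
    """Return (V-germ, VJ-junc, J-germ) lengths for one CDR3.

    Assumes v_end is the exclusive end of the V germline part and j_start the
    0-based start of the J germline part (a hypothetical convention).
    """
    if not 0 <= v_end <= j_start <= len(cdr3_nt):
        raise ValueError("expected 0 <= v_end <= j_start <= len(cdr3_nt)")
    return v_end, j_start - v_end, len(cdr3_nt) - j_start


def mean_insert_by_v(clonotypes):
    """Average VJ-junc length per V segment, counting each clonotype once.

    clonotypes: iterable of (v_segment, cdr3_nt, v_end, j_start) tuples.
    """
    inserts = defaultdict(list)
    for v, cdr3_nt, v_end, j_start in clonotypes:
        inserts[v].append(junction_parts(cdr3_nt, v_end, j_start)[1])
    return {v: mean(lengths) for v, lengths in inserts.items()}


def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se
```

Restricting `mean_insert_by_v` to selected V segments (panel B) is a matter of filtering its input; comparing groups (panels A and C) applies a two-sample test to the per-clonotype insert lengths of each group.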


References

1. Bolotin DA, Shugay M, Mamedov IZ, Putintseva EV, Turchaninova MA, et al. (2013) MiTCR: software for T-cell receptor sequencing data analysis. Nat Methods 10: 813–814. doi:10.1038/nmeth.2555
2. Shugay M, Britanova OV, Merzlyak EM, Turchaninova MA, Mamedov IZ, et al. (2014) Towards error-free profiling of immune repertoires. Nat Methods 11: 653–655. doi:10.1038/nmeth.2960
3. Bolotin DA, Poslavsky S, Mitrophanov I, Shugay M, Mamedov IZ, et al. (2015) MiXCR: software for comprehensive adaptive immunity profiling. Nat Methods 12: 380–381. doi:10.1038/nmeth.3364
4. Alamyar E, Giudicelli V, Li S, Duroux P, Lefranc MP (2012) IMGT/HighV-QUEST: the IMGT® web portal for immunoglobulin (IG) or antibody and T cell receptor (TR) analysis from NGS high throughput and deep sequencing. Immunome Res 8: 26.
5. Ye J, Ma N, Madden TL, Ostell JM (2013) IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res 41: W34–40. doi:10.1093/nar/gkt382


Grants and funding

This work was supported by the Russian Science Foundation project №14-14-00533 (VDJtools development) and RFBR grant 13-04-00998 (cDNA library preparation). The work was carried out in part using equipment provided by the Shemyakin–Ovchinnikov Institute of Bioorganic Chemistry Core Facility. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.