TCRklass: a new K-string-based algorithm for human and mouse TCR repertoire characterization

J Immunol. 2015 Jan 1;194(1):446-54. doi: 10.4049/jimmunol.1400711. Epub 2014 Nov 17.

Abstract

Next-generation sequencing has advanced the study of the human TCR repertoire, which is essential for adaptive immunity. To decipher the complexity of the TCR repertoire, we developed an integrated pipeline, TCRklass, which uses a K-string-based algorithm and significantly improves accuracy and performance over existing tools. We tested TCRklass using manually curated short read datasets in comparison with in silico datasets; it showed higher precision and recall rates on CDR3 identification. We applied TCRklass to large datasets of two human and three mouse TCR repertoires; it demonstrated higher reliability in CDR3 identification and much less biased V/J profiling, which are the two components contributing to the diversity of the repertoire. Because of sequencing cost, short paired-end reads generated by next-generation sequencing are and will remain the main source of data, and we believe that TCRklass is a useful and reliable toolkit for TCR repertoire analysis.
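
The abstract reports precision and recall rates for CDR3 identification without stating how they were computed. The sketch below shows one common way to score such a benchmark. It assumes each read has an ID, that a curated truth set maps read IDs to the correct CDR3 amino acid sequence, and that a call counts as correct only on an exact match. The function name, the data layout, and the example sequences are all hypothetical; they are not taken from TCRklass or from the paper's evaluation.

```python
def precision_recall(predicted, truth):
    """Score CDR3 calls against a curated truth set.

    predicted: dict mapping read ID -> CDR3 sequence called by a tool
    truth:     dict mapping read ID -> correct CDR3 sequence
               (only reads that really contain a CDR3 appear here)

    Assumption: a call is a true positive only if the read is in the
    truth set and the called sequence matches exactly.
    """
    true_positives = sum(
        1 for read_id, cdr3 in predicted.items() if truth.get(read_id) == cdr3
    )
    # Precision: fraction of the tool's calls that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of the true CDR3s that the tool recovered.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Made-up example data, for illustration only.
truth = {
    "read1": "CASSLGQGYEQYF",
    "read2": "CASSPDRGNTEAFF",
    "read3": "CASRTGGNQPQHF",
    "read4": "CASSQETQYF",
}
predicted = {
    "read1": "CASSLGQGYEQYF",   # correct
    "read2": "CASSPDRGNTEAFF",  # correct
    "read3": "CASRTGGNQPQH",    # truncated call, counted as wrong
}

precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Under these assumptions, a tool that makes few but accurate calls scores high precision and low recall, which is why the two rates are reported together.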

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Animals
  • Base Sequence
  • Electronic Data Processing / methods
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Mice
  • Molecular Sequence Data
  • Receptors, Antigen, T-Cell / analysis
  • Receptors, Antigen, T-Cell / genetics*
  • Receptors, Antigen, T-Cell / immunology
  • Reproducibility of Results
  • Sequence Analysis, DNA
  • V(D)J Recombination / genetics*

Substances

  • Receptors, Antigen, T-Cell