Methods for diversity and overlap analysis in T-cell receptor populations

J Math Biol. 2013 Dec;67(6-7):1339-68. doi: 10.1007/s00285-012-0589-7. Epub 2012 Sep 25.


The paper presents novel approaches to the empirical analysis of diversity and similarity (overlap) in biological or ecological systems. The analysis is motivated by molecular studies of highly diverse mammalian T-cell receptor (TCR) populations and is related to the classical statistical problem of analyzing two-way contingency tables with missing cells and low cell counts. New measures of diversity and overlap are proposed, based on information-theoretic as well as geometric considerations, with the capacity to naturally up-weight or down-weight rare and abundant population species. Consistent estimates are derived by applying the Good-Turing sample-coverage correction. In particular, novel consistent estimates of the Shannon entropy function and the Morisita-Horn index are provided. Data from TCR populations in mice are used to illustrate the empirical performance of the proposed methods vis-à-vis the existing alternatives.
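The quantities named in the abstract can be illustrated with a minimal sketch. The abstract does not state the paper's own estimators, so the code below is not the authors' method: it uses the textbook Good-Turing coverage estimate (one minus the fraction of the sample made up of singletons), the plug-in Shannon entropy, a coverage-adjusted entropy in the style of Chao and Shen as a stand-in, and the standard plug-in Morisita-Horn index. All function names and the toy clonotype counts are hypothetical, and each sample is assumed to be non-empty.

```python
import math


def good_turing_coverage(counts):
    """Good-Turing sample coverage: 1 - (number of singletons) / (sample size)."""
    n = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / n


def shannon_plugin(counts):
    """Plug-in (maximum-likelihood) Shannon entropy in nats."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def shannon_coverage_adjusted(counts):
    """Coverage-adjusted entropy (Chao-Shen style), used here as a stand-in.

    Observed frequencies are shrunk by the Good-Turing coverage, and each
    term is divided by the probability that the species was detected at all.
    """
    n = sum(counts)
    coverage = good_turing_coverage(counts)
    if coverage <= 0.0:
        raise ValueError("every species is a singleton; coverage estimate is zero")
    total = 0.0
    for c in counts:
        if c == 0:
            continue
        p = coverage * c / n                # coverage-adjusted frequency
        detected = 1.0 - (1.0 - p) ** n     # chance of appearing in n draws
        total -= p * math.log(p) / detected
    return total


def morisita_horn(x, y):
    """Plug-in Morisita-Horn overlap between two {species: count} samples."""
    nx, ny = sum(x.values()), sum(y.values())
    cross = sum((x[k] / nx) * (y[k] / ny) for k in x.keys() & y.keys())
    sx = sum((v / nx) ** 2 for v in x.values())
    sy = sum((v / ny) ** 2 for v in y.values())
    return 2.0 * cross / (sx + sy)


# Toy clonotype counts (hypothetical, for illustration only).
sample_a = {"cdr3_1": 12, "cdr3_2": 5, "cdr3_3": 1, "cdr3_4": 1}
sample_b = {"cdr3_1": 9, "cdr3_3": 4, "cdr3_5": 1}

print("coverage A:", good_turing_coverage(sample_a.values()))
print("plug-in entropy A:", shannon_plugin(sample_a.values()))
print("adjusted entropy A:", shannon_coverage_adjusted(sample_a.values()))
print("Morisita-Horn A vs B:", morisita_horn(sample_a, sample_b))
```

The plug-in forms ignore species that were never sampled, which is the gap a sample-coverage correction is meant to address when many clonotypes are rare.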

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Data Interpretation, Statistical
  • Genetic Variation / immunology*
  • Mice
  • Receptors, Antigen, T-Cell / genetics
  • Receptors, Antigen, T-Cell / immunology*
  • T-Lymphocyte Subsets / immunology*
  • T-Lymphocytes / immunology*


Substances

  • Receptors, Antigen, T-Cell