The biologic basis of clinical heterogeneity in juvenile idiopathic arthritis

Arthritis Rheumatol. 2014 Dec;66(12):3463-75. doi: 10.1002/art.38875.


Objective: Childhood arthritis encompasses a heterogeneous family of diseases. Significant variation in clinical presentation remains despite consensus-driven diagnostic classifications. Developments in data analysis provide powerful tools for interrogating large heterogeneous data sets. We report a novel approach to integrating biologic and clinical data toward a new classification for childhood arthritis, using computational biology for data-driven pattern recognition.

Methods: Probabilistic principal components analysis was used to transform a large set of data into 4 interpretable indicators, or composite variables, on which patients were grouped by cluster analysis. Sensitivity analysis was conducted to identify the key variables that determined the indicators and cluster assignment. Results were validated in an independent cohort.
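As a rough illustration of the kind of pipeline described in the Methods, the sketch below reduces a data matrix to 4 composite indicators with probabilistic principal components analysis and then groups the rows by cluster analysis. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, the number of clusters, or the software used, so k-means with 3 clusters is a stand-in choice, the data are synthetic, and all function names (ppca_fit, ppca_transform, kmeans) are hypothetical. The probabilistic PCA step uses the standard closed-form maximum-likelihood solution, which requires fewer indicators than input variables.

```python
import numpy as np


def ppca_fit(X, q):
    """Closed-form maximum-likelihood probabilistic PCA with q latent dimensions (q < n_variables)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # Eigen-decomposition of the sample covariance, sorted by decreasing eigenvalue
    S = Xc.T @ Xc / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Noise variance is the mean of the discarded eigenvalues
    sigma2 = eigvals[q:].mean()
    # Loading matrix: leading eigenvectors scaled by sqrt(eigenvalue - noise variance)
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return mu, W, sigma2


def ppca_transform(X, mu, W, sigma2):
    """Posterior mean of the latent indicators for each row of X."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W.T @ (X - mu).T).T


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain k-means (assumed stand-in for the unspecified cluster analysis)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic data only: 60 hypothetical patients, 10 hypothetical variables, 3 groups
rng = np.random.default_rng(42)
group_means = rng.normal(scale=3.0, size=(3, 10))
X = np.vstack([m + rng.normal(size=(20, 10)) for m in group_means])

mu, W, sigma2 = ppca_fit(X, q=4)        # 4 indicators, as in the abstract
Z = ppca_transform(X, mu, W, sigma2)    # one row of 4 indicator scores per patient
labels, centers = kmeans(Z, k=3)        # k=3 is an assumption for illustration
```

In this sketch each row of Z holds a patient's scores on the 4 indicators, and labels gives the cluster assignment. A sensitivity analysis of the kind mentioned above could, for example, perturb or drop one input variable at a time and check how much the indicators and assignments change, although the abstract does not specify the procedure used.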

Results: Meaningful biologic and clinical characteristics, including levels of proinflammatory cytokines and measures of disease activity, defined axes/indicators that identified homogeneous patient subgroups by cluster analysis. The new patient classifications resolved major differences between patient subpopulations better than International League of Associations for Rheumatology subtypes. Sensitivity analysis identified 14 variables as crucial determinants of the indicators and clusters. This new schema was conserved in an independent validation cohort.

Conclusion: Data-driven unsupervised machine learning is a powerful approach for interrogating clinical and biologic data toward disease classification, providing insight into the biology underlying clinical heterogeneity in childhood arthritis. Our analytical framework enabled the recovery of unique patterns from small cohorts and thereby addresses a major challenge in studying rare diseases: patient numbers.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Age Factors
  • Arthritis, Juvenile / classification*
  • Arthritis, Juvenile / diagnosis
  • Arthritis, Juvenile / immunology
  • Child
  • Child, Preschool
  • Cluster Analysis
  • Cohort Studies
  • Cytokines / immunology*
  • Data Interpretation, Statistical
  • Delayed Diagnosis
  • Female
  • Humans
  • Infant
  • Inflammation Mediators / immunology*
  • Male
  • Principal Component Analysis
  • Reproducibility of Results
  • Severity of Illness Index
  • Sex Factors


Substances

  • Cytokines
  • Inflammation Mediators