A somatic hypermutation-based machine learning model stratifies individuals with Crohn's disease and controls

Genome Res. 2023 Jan;33(1):71-79. doi: 10.1101/gr.276683.122. Epub 2022 Dec 16.

Abstract

Crohn's disease (CD) is a chronic relapsing-remitting inflammatory disorder of the gastrointestinal tract characterized by altered innate and adaptive immune function. Although massively parallel sequencing studies of the T cell receptor repertoire have identified oligoclonal expansion of unique clones, much less is known about the B cell receptor (BCR) repertoire in CD. Here, we present a novel BCR repertoire sequencing data set from ileal biopsies of pediatric patients with CD and controls, and identify CD-specific somatic hypermutation (SHM) patterns revealed by a machine learning (ML) algorithm trained on BCR repertoire sequences. Moreover, ML classification of a separate data set of blood samples from adults with CD and controls, based on V gene usage, clusters, or mutation frequencies, distinguished CD from controls with excellent performance (F1 > 90%). In summary, we show that an ML algorithm can classify CD with high accuracy based on unique BCR repertoire features.
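The abstract names V gene usage and mutation frequency as classifier features and reports performance as F1. As a minimal illustration only, the sketch below shows one common way such quantities can be computed. It is not the authors' pipeline: the function names are hypothetical, and it assumes each BCR sequence has already been aligned to its inferred germline (equal length, `-` for gaps, `N` for unknown bases), with mutation frequency taken as the fraction of mismatched, non-gap positions. F1 is the harmonic mean of precision and recall.

```python
from __future__ import annotations

from collections import Counter


def mutation_frequency(observed: str, germline: str) -> float:
    """Hypothetical SHM feature: fraction of compared positions where the
    observed sequence differs from its germline. Assumes pre-aligned,
    equal-length strings; gap ('-') and unknown ('N') positions are skipped."""
    if len(observed) != len(germline):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    mismatches = 0
    for obs, germ in zip(observed.upper(), germline.upper()):
        if obs in "-N" or germ in "-N":
            continue  # ignore positions that cannot be compared
        compared += 1
        if obs != germ:
            mismatches += 1
    return mismatches / compared if compared else 0.0


def v_gene_usage(v_calls: list[str]) -> dict[str, float]:
    """Hypothetical repertoire feature: relative frequency of each V gene call."""
    counts = Counter(v_calls)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def f1_score(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative inputs only; these are not data from the study.
    print(mutation_frequency("ACGTACGT", "ACGAACGA"))  # 2 mismatches / 8 positions
    print(v_gene_usage(["IGHV3-23", "IGHV3-23", "IGHV1-69", "IGHV4-34"]))
    print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # precision = recall = 2/3
```

In practice, per-sequence values such as these would be aggregated per individual (for example, mean mutation frequency or a V gene usage vector) before being passed to a classifier; how the study aggregated and selected features is not described in this abstract.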

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Algorithms
  • Biopsy
  • Child
  • Chronic Disease
  • Crohn Disease* / genetics
  • Humans
  • Machine Learning