A Clinically and Biologically Based Subclassification of the Idiopathic Inflammatory Myopathies Using Machine Learning

ACR Open Rheumatol. 2020 Mar;2(3):158-166. doi: 10.1002/acr2.11115. Epub 2020 Feb 10.

Abstract

Objective: Published predictive models of disease outcomes in idiopathic inflammatory myopathies (IIMs) are sparse and of limited accuracy due to disease heterogeneity. Computational methods may address this heterogeneity by partitioning patients based on clinical and biological phenotype.

Methods: To identify new patient groups, we applied similarity network fusion (SNF) to clinical and biological data from 168 patients with myositis (64 adult polymyositis [PM], 65 adult dermatomyositis [DM], and 39 juvenile DM [JDM]) in the Rituximab in Myositis trial. We generated a sparse proof-of-concept bedside classifier using multinomial regression and identified characteristics that distinguished these groups. We conducted χ² tests to link new patient groups with the myositis subtypes.
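
The χ² step described above can be illustrated with a minimal sketch, assuming a contingency table of patient groups (rows) against myositis subtypes (columns). The function name `chi_square_independence` and the counts in `toy_table` are hypothetical and invented for illustration; they are not the trial data, and this is not necessarily how the authors ran their tests.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for Pearson's chi-square
    test of independence on a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Toy counts, invented for illustration only (NOT data from the trial).
# Rows: three hypothetical patient groups; columns: PM, DM, JDM.
toy_table = [
    [10, 2, 1],
    [3, 9, 2],
    [1, 2, 8],
]
statistic, dof = chi_square_independence(toy_table)
```

A large statistic relative to the χ² distribution with `dof` degrees of freedom would suggest that group membership and subtype are not independent. Obtaining a p-value requires the χ² survival function, which this sketch leaves out.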

Results: SNF identified five patient groups in the discovery cohort that subdivided the myositis subtypes. The sparse multinomial regression model predicting patient group assignments (areas under the receiver operating characteristic curve = [0.78, 0.97]; areas under the precision-recall curve = [0.55, 0.96]) showed that autoantibody enrichment defined four of these groups: anti-Mi-2, anti-signal recognition particle (SRP), anti-nuclear matrix protein 2 (NXP2), and anti-synthetase (Syn). Depletion of immunoglobulin M (IgM) defined the fifth group. Each group was associated with one subtype: adult DM with anti-Mi-2 and anti-Syn autoantibodies, JDM with anti-NXP2 autoantibodies, and adult PM with IgM depletion and anti-SRP autoantibodies. These associations enabled us to further resolve the current myositis subtypes.
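
As an illustration of how one area under the receiver operating characteristic curve per group can arise from a multiclass classifier, the following is a minimal one-vs-rest sketch. It assumes that the bracketed values above are ranges across the patient groups, which the abstract does not state explicitly. The function names, group labels, and probabilities are hypothetical and are not taken from the study.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.
    labels are 1 (positive) or 0 (negative); tied scores count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def one_vs_rest_auroc(probabilities, true_groups, group_names):
    """Score each group against all others using its predicted probability."""
    result = {}
    for k, name in enumerate(group_names):
        scores = [row[k] for row in probabilities]
        labels = [1 if g == name else 0 for g in true_groups]
        result[name] = auroc(scores, labels)
    return result


# Toy predicted probabilities and labels, invented for illustration only.
group_names = ["g1", "g2", "g3"]
probabilities = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
true_groups = ["g1", "g2", "g3", "g1"]
per_group = one_vs_rest_auroc(probabilities, true_groups, group_names)
auroc_range = (min(per_group.values()), max(per_group.values()))
```

Reporting the minimum and maximum over groups, as in `auroc_range`, is one way to summarize per-group performance as an interval.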

Conclusion: Using unsupervised machine learning, we identified clinically and biologically homogeneous groups of patients with IIMs. These groups form the basis of an integrated disease classification based on both clinical and biological phenotype, and they are consistent with, and thereby support, other approaches and previously described findings.