Background: Defining clinical phenotypes provides opportunities for new diagnostics and may provide insights into early intervention and disease prevention. There is increasing evidence that patient-derived health data may contain information that complements traditional methods of clinical phenotyping. The utility of these data for defining meaningful phenotypic groups is of great interest because social media and online resources make it possible to query large cohorts of patients with health conditions.
Methods: We evaluated the degree to which patient-reported categorical data is useful for discovering subclinical phenotypes and evaluated its utility for discovering new measures of disease severity, treatment response and genetic architecture. Specifically, we examined the responses of 1961 patients with inflammatory bowel disease to questionnaires in search of sub-phenotypes. We applied machine learning methods to identify novel subtypes of Crohn's disease and studied their associations with drug responses.
Results: Using the patients' self-reported information, we identified two subpopulations of Crohn's disease; these subpopulations differ in disease severity, associations with smoking, and genetic transmission patterns. We also identified distinct features of drug response for the two Crohn's disease subtypes. These subtypes show a trend towards differential genotype signatures.
Conclusion: Our findings suggest that patient-defined data can have unplanned utility for defining disease subtypes and may be useful for guiding treatment approaches.
Keywords: Classification; Crohn’s disease; Patient-reported data; Phenotypes; Subtypes.