Sequence Mining of Comorbid Neurodevelopmental Disorders Using the SPADE Algorithm

Methods Inf Med. 2016 May 17;55(3):223-33. doi: 10.3414/ME15-01-0142. Epub 2016 Feb 5.

Abstract

Objectives: Understanding the progression of comorbid neurodevelopmental disorders (NDD) during different critical time periods may contribute to our comprehension of the underlying pathophysiology of NDDs. The objective of our study was to identify frequent temporal sequences of developmental diagnoses in noisy patient data.

Methods: We used a data set of 2810 patients, documenting NDD diagnoses given to them by an NDD expert at a child developmental center during multiple visits at different ages. Extensive preprocessing steps were developed in order to allow the data set to be processed by an efficient sequence mining algorithm (SPADE).
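To illustrate the kind of computation SPADE performs, here is a minimal sketch of its core idea: each diagnosis gets a vertical id-list of (patient, visit) pairs, and the support of a two-step sequence is obtained by a temporal join of two id-lists. The toy patients, diagnosis labels, and function names below are hypothetical and are not taken from the study; its preprocessing and the full SPADE lattice search are not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy data: patient id -> list of (visit index, diagnoses at that visit).
visits = {
    "p1": [(1, {"language_delay"}), (2, {"ADHD"})],
    "p2": [(1, {"language_delay"}), (2, {"motor_delay"}), (3, {"ADHD"})],
    "p3": [(1, {"ADHD"}), (2, {"language_delay"})],
    "p4": [(1, {"motor_delay"})],
}


def build_id_lists(visits):
    """Vertical layout: diagnosis -> list of (patient id, visit index)."""
    id_lists = defaultdict(list)
    for sid, events in visits.items():
        for eid, items in events:
            for item in items:
                id_lists[item].append((sid, eid))
    return id_lists


def temporal_join(left, right):
    """Keep occurrences of `right` that come after some occurrence of `left`
    within the same patient, i.e. the id-list of the sequence <left -> right>."""
    first_left = {}
    for sid, eid in left:
        first_left[sid] = min(eid, first_left.get(sid, eid))
    return [(sid, eid) for sid, eid in right
            if sid in first_left and eid > first_left[sid]]


def support(id_list, n_sequences):
    """Fraction of patients whose record contains the pattern at least once."""
    return len({sid for sid, _ in id_list}) / n_sequences


id_lists = build_id_lists(visits)
joined = temporal_join(id_lists["language_delay"], id_lists["ADHD"])
seq_support = support(joined, len(visits))
print(seq_support)
```

In this toy data, two of the four patients receive the hypothetical "language_delay" diagnosis before "ADHD", so the sequence has a support of 0.5. Longer sequences would be built by repeating the join on the resulting id-lists.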

Results: The discovered sequences were validated by cross-validation for 10 iterations; all correlation coefficients for support, confidence and lift measures were above 0.75 and their proportions were similar. No significant differences between the distributions of sequences were found using the Kolmogorov-Smirnov test.
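As a rough illustration of the measures named above, the sketch below derives confidence and lift for a sequence rule A -> B from supports, using the standard definitions, and computes a Pearson correlation between the supports that two folds assign to the same sequences. The abstract does not state which correlation coefficient was used or how folds were compared, so the Pearson choice, the function names, and all numbers are assumptions for illustration only.

```python
import math


def rule_measures(support_ab, support_a, support_b):
    """Confidence and lift of the rule A -> B from supports (standard definitions)."""
    confidence = support_ab / support_a   # estimate of P(B follows | A)
    lift = confidence / support_b         # ratio to the baseline rate of B
    return confidence, lift


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical supports: the sequence, its antecedent, and its consequent.
confidence, lift = rule_measures(support_ab=0.5, support_a=0.75, support_b=0.75)

# Hypothetical supports of the same four sequences in two cross-validation folds.
fold_a = [0.50, 0.30, 0.20, 0.10]
fold_b = [0.48, 0.33, 0.18, 0.12]
r = pearson(fold_a, fold_b)
print(confidence, lift, r)
```

A lift below 1, as in this made-up example, would indicate that B follows A less often than its baseline rate suggests; a high correlation between folds is the kind of stability the validation step looks for.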

Conclusions: We have demonstrated the feasibility of using the SPADE algorithm for discovery of valid temporal sequences of comorbid disorders in children with NDDs. The identification of such sequences would be beneficial from clinical and research perspectives. Moreover, these sequences could serve as features for developing a full-fledged temporal predictive model.

Keywords: SPADE; Sequence mining; comorbidity; neurodevelopmental disorders.

MeSH terms

  • Adolescent
  • Algorithms*
  • Child
  • Child, Preschool
  • Comorbidity
  • Data Mining*
  • Humans
  • Infant
  • Models, Theoretical
  • Neurodevelopmental Disorders / pathology*
  • Time Factors