Using topological data analysis and pseudo time series to infer temporal phenotypes from electronic health records

Artif Intell Med. 2020 Aug:108:101930. doi: 10.1016/j.artmed.2020.101930. Epub 2020 Jul 15.

Abstract

Temporal phenotyping enables clinicians to better understand the observable characteristics of a disease as it progresses. Modelling disease progression in a way that captures interactions between phenotypes is inherently challenging. Temporal models that capture how a disease changes over time can identify the key features characterizing the disease subtypes that underpin disease trajectories. Such models will enable clinicians to identify early warning signs of progression in specific subtypes and therefore to make informed decisions tailored to individual patients. In this paper, we explore two approaches to building temporal phenotypes based on the topology of data: topological data analysis and pseudo time series. Using type 2 diabetes data, we show that the topological data analysis approach is able to identify disease trajectories and that pseudo time series can be used to infer a state space model characterized by transitions between hidden states that represent distinct temporal phenotypes. Both approaches highlight lipid profiles as key factors in distinguishing the phenotypes.

Keywords: Electronic phenotyping; Longitudinal studies; Type 2 diabetes; Unsupervised machine learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Analysis
  • Diabetes Mellitus, Type 2*
  • Electronic Health Records*
  • Humans
  • Phenotype