Network analysis of longitudinal electronic health records using linear mixed models

BioData Min. 2026 Feb 4;19(1):7. doi: 10.1186/s13040-025-00508-y.

Abstract

Background: The rapid growth of healthcare data stored in electronic health records (EHRs) has created novel opportunities for biomedical research. Clinical data contain rich, heterogeneous, longitudinal information on diverse patient cohorts. These data can be analyzed to uncover complex patterns that describe disease progression, comorbidities, and patient trajectories. Identifying relationships among clinical variables poses significant analytical challenges, often because of the high dimensionality and interdependence of temporal data, and therefore requires adaptable methodologies. Classical approaches such as Gaussian Graphical Modeling and Vector Autoregression often fall short in addressing these complexities because they require strict assumptions of independence and stationarity, which limits their applicability to real-world EHR data. To address this, we present MariNET, a novel approach based on linear mixed models that builds networks of interactions among clinical variables from longitudinal EHRs. The methodology handles correlated observations and confounding variables, providing a robust framework for analyzing dynamic interactions between clinical variables over time. It offers a scalable and plausible model that can be adapted to researchers’ questions.
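
As an illustration of how a linear mixed model can accommodate repeated measures and confounders, one generic random-intercept form is sketched below. The abstract does not give the MariNET specification, so every symbol here is an assumption: i indexes patients, t indexes visits, j and k are two clinical variables, c_it is a vector of confounders, and u_i is a patient-level random intercept.

```latex
% Generic random-intercept linear mixed model (assumed form, not taken from the paper)
% i: patient, t: visit, j and k: clinical variables, c_{it}: confounders
y^{(j)}_{it} = \beta_0 + \beta_{jk}\, y^{(k)}_{it}
             + \boldsymbol{\gamma}^{\top} \mathbf{c}_{it} + u_i + \varepsilon_{it},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{it} \sim \mathcal{N}(0, \sigma^2)
```

Under this reading, u_i absorbs the correlation among repeated observations of the same patient, and an edge between variables j and k could be weighted by the estimate of \beta_{jk} or by its test statistic. Whether MariNET uses this exact structure is not stated in the abstract.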

Results: We show the potential of our method in the analysis of three different datasets. When applied to a COVID-19 mental health cohort, the model successfully captured symptom interactions consistent with previous knowledge. This result demonstrates robustness in handling repeated measures and homogeneous data. In evaluations using Parkinson’s disease (PD) data, we effectively modeled temporal interactions among key clinical variables, in agreement with known PD symptom progression, and outperformed partial correlation and vector autoregressive models in handling heterogeneous and missing data. In a PD dataset modified by incorporating sex as a random factor, the method also excelled at correcting biased interactions, unlike partial correlation, which produced misleading results.

Conclusions: A linear mixed model-based approach can effectively capture dynamic interactions in longitudinal EHR data, outperforming traditional methods in analyzing complex clinical datasets. The use cases presented here support the application of network science in healthcare, enabling more accurate predictions, evidence-based decision-making, and improved disease management.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13040-025-00508-y.