From heterogeneous healthcare data to disease-specific biomarker networks: A hierarchical Bayesian network approach

PLoS Comput Biol. 2021 Feb 12;17(2):e1008735. doi: 10.1371/journal.pcbi.1008735. eCollection 2021 Feb.


In this work, we introduce an entirely data-driven and automated approach to reveal disease-associated biomarker and risk factor networks from heterogeneous and high-dimensional healthcare data. Our workflow is based on Bayesian networks, which are a popular tool for analyzing the interplay of biomarkers. Usually, data require extensive manual preprocessing and dimension reduction before Bayesian networks can be learned effectively. For heterogeneous data, this preprocessing is hard to automate and typically requires domain-specific prior knowledge. Here, we combine Bayesian network learning with hierarchical variable clustering to detect groups of similar features and to learn the interactions between them in a fully automated manner. We present an optimization algorithm that adaptively refines such group Bayesian networks to account for a specific target variable, such as a disease. The combination of Bayesian networks, clustering, and refinement yields low-dimensional but disease-specific interaction networks. These networks provide easily interpretable, yet accurate, models of biomarker interdependencies. We test our method extensively on simulated data, as well as on data from the Study of Health in Pomerania (SHIP-TREND), and demonstrate its effectiveness using non-alcoholic fatty liver disease and hypertension as examples. We show that the group network models outperform available biomarker scores while also providing an easily interpretable interaction network.
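The variable-grouping step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the distance measure (1 minus absolute Pearson correlation), the average linkage, the group representative (mean of sign-aligned z-scores), the synthetic data, and all function and variable names are assumptions made for illustration. Bayesian network structure learning on the resulting group variables and the target-specific refinement are not shown.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_variables(columns, n_groups):
    """Average-linkage agglomerative clustering of variables.

    columns maps a variable name to its list of observations.
    Distance between variables is 1 - |correlation| (an assumption).
    """
    names = list(columns)
    dist = {
        frozenset((a, b)): 1.0 - abs(pearson(columns[a], columns[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    groups = [[name] for name in names]

    def linkage(g, h):
        # Mean pairwise distance between the members of two groups.
        total = sum(dist[frozenset((a, b))] for a in g for b in h)
        return total / (len(g) * len(h))

    while len(groups) > n_groups:
        # Merge the two closest groups.
        i, j = min(
            ((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))),
            key=lambda p: linkage(groups[p[0]], groups[p[1]]),
        )
        merged = groups[i] + groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return [sorted(g) for g in groups]


def zscore(values):
    """Standardize a sequence to zero mean and unit variance."""
    m = sum(values) / len(values)
    s = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / s for v in values]


def aggregate(columns, group):
    """One representative per group: mean of sign-aligned z-scores."""
    ref = columns[group[0]]
    aligned = []
    for name in group:
        # Flip members that correlate negatively with the first member.
        sign = 1.0 if pearson(ref, columns[name]) >= 0 else -1.0
        aligned.append([sign * v for v in zscore(columns[name])])
    return [sum(vals) / len(vals) for vals in zip(*aligned)]


def make_demo_data(n_samples=200, seed=0):
    """Synthetic data: two latent factors, each observed through noisy copies."""
    rng = random.Random(seed)
    f1 = [rng.gauss(0, 1) for _ in range(n_samples)]
    f2 = [rng.gauss(0, 1) for _ in range(n_samples)]

    def noisy(factor):
        return [v + rng.gauss(0, 0.3) for v in factor]

    return {"a1": noisy(f1), "a2": noisy(f1), "a3": noisy(f1),
            "b1": noisy(f2), "b2": noisy(f2)}


columns = make_demo_data()
groups = cluster_variables(columns, n_groups=2)
group_data = {"+".join(g): aggregate(columns, g) for g in groups}
print(groups)
```

Each entry of `group_data` is a single aggregated variable, so a network learned on these has one node per group rather than one per raw feature, which is the dimension reduction the abstract refers to.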

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Biomarkers / metabolism*
  • Cluster Analysis
  • Computational Biology / methods
  • Computer Simulation
  • Delivery of Health Care
  • Disease / genetics*
  • Gene Regulatory Networks
  • Humans
  • Hypertension / genetics
  • Medical Informatics / methods*
  • Normal Distribution
  • Pattern Recognition, Automated
  • Reproducibility of Results
  • Wine


Substances

  • Biomarkers

Grants and funding

We acknowledge funding by the German BMBF via the LiSyM grant (FKZ 031L0032). AKB holds an add-on fellowship from the Joachim Herz Stiftung. HJG has received travel grants and speakers honoraria from Fresenius Medical Care, Neuraxpharm, Servier and Janssen Cilag as well as research funding from Fresenius Medical Care. SHIP is part of the Community Medicine Research Network of the University Medicine Greifswald, which is supported by the German Federal State of Mecklenburg-West Pomerania. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.