Missing data in amortized simulation-based neural posterior estimation

PLoS Comput Biol. 2024 Jun 17;20(6):e1012184. doi: 10.1371/journal.pcbi.1012184. eCollection 2024 Jun.


Amortized simulation-based neural posterior estimation provides a novel machine-learning-based approach to solving parameter estimation problems. It has been shown to be computationally efficient and able to handle complex models and data sets. Yet the available approach cannot handle missing data, a case that is ubiquitous in experimental studies, and might then provide incorrect posterior estimates. In this work, we discuss various ways of encoding missing data and integrate them into the training and inference process. We implement the approaches in the BayesFlow methodology, an amortized estimation framework based on invertible neural networks, and evaluate their performance on multiple test problems. We find that an approach in which the data vector is augmented with binary indicators of the presence or absence of values performs most robustly. Indeed, it also improved performance on the simpler problem of data sets with variable length. Accordingly, we demonstrate that amortized simulation-based inference approaches are applicable even with missing data, and we provide a guideline for handling it, which is relevant for a broad spectrum of applications.
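To make the encoding idea concrete, here is a minimal NumPy sketch. It assumes that simulated data sets are arrays of shape (n_obs, n_features) with NaN marking missing entries. The helper names `apply_random_missingness`, `encode_fill_value` and `encode_with_indicators` are hypothetical and are not part of the BayesFlow API, and the fill values are illustrative assumptions. During training, missingness is imposed on simulated data at random so that the network sees the same kind of gaps it will meet in real data. Each data set is then encoded either by replacing gaps with a constant (shown only for contrast) or by augmenting the data vector with binary presence indicators, which is the variant the abstract reports as most robust.

```python
import numpy as np


def apply_random_missingness(x, p_missing, rng):
    """Hypothetical helper: mark each entry of a simulated data set as missing
    (NaN) independently with probability p_missing."""
    x = np.asarray(x, dtype=float).copy()
    mask = rng.random(x.shape) < p_missing  # True where the entry is dropped
    x[mask] = np.nan
    return x


def encode_fill_value(x, fill_value=-1.0):
    """Constant-fill encoding (for contrast): replace missing entries by a
    fixed value, which should ideally lie outside the range of observed data."""
    return np.where(np.isnan(x), fill_value, x)


def encode_with_indicators(x, fill_value=0.0):
    """Indicator encoding: fill missing entries with fill_value and append a
    binary indicator per feature (1.0 = present, 0.0 = missing).
    Shape (n_obs, n_features) becomes (n_obs, 2 * n_features)."""
    present = ~np.isnan(x)
    filled = np.where(present, x, fill_value)
    return np.concatenate([filled, present.astype(float)], axis=-1)


# Usage: one simulated series of 5 observations, about 40% masked, then encoded.
rng = np.random.default_rng(0)
sim = rng.normal(size=(5, 1))
sim_missing = apply_random_missingness(sim, p_missing=0.4, rng=rng)
net_input = encode_with_indicators(sim_missing)
print(net_input.shape)  # (5, 2)
```

With indicators, the fill value no longer has to be distinguishable from real observations, because the indicator column tells the network unambiguously which entries are absent. The constant-fill encoding relies entirely on the chosen value never occurring in the data.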

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Computational Biology* / methods
  • Computer Simulation*
  • Data Interpretation, Statistical
  • Humans
  • Machine Learning
  • Neural Networks, Computer*

Grants and funding

This work was supported by the German Federal Ministry of Education and Research (BMBF) (EMUNE/031L0293C and FitMultiCell/031L0159C), the German Research Foundation (DFG) under Germany’s Excellence Strategy (EXC 2047 – 390685813 and EXC 2151 – 390873048), and the Schlegel Professorship for J.H. Y.S. acknowledges financial support by the Joachim Herz Stiftung. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.