Flexible model selection for mechanistic network models

Sixing Chen; Antonietta Mira; Jukka-Pekka Onnela

doi:10.1093/comnet/cnz024


J Complex Netw. 2020 Apr;8(2):cnz024. doi: 10.1093/comnet/cnz024. Epub 2019 Aug 2.

Authors

Sixing Chen¹, Antonietta Mira², Jukka-Pekka Onnela

Affiliations

¹ Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, 655 Huntington Avenue, Building 2, 4th Floor, Boston, MA 02115, USA.
² Data Science Lab, Institute of Computational Science, Università della Svizzera italiana, Via Buffi 6, 6900 Lugano, Switzerland; and Dipartimento di Scienza e Alta Tecnologia, Università degli Studi dell'Insubria, Via Valleggio 11, 22100 Como, Italy.

Abstract

Network models are applied across many domains where data can be represented as a network. Two prominent paradigms for modelling networks are statistical models (probabilistic models for the observed network) and mechanistic models (models for network growth and/or evolution). Mechanistic models are better suited for incorporating domain knowledge, studying the effects of interventions (such as changes to specific mechanisms) and forward simulation, but they typically have intractable likelihoods. As such, and in stark contrast to statistical models, there is a relative dearth of research on model selection for mechanistic models, despite the otherwise large body of extant work. In this article, we propose a simulator-based procedure for mechanistic network model selection that borrows aspects of Approximate Bayesian Computation (ABC), together with a means to quantify the uncertainty in the selected model. To select the most suitable network model, we consider and assess the performance of several learning algorithms, most notably the so-called Super Learner, which makes our framework less sensitive to the choice of any particular learning algorithm. Our approach exploits the ease of forward simulating from mechanistic network models to circumvent their intractable likelihoods. The overall process is flexible and widely applicable. Our simulation results demonstrate the approach's ability to accurately discriminate between competing mechanistic models. Finally, we showcase our approach with a protein-protein interaction network model for yeast (Saccharomyces cerevisiae) from the literature.
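The general idea of simulator-based model selection can be illustrated with a small sketch: forward-simulate many networks from each candidate mechanistic model, reduce each network to summary statistics, train a classifier to predict which model generated a network, and apply that classifier to the observed network. The sketch below is a hypothetical toy, not the authors' procedure. The two candidate models (degree-proportional "PA" versus uniform "UA" attachment), the degree-based summaries, and all function names are assumptions chosen for illustration. A simple k-nearest-neighbour classifier stands in for the Super Learner, and its vote shares serve only as a rough stand-in for the uncertainty in the selected model.

```python
import random
from collections import Counter

# Hypothetical candidate models: both grow a network one node at a time, with
# each new node attaching m edges. "PA" picks targets proportional to degree
# (preferential attachment); "UA" picks targets uniformly at random.
MODELS = {"PA": True, "UA": False}


def grow_network(n, m, preferential, rng):
    """Forward-simulate an undirected edge list on nodes 0..n-1."""
    # Seed graph: a complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears here once per incident edge, so a uniform draw from
    # this list is a degree-proportional draw over nodes.
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            if preferential:
                targets.add(rng.choice(endpoints))
            else:
                targets.add(rng.randrange(new))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def summaries(edges, n):
    """Assumed summary statistics: maximum degree and degree variance."""
    deg = Counter(v for e in edges for v in e)
    d = [deg[i] for i in range(n)]
    mean = sum(d) / n
    return (max(d), sum((x - mean) ** 2 for x in d) / n)


def select_model(observed_edges, n, m, n_sim=100, k=15, seed=0):
    """Return k-NN vote shares per candidate model for the observed network."""
    rng = random.Random(seed)
    # Reference table: simulate from each model and record (summaries, label).
    table = [(summaries(grow_network(n, m, pref, rng), n), label)
             for label, pref in MODELS.items() for _ in range(n_sim)]
    # Standardize each summary so that no single statistic dominates distances.
    dim = len(table[0][0])
    mu = [sum(f[i] for f, _ in table) / len(table) for i in range(dim)]
    sd = [(sum((f[i] - mu[i]) ** 2 for f, _ in table) / len(table)) ** 0.5 or 1.0
          for i in range(dim)]

    def scale(f):
        return [(x - a) / s for x, a, s in zip(f, mu, sd)]

    query = scale(summaries(observed_edges, n))
    # Classify the observed summaries by a vote among the k nearest simulated
    # networks; the vote shares act as rough model probabilities.
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(scale(f), query)), label)
        for f, label in table)[:k]
    votes = Counter(label for _, label in nearest)
    return {label: votes[label] / k for label in MODELS}


if __name__ == "__main__":
    # Treat one PA-generated network as the "observed" data.
    observed = grow_network(500, 2, True, random.Random(42))
    print(select_model(observed, n=500, m=2))
```

Using an odd k with two candidate models avoids tied votes. In a fuller treatment, the single classifier would be replaced by an ensemble such as the Super Learner, which is what makes the framework less sensitive to any one learning algorithm.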

Keywords: Super Learner; likelihood-free methods; mechanistic network model; model selection.

Grants and funding

U54 GM088558/GM/NIGMS NIH HHS/United States