Practical sampling of constraint-based models: Optimized thinning boosts CHRR performance

PLoS Comput Biol. 2023 Aug 11;19(8):e1011378. doi: 10.1371/journal.pcbi.1011378. eCollection 2023 Aug.

Abstract

Thinning is a sub-sampling technique to reduce the memory footprint of Markov chain Monte Carlo. Despite being commonly used, thinning is rarely considered efficient. For sampling constraint-based models, a highly relevant use-case in systems biology, we here demonstrate that thinning boosts computational and, thereby, sampling efficiencies of the widely used Coordinate Hit-and-Run with Rounding (CHRR) algorithm. By benchmarking CHRR with thinning with simplices and genome-scale metabolic networks of up to thousands of dimensions, we find a substantial increase in computational efficiency compared to unthinned CHRR, in our examples by orders of magnitude, as measured by the effective sample size per time (ESS/t), with performance gains growing with polytope (effective network) dimension. Using a set of benchmark models we derive a ready-to-apply guideline for tuning thinning to efficient and effective use of compute resources without requiring additional coding effort. Our guideline is validated using three (out-of-sample) large-scale networks and we show that it allows sampling convex polytopes uniformly to convergence in a fraction of time, thereby unlocking the rigorous investigation of hitherto intractable models. The derivation of our guideline is explained in detail, allowing future researchers to update it as needed as new model classes and more training data becomes available. CHRR with deliberate utilization of thinning thereby paves the way to keep pace with progressing model sizes derived with the constraint-based reconstruction and analysis (COBRA) tool set. Sampling and evaluation pipelines are available at https://jugit.fz-juelich.de/IBG-1/ModSim/fluxomics/chrrt.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Genome
  • Metabolic Networks and Pathways
  • Models, Biological*
  • Systems Biology / methods

Grants and funding

JFJ received funding from the Helmholtz Association of German Research Centres and performed this work as part of the Helmholtz School for Data Science in Life, Earth and Energy (HDS-LEE). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.