Stat Comput. 2018;28(2):411-425.
doi: 10.1007/s11222-017-9738-6. Epub 2017 Mar 13.

Likelihood-free inference via classification


Michael U Gutmann et al. Stat Comput. 2018.

Abstract

Increasingly complex generative models are being used across disciplines as they allow for realistic characterization of data, but a common difficulty with them is the prohibitively large computational cost to evaluate the likelihood function and thus to perform likelihood-based statistical inference. A likelihood-free inference framework has emerged where the parameters are identified by finding values that yield simulated data resembling the observed data. While widely applicable, a major difficulty in this framework is how to measure the discrepancy between the simulated and observed data. Transforming the original problem into a problem of classifying the data into simulated versus observed, we find that classification accuracy can be used to assess the discrepancy. The complete arsenal of classification methods becomes thereby available for inference of intractable generative models. We validate our approach using theory and simulations for both point estimation and Bayesian inference, and demonstrate its use on real data by inferring an individual-based epidemiological model for bacterial infections in child care centers.

Keywords: Approximate Bayesian computation; Generative models; Intractable likelihood; Latent variable models; Simulator-based models.
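The core idea of the paper — measuring the discrepancy between observed and simulated data by how accurately a classifier can tell them apart — can be sketched in a few lines. The sketch below is illustrative, not the authors' code: a toy 2-D Gaussian simulator and a nearest-class-mean rule (close to Bayes-optimal here, since the classes share a covariance) stand in for the generative model and the classifiers studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n=500):
    """Toy stand-in simulator: 2-D Gaussian with mean theta."""
    return rng.normal(loc=theta, scale=1.0, size=(n, 2))

def discrepancy(x_obs, theta):
    """Held-out accuracy of an observed-vs-simulated classifier.

    Accuracy near 0.5 means the classifier cannot tell the two datasets
    apart, i.e. the discrepancy is small; accuracy near 1 means the
    simulated data are easy to distinguish from the observed data.
    """
    y_sim = simulate(theta, n=len(x_obs))
    half = len(x_obs) // 2
    x_tr, x_te = x_obs[:half], x_obs[half:]
    y_tr, y_te = y_sim[:half], y_sim[half:]
    mu_x, mu_y = x_tr.mean(axis=0), y_tr.mean(axis=0)

    def predict(z):
        """Nearest-class-mean rule; 0 = observed, 1 = simulated."""
        d_x = np.linalg.norm(z - mu_x, axis=1)
        d_y = np.linalg.norm(z - mu_y, axis=1)
        return (d_y < d_x).astype(int)

    n_correct = (predict(x_te) == 0).sum() + (predict(y_te) == 1).sum()
    return n_correct / (len(x_te) + len(y_te))

x_obs = simulate([0.0, 0.0])          # "observed" data, true mean (0, 0)
far = discrepancy(x_obs, [6.0, 0.0])  # well separated: accuracy near 1
near = discrepancy(x_obs, [0.0, 0.0]) # overlapping: accuracy near 0.5
```

This mirrors Fig. 1: as the candidate parameter approaches the data-generating value, the discriminability drops toward the chance level of 0.5.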


Figures

Fig. 1
Discriminability as a discrepancy measure. The observed data X are shown as black circles and were generated with mean θᵒ = (0, 0). The hatched areas indicate the Bayes classification rules. a High discriminability: simulated data Yθ (green diamonds) were generated with θ = (6, 0). b Low discriminability: Yθ (red crosses) were generated with θ = (1/2, 0). As θ approaches θᵒ, the discriminability (best classification accuracy) of X and Yθ drops. We propose to use the discriminability as a discrepancy measure for likelihood-free inference.
Fig. 2
Comparison of the classification accuracy of the Bayes and the learned classification rules for large sample sizes (n = 100,000). The symmetric curves depict Jn and Ĵn as a function of the relative deviation of the model parameter from the true data-generating parameter. As the curves of the different methods are indistinguishable, quadratic discriminant analysis (QDA), L1-regularized polynomial logistic regression (L1 logistic), L1-regularized polynomial support vector machine classification (L1 SVM), and a max-combination of these and other methods (max-rule) perform as well as the Bayes classification rule (BCR), which assumes the true distributions to be known. For linear discriminant analysis (LDA), this holds with the exception of the moving average model.
Fig. 3
Empirical evidence for consistency. The figure shows the mean squared estimation error E[||θ̂n − θᵒ||²] for the examples in Fig. 2 as a function of the sample size n (solid lines, circles). The mean was computed as an average over 100 outcomes. The dashed lines depict the mean ± 2 standard errors. The linear trend on the log–log scale suggests convergence in quadratic mean, and hence consistency of the estimator θ̂n. The results are for L1-regularized logistic regression; see Supplementary material 3 for the other classification methods.
Fig. 4
Posterior distributions inferred by classifier ABC for binary, count, continuous, and time series data. The results are for 10,000 ABC samples and n = 50. For the univariate cases, the samples are summarized as empirical pdfs. For the bivariate cases, scatter plots of the obtained samples are shown (the results are for the max-rule). The numbers on the contours are relative to the maximum of the reference posterior. For the autoregressive conditional heteroskedasticity (ARCH) model, the hatched area indicates the domain of the uniform prior. Supplementary material 4 contains additional examples and results. a Binary data (Bernoulli), b count data (Poisson), c continuous data (Gauss), d time series (ARCH).
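Posteriors like those in Fig. 4 arise from plugging a classification-based discrepancy into a standard ABC loop. The following is a minimal rejection-ABC sketch of that idea, not the authors' implementation: the 1-D Gaussian simulator, the uniform prior, the nearest-class-mean classifier, and the acceptance threshold eps are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(theta, n=50):
    """Illustrative 1-D Gaussian simulator with mean theta."""
    return rng.normal(loc=theta, scale=1.0, size=n)

def discrepancy(x, y):
    """Held-out accuracy of a 1-D nearest-class-mean classifier."""
    half = len(x) // 2
    mu_x, mu_y = x[:half].mean(), y[:half].mean()
    # Boolean predictions on the held-out halves; True = "simulated".
    pred_x = np.abs(x[half:] - mu_y) < np.abs(x[half:] - mu_x)
    pred_y = np.abs(y[half:] - mu_y) < np.abs(y[half:] - mu_x)
    n_correct = (~pred_x).sum() + pred_y.sum()
    return n_correct / (len(x) - half + len(y) - half)

x_obs = simulate(0.0)  # "observed" data, data-generating mean 0
eps = 0.62             # accept when accuracy is close to chance (0.5)
samples = []
for _ in range(2000):
    theta = rng.uniform(-5.0, 5.0)  # draw a candidate from a uniform prior
    if discrepancy(x_obs, simulate(theta)) <= eps:
        samples.append(theta)       # retained ABC posterior sample

samples = np.array(samples)
```

With this setup the accepted samples concentrate around the data-generating value 0, since only parameters whose simulated data are nearly indistinguishable from the observed data pass the accuracy threshold.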
Fig. 5
Sketch of the individual-based epidemic model. The evolution of the colonization states in a single child care center is shown. Colonization is indicated by the black squares.
Fig. 6
Testing the applicability of the discrepancy measure Jn to infer the individual-based epidemic model. The figures show Jn(θ) when one parameter is fixed at a time. The red crosses mark the data-generating parameter value θᵒ = (βᵒ, Λᵒ, θᵒ) = (3.6, 0.6, 0.1). The presence of random features produced more localized regions with small Jn.
Fig. 7
Inferring the individual-based epidemic model with classifier ABC. The results are for simulated data with known data-generating parameter θᵒ (indicated by the green vertical lines). Classifier ABC with random subsets (blue, circles) or without (red, squares) both yielded posterior pdfs which are qualitatively similar to the expert solution (black). a Posterior pdf for β, b posterior pdf for Λ, c posterior pdf for θ.
Fig. 8
Inference results on real data, visualized as in Fig. 7. a Posterior pdf for β, b posterior pdf for Λ, c posterior pdf for θ.
Fig. 9
Using classifier ABC to compensate for insufficient expert statistics. The setup and visualization are as in Fig. 7, whose expert solution is reproduced for reference. Working with a reduced set of expert statistics adversely affects the posteriors of Λ and θ, but classifier ABC is able to compensate (blue curves with circles vs. black dashed curves). a Internal infection parameter β, b external infection parameter Λ, c co-infection parameter θ.
