PLoS Comput Biol. 2019 Sep 9;15(9):e1007181.
doi: 10.1371/journal.pcbi.1007181. eCollection 2019 Sep.

Prepaid parameter estimation without likelihoods

Merijn Mestdagh et al. PLoS Comput Biol. 2019.

Abstract

In various fields, statistical models of interest are analytically intractable and inference is usually performed using a simulation-based method. However elegant these methods are, they are often painstakingly slow and convergence is difficult to assess. As a result, statistical inference is greatly hampered by computational constraints. However, for a given statistical model, different users, even with different data, are likely to perform similar computations. Computations done by one user are potentially useful for other users with different data sets. We propose a pooling of resources across researchers to capitalize on this. More specifically, we preemptively chart out the entire space of possible model outcomes in a prepaid database. Using advanced interpolation techniques, any individual estimation problem can now be solved on the spot. The prepaid method can easily accommodate different priors as well as constraints on the parameters. We created prepaid databases for three challenging models and demonstrate how they can be distributed through an online parameter estimation service. Our method outperforms state-of-the-art estimation techniques in both speed (with a 23,000 to 100,000-fold speed up) and accuracy, and is able to handle previously quasi inestimable models.
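The two-stage idea in the abstract (an expensive database built once, then a cheap lookup per data set) can be illustrated with a minimal sketch. Everything specific here is an assumption chosen for illustration and is not taken from the paper: the toy model y ~ Normal(θ, 1), the sample mean as the only summary statistic, the grid range, and the function names. A local least-squares line stands in for the more advanced interpolation the authors describe.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate_summary(theta, n_obs, rng):
    """Toy simulator (assumed): y ~ Normal(theta, 1); summary is the sample mean."""
    return rng.normal(theta, 1.0, size=n_obs).mean()


def build_prepaid_grid(thetas, n_obs, rng):
    """Expensive step, done once: simulate one summary statistic per grid point."""
    return np.array([simulate_summary(t, n_obs, rng) for t in thetas])


def estimate(s_obs, thetas, summaries, n_neighbors=20):
    """Cheap step, done per data set: nearest-neighbor lookup plus local fit."""
    # Indices of the grid points whose simulated summaries lie closest to s_obs.
    order = np.argsort(np.abs(summaries - s_obs))[:n_neighbors]
    nearest = thetas[order[0]]
    # Local least-squares line theta ~ intercept + slope * s on the neighbors
    # (a simple stand-in for the interpolation step, not the authors' method).
    slope, intercept = np.polyfit(summaries[order], thetas[order], deg=1)
    return nearest, intercept + slope * s_obs


# Build the hypothetical prepaid database once.
thetas = np.linspace(-5.0, 5.0, 2001)
summaries = build_prepaid_grid(thetas, n_obs=1000, rng=rng)

# A "new user" arrives with data generated at an unknown parameter value.
true_theta = 1.234
s_obs = simulate_summary(true_theta, n_obs=1000, rng=rng)
nn_estimate, interp_estimate = estimate(s_obs, thetas, summaries)
print(f"nearest neighbor: {nn_estimate:.3f}, interpolated: {interp_estimate:.3f}")
```

In this sketch the cost of `build_prepaid_grid` is paid once, while each call to `estimate` only sorts and fits on stored values, which is where a prepaid approach would get its speed.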

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Graphical illustration of the prepaid parameter estimation method.
Fig 2. The RMSE versus the time needed for the estimation of the three parameters of the Ricker model (see Eq 1).
The RMSE and time are based on 100 test data sets with Tobs = 1000. The three colors represent the three parameters (blue for r, red for σ and yellow for ϕ). Solid lines represent the SLOrig approach, dashed lines the SLMLGrid approach (using only nearest neighbors), and dotted lines the SLMLSVM approach (using interpolation). The stars and the dots represent the time needed for the SLMLGrid and the SLMLSVM estimation, respectively. The estimates for SLOrig are posterior means, based on the second half of the finished MCMC iterations. The time of the prepaid method shown in this picture does not include the creation of the prepaid grid, but only the time needed for any researcher to estimate the parameters once a prepaid grid is available.
Fig 3. The mean absolute error of the estimates of four central parameters of the LCA (common input v, leakage γ, mutual inhibition κ, evidence threshold a) as a function of sample size (abscissa) and for three different methods: (1) choosing the nearest neighbor grid point in the space of summary statistics (CHISQNNGrid, triangles); (2) using the average of a set of nearest neighbor grid points based on bootstrap samples (CHISQBSGrid, open circles); and (3) using SVM interpolation between the 100 nearest neighbors (CHISQBSSVM, crosses).
Fig 4. Parameter recovery for the LCA model with 1200 observations (300 in each of the four difficulty conditions); the true value on the abscissa and estimated value on the ordinate.
The same parameters as in Fig 3 are shown. The method used to produce these estimates is the averaged bootstrap approach (CHISQBSGrid, see Methods for details).
Fig 5. RMSE (based on a simulation study) of the toy example estimation as a function of the gap size (Δ) and the number of nearest neighbors selected to carry out the interpolation (N).
The left panel shows situation 1, in which s_obs = ȳ, and the right panel shows situation 2 (s_obs = ȳ²). For the second situation, the trade-off between Δ and N is clearly visible.
Fig 6. Estimated versus true parameters of the Ricker model of 100 data sets with Tobs = 1000.
The SLOrig estimation has some problems with outliers.
Fig 7. The accuracy of all estimation methods versus the number of time points Tobs.
The left panel shows the mean squared error, while the right panel shows the median absolute error. The three colors represent the three parameters: blue lines refer to the parameter r, red lines to the parameter σ and yellow lines to the parameter ϕ. The solid line represents the original synthetic likelihood approach SLOrig (stopping at Tobs = 10³), the dashed line the SLMLGrid prepaid approach and the dotted line the SLMLSVM prepaid approach.
Fig 8. The estimation of the three parameters of the Ricker model of 100 data sets with Tobs = 10⁵.
The SLMLSVM estimation clearly outperforms the SLMLGrid estimation.
Fig 9. Samples for Tobs = 1 of the summary statistics of the trait model for parameter set log(I) = 3.0621, log(A) = 0.8302, h = 86.8924 and log(σ) = −0.6899.
Fig 10. Scatter plot matrix of the clustering that occurs for the 100 nearest neighbors for the summary statistics for Tobs = 1000 of parameter log(I) = 3.9081, log(A) = −2.0343, h = 36.4150 and log(σ) = 2.9762.
The red cross shows the true value of this parameter.
Fig 11. Illustration of how different coherences are incorporated.
The gray plane is a simplified representation of the three dimensional (v′, γ′, κ′)-space. For each point g, 50 coherences are chosen. Corresponding to each coherence, there is a pair of RT distributions (which each integrate to the probability of selecting the corresponding option).
Fig 12. Illustration of the transformation of the original parameter space (called A) to a new one (called B) in which D is one of the parameters.
The projections of the three parameter points on the red axis governing the width of the B area are denoted by open circles, and these are the parameter points g. For each of these open-circle points, the RT distribution scales are set to 1 (i.e., s = 1) by choosing an appropriate diffusion coefficient (denoted as D0g), and any parameter point in B can be reached by selecting an appropriate g and then adjusting the scale upwards or downwards (this is indicated by the dotted lines in the length direction of the new parameter space B).
Fig 13. Recovery for the original parameters of the LCA model with Tobs = 1000 observations per stimulus.
See Fig 4 for detailed information.
Fig 14. Recovery for the original parameters of the LCA model with Tobs = 10000 observations per stimulus.
See Fig 4 for detailed information.
Fig 15. The MAE of the estimates of the parameters of the LCA as a function of sample size (abscissa) and for different methods.
More details can be found in the caption of Fig 3.
Fig 16. The RMSE of the estimates of the parameters of the LCA as a function of sample size (abscissa) and for different methods.
More details can be found in the caption of Fig 3.
Fig 17. The coverage of LCA estimates for different numbers of observations Tobs.
Each line represents one of the nine LCA parameters and plots the fraction of estimates that fall between the [α, 1 − α] quantiles of their bootstrapped confidence intervals. The closer the line to the second diagonal, the better the coverage. Black lines are the result of non-parametric bootstraps obtained through nearest neighbor estimates; red lines are the result of SVM-enhanced estimates.

Grants and funding

This research was supported by the Research Fund of KU Leuven (GOA/15/003; OT/11/031) and the Interuniversity Attraction Poles program (IAP/P7/06). Merijn Mestdagh and Stijn Verdonck are supported by the Fund of Scientific Research Flanders. The computational resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government - department EWI. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.