I was wondering whether it is legitimate to compute the Laplace approximation using a Hessian (hh) matrix obtained from an MCMC run, instead of the hh matrix coming from the mode finder.
The reason I want to know this is that the mode finder (mode_compute=1) ran into the familiar posterior kernel problem. The mode_check plots looked okay, so I then started an MCMC by taking the _mode file but replacing the hh matrix with a (positive definite) alternative (I can say more on how I created it if there is interest). The MCMC did indeed converge (I did one run using the hh matrix I created, and then a second one using an hh matrix calculated from the first run), and the posterior mean is also close to the “mode” I got from the mode finder.
Of course, this gives me the harmonic mean estimator of the marginal likelihood, but I would still like to use the Laplace approximation as well, because the Laplace approximation has a very direct link to the one-step-ahead prediction errors of the observables (evaluated at the posterior mode or mean of the parameters, say), and one can then check for which variables the one-step-ahead prediction errors actually differ between two models (of course, one also needs to check whether there are big differences in -0.5*log(det(hh)), but that is not the case for the models I am working with).
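For reference, here is a minimal sketch of the Laplace approximation to the log marginal data density, assuming hh is the Hessian of the negative log posterior kernel at the mode (the convention in Dynare's _mode file); the function name and test values below are illustrative, not from any particular model:

```python
import numpy as np

def laplace_log_marginal(log_kernel_at_mode, hh):
    """Laplace approximation to the log marginal data density.

    log_kernel_at_mode: log posterior kernel (log likelihood + log prior)
                        evaluated at the mode
    hh: Hessian of the NEGATIVE log posterior kernel at the mode
        (assumed positive definite)
    """
    k = hh.shape[0]
    sign, logdet = np.linalg.slogdet(hh)
    return log_kernel_at_mode + 0.5 * k * np.log(2.0 * np.pi) - 0.5 * logdet

# Sanity check: for an exactly Gaussian (normalized) posterior the
# approximation is exact, and the marginal is 1, i.e. 0 in logs.
Sigma = np.array([[0.5, 0.1],
                  [0.1, 0.3]])
hh = np.linalg.inv(Sigma)
log_kernel = -0.5 * 2 * np.log(2.0 * np.pi) - 0.5 * np.linalg.slogdet(Sigma)[1]
print(laplace_log_marginal(log_kernel, hh))  # ~0.0
```

This also makes the point about -0.5*log(det(hh)) explicit: it is the only term through which the curvature of the posterior enters the comparison across models.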
Many thanks and best wishes,
I would say that this is not allowed. The idea of the Laplace approximation is to construct a Gaussian approximation to the posterior density. The Gaussian distribution is single-peaked with a positive definite covariance matrix. If you substitute the covariance matrix from the MCMC, the meaning in terms of an approximation is not clear: at which “point” was the approximation conducted?
Thank you! But with a large number of draws, wouldn’t we expect the covariance matrix from the MCMC to converge to the covariance matrix one would normally use for the Laplace approximation? I thought part of the idea of the Laplace approximation is that, with a large sample, the posterior converges to a Gaussian distribution. Shouldn’t the covariance matrix calculated from that sample also converge to the matrix one would get with a numerical mode finder?
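As a quick numeric illustration of this point (a sketch with made-up numbers, not your model): when the posterior really is Gaussian, the sample covariance of a long chain of draws converges to the inverse of the Hessian of the negative log posterior at the mode, which is exactly the covariance matrix the mode finder's Hessian would deliver:

```python
import numpy as np

rng = np.random.default_rng(0)
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])               # Hessian of -log posterior at the mode (assumed)
Sigma = np.linalg.inv(H)                 # covariance implied by the Laplace approximation
# stand-in for MCMC draws from a Gaussian posterior
draws = rng.multivariate_normal(np.zeros(2), Sigma, size=200_000)
Sigma_hat = np.cov(draws, rowvar=False)  # covariance estimated from the "chain"
print(np.max(np.abs(Sigma_hat - Sigma))) # small, shrinks as the number of draws grows
```

The caveat, as discussed below, is that this equivalence relies on the posterior actually being (close to) Gaussian in the first place.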
Maybe I should say “to the matrix that one should get with a mode finder if the mode finder works okay”
The asymptotic normality requires regularity conditions that may not necessarily be satisfied. For example, the posterior may be bimodal. Of course, in that case, the Laplace approximation would also be wrong. So, asymptotically, you should be correct.
Thank you! Judging from the posterior plots it is not bimodal, and the Brooks-Gelman measures show very nice convergence, so I hope it is safe to assume that it is unimodal.
Have you tried computing the Hessian at the mh_mode?
No, I haven’t. Why would that be interesting? Also, I wonder how I would do that? I guess one could use it as a starting point for the mode finder but set the number of iterations for the mode finder to zero…?
You may want to check whether the mh_mode is actually higher than the original one. Also, you can try it as a starting point. Depending on your model, you may want to run mode_compute=5 and set analytic_derivation. Maybe you get a working Hessian then.
Hi Johannes, by “higher” do you mean that the likelihood is higher at the mh_mode than at the mode found by the mode finder? I need to check that, but it is probably close but lower. But true, one could use it as a starting point.
Thanks for the second suggestion. Unfortunately, we have missing observations, and then analytic_derivation does not work, according to the manual. But it is good to know that this option can be useful when there are problems with the Hessian.
I checked it now and, as I expected, the Laplace approximation at the mh_mode is about 7.5 log points lower than at the mode from the mode finder, both times using the same hh matrix (the one from the MCMC).
Then you should stick to the mode-finder one.