I am using the MATLAB optimizer fmincon to find the posterior mode in DYNARE.
But the optimizer seems to get stuck when it searches for the mode. This, of course, is because the posterior is not well behaved.
I understand that the prior can be used to induce proper curvature in the posterior.
Including the shocks, I have around 45 parameters in my model. Are the priors set by plain trial and error to induce the right curvature in the posterior? I also suppose that the set of priors yielding a particular curvature of the posterior is not unique.
For some parameters in these DSGE models, such as the standard errors of the shocks, there appears to be no prior information.
Is the ‘proper’ computation of the posterior mode necessary? Given that we are using the random-walk MH algorithm, will not any point that the optimizer yields be sufficient to obtain the full simulated distribution of the parameters?
The reason the posterior is not “well behaved” is that not all the parameters in your model are well identified by the data. In other words, the likelihood function is nearly flat along several dimensions.

The choice of priors should be based on a priori beliefs about the possible values a parameter can take before looking at the data. Uninformative priors are typically used for parameters we know nothing about. I would take a look at some studies by Lubik and Schorfheide and by Smets and Wouters for examples.

It IS very important to start your simulation at the mode, especially when your posterior is not well behaved, since your chain may get “stuck” in a region of the distribution surrounding a local maximum. If you do not start the MCMC at the mode, your chain may never converge (or converge only after several million draws, by which point weeks have passed).
I also had a hard time using algorithms that rely on gradients and Hessians. I ended up writing my own algorithm, which was much slower (it takes about 10 hours) but stable. You can write an algorithm that takes random draws, accepts one if the posterior increases, and rejects it otherwise. A more sophisticated version would introduce learning from the accepted draws by computing the variance-covariance matrix of the parameters in the chain, and then using this estimated matrix to generate draws from a multivariate normal distribution. So the algorithm “learns” from the randomness and then moves in the direction of acceptance.
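In case it helps, here is a minimal sketch of that idea in Python (I work in MATLAB, so treat this as illustrative pseudocode rather than my actual estimation code; the function and parameter names are made up). It proposes random draws, keeps only those that increase the log posterior, and periodically re-estimates the proposal covariance from the accepted draws:

```python
import numpy as np

def adaptive_random_search(log_post, x0, n_iter=5000, step=0.1, seed=0):
    """Hill-climbing random search with a learned proposal covariance.

    log_post: function returning the log posterior at a parameter vector.
    x0: starting parameter vector (ideally from a rough mode finder).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    best = log_post(x)
    accepted = [x.copy()]
    cov = step**2 * np.eye(x.size)          # initial isotropic proposal
    for i in range(n_iter):
        prop = rng.multivariate_normal(x, cov)
        val = log_post(prop)
        if val > best:                      # accept only improvements
            x, best = prop, val
            accepted.append(x.copy())
        # "learn" the proposal shape from the accepted draws so far
        if len(accepted) > 2 * x.size and i % 100 == 0:
            cov = np.cov(np.array(accepted).T) + 1e-8 * np.eye(x.size)
    return x, best
```

The small diagonal jitter keeps the learned covariance positive definite when the accepted draws line up along a ridge, which is exactly the situation you face with a poorly identified posterior.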
The book “Bayesian Data Analysis” by Gelman et al. covers several different mode-finding algorithms that are fairly easy to code yourself.
Another approach is to update one parameter at a time, conditioning on the others. In my experience the mode is typically reached after about five sweeps. But no matter which algorithm you use to find the mode, you should make sure that the mode you found is not a local maximum but the global one, by trying different starting values.
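The one-parameter-at-a-time idea is just coordinate ascent on the log posterior. A rough Python sketch (again illustrative, not my production code; the bounds and names are assumptions) using a scalar optimizer for each conditional update:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def coordinate_ascent(log_post, x0, bounds, n_sweeps=5):
    """Maximize log_post one parameter at a time, holding the rest fixed.

    bounds: list of (low, high) pairs, one per parameter.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_sweeps):
        for j in range(x.size):
            def neg_conditional(v, j=j):
                y = x.copy()
                y[j] = v                    # vary only parameter j
                return -log_post(y)
            res = minimize_scalar(neg_conditional,
                                  bounds=bounds[j], method='bounded')
            x[j] = res.x                    # conditional update
    return x
```

This converges quickly when the parameters are not too strongly correlated in the posterior; with strong correlation the sweeps zigzag and you need more of them.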
And always test for convergence of your chain.