Hessian computation

reubenpjacob · July 11, 2017, 10:07am

Hi All
A common problem that plagues the estimation of DSGE models is the computation of the inverse of the Hessian at the posterior mode, that is not positive definite.
I am aware that the option mode_compute=5 offers a different way of computing the Hessian. optim={‘Hessian’,2}. This seems to work well, from what I have seen. There is very little documentation available on this in the manual.
Do other optimisers also offer similar options? Perhaps not documented?
Cheers
Reuben

stepan-a · July 12, 2017, 7:01am

Dear Reuben,

I believe the answer is no. However, mode_compute=6, which is based on a MCMC, provides an estimate of the posterior covariance matrix not based on the inverse of the hessian matrix (we use MCMC draws, so it works as long as the acceptance ratio is not too close to zero). Also, it is not mandatory to estimate the posterior mode (with hessian at the estimated mode) for running a metropolis. You can optionally use another covariance matrix for the jumping distribution.

Best,
Stéphane.

jpfeifer · July 12, 2017, 9:54am

mode_compute=5 with optim={'Hessian',2} relies on the outer product of gradients and requires the use of a univariate Kalman filter. Therefore, something similar is not available for other optimizers. As Stéphane indicated, mode_compute=6 also differs.
The reason for using the inverse Hessian at the mode is that this is the most efficient choice if the posterior would be normal. But any positive definite matrix for the proposal density will do. You could provide any arbitrary matrix via the

command.

Aldo · July 13, 2017, 6:14am

Sorry for my question,
Stéphane indicated about mode_compute=6,

From what stage of this method are you referring? I ask this because I see severals acceptance rates in different stages of procedure of this method.

¿Is it not assured that acceptance ratio = 1/3?
I ask this because I see in manual ’

stepan-a · July 13, 2017, 7:11am

We use the draws of the metropolis (run in mode_compute=6) to estimate the posterior covariance matrix. We need to have enough variability in these draws. If all the draws, were identical (or more generally if the number of different draws was smaller than the number of estimated parameters) the sample covariance matrix would obviously not be full rank. That’s why we target, by default, an acceptance rate of one third (which is the value commonly considered in the literature.

The optimization routine has several steps:
[ul]

We run a metropolis, where we tune the scale parameter so that the acceptance rate is close to the target. In this first step the draws are not used to estimate the covariance matrix, we only continuously keep record of the vector of parameters which gives the best value for the objective. In this first step the covariance matrix if the jumping distribution is an identity matrix.
We run a metropolis to estimate the covariance matrix. Again we continuously keep record of the vector of parameters which gives the best value for the objective.
We return to 1. with the new estimate of the posterior covariance matrix.

[/ul]
By default we iterate on these steps two or three times (I do not remember, look at the reference manual). And in the last step we run a last metropolis, where we only update our estimate of the posterior mode. In this last round we decrease slowly the size of the jumps.

Best,
Stéphane.

reubenpjacob · July 15, 2017, 9:25pm

This discussion was very useful. Thanks a lot!
When we use mode_compute=6, sequentially after using a ‘more efficient’ optimiser as #4 or #8, would the MCMC be initialised at the latest available mode file or would it still use the values specificied in the estimated_params_init block?
Thanks again!

stepan-a · July 15, 2017, 9:38pm

Dear Reuben,

The initial state of the MCMC will be (or centered around, if you have more than one chain) the posterior mode estimate returned by the last optimization routine.

Best,
Stéphane.

jpfeifer · July 16, 2017, 9:01am

That depends on what exactly you are doing. Stephane is right when you use the

option. It will start at the last found mode.

reubenpjacob · July 17, 2017, 7:35pm

Thanks Johannes.
Do you recommend any readings to understand why the Hessian found by the numerical optimiser(s) is not positive definite?
Since this seems to be a very common problem, I thought a few readings would be useful!
Thanks
Reuben

stepan-a · July 18, 2017, 6:47am

Dear Reuben,

In Dynare we do not use the hessian matrices returned by the optimizers. For instance, mode_compute=4, the default algorithm derived from Chris Sims code, returns a crude estimate of the hessian that is not used by Dynare (see any presentation of the BFGS, there is a nice page on wikipedia, algorithm which is close to what is done here). Instead we compute the hessian with finite differences (except mode_compute=6, and mode_compute=5 which uses a gradient the outer product approach), by calling hessian.m function.

The main culprit is that the optimization routine failed in finding a (local) minimum of minus the likelihood (or posterior kernel). That’s why you need to play with other optimization routines and/or the initial guesses. Another culprit, may be the noise in the objective function. In this case you have to change the length of the steps in the finite difference routine (controlled by options_.gstep).

Best,
Stéphane.

jpfeifer · July 18, 2017, 10:02am

Two additional issues are:

the Hessian only needs to be positive definite at an interior solution. If you have a corner solution, i.e you are at the bound of your prior parameter space, there will be a problem
if a parameter is not identified or if there is collinearity in the Jacobian of the likelihood, the Hessian will also be non-positive definite

That is why a look at the mode_check plots is often revealing to see many pathological issues not simply due to the finite difference approximation to the Hessian.

tim_r · March 13, 2019, 4:02am

Hi
How does the univariate kalman filter (called by mode_compute=5 with `optim={‘Hessian’,2}) handle cases where shocks in the measurement and state equations are correlated?

This situation arises, for example, if there is a permanent technology shock in the model. Assuming no autocorrelation in technology growth then its innovation will be in the measurement and state equations.

Cheers
tim