'mode_check' and 'matrix must be positive definite' problem

ajpirzada · 4 July 2013 13:22

In one the threads, i read StephaneAdjemian pointing out regarding the ‘mode_check’ plots, “You have a problem when the vertical line (the posterior mode obtained from the optimization routine) does not coincide with the maximum of the blue curve.” Elsewhere, MichelJuillard said, “theoreticaly a variance can’t be negative and the Hessian matrix is definite positive, but, if the optimization isn’t successful and stops away from the mode, the Hessian matrix won’t be definite positive. It is possible to apply a correction to make it definite positive, but, in that case, it is crucial to know the importance of the correction.”

I am having a similar problem in my mode_check plots where the vertical line is not crossing through the maximum of the blue curve in many cases. In some other cases, I have the blue curve as a straight line thus not having either maximum or minimum. what would this mean?

Can someone please have a look at my code (and plots) and guide me on how to correct this problem and what might be causing it?
Also, what does the difference between the blue and the green curve indicate? I understand from the discussions on this forum that a flat green line indicates no information in the data regarding the concerned parameter. However, if the two curves do not even roughly overlap each other, what would this indicate? that the prior information is different from the information in the data? Is there any other or more appropriate interpretation?

note that the mode_check plots in the zip file are when i run the code using mode_compute=6. I get the problem of matrix not positive definite when i use any other optimization routine (eg. 1, 3 or 4).

The zip file includes the mode_check plots, the result’s file (.log) and the code along with the data.

I will be grateful for your help. Thanks!
usmodeladj.zip (237 KB)

jpfeifer · 4 July 2013 16:04

Finding the mode can be really hard. In your picture, the mode value is supposed to be about -1500 (compared to -2400). mode_compute=9 should be the most powerful mode-finder. It delivers a value of about -750.

The bigger issue seems to be model-misspecification. Using the identification command shows that some parameters are not identified:

[quote]The steadystate file changed the values for the following parameters:
constebeta
ccs
cinvs
crdpi
The derivatives of jacobian and steady-state will be computed numerically
(re-set options_.analytic_derivation_mode= -2)

==== Identification analysis ====

Testing prior mean

WARNING !!!
The rank of H (model) is deficient!

constepinf is not identified in the model!
[dJ/d(constepinf)=0 for all tau elements in the model solution!]
constebeta is not identified in the model!
[dJ/d(constebeta)=0 for all tau elements in the model solution!]
ctrend is not identified in the model!
[dJ/d(ctrend)=0 for all tau elements in the model solution!]

WARNING !!!
The rank of J (moments) is deficient!

constepinf is not identified by J moments!
[dJ/d(constepinf)=0 for all J moments!]
constebeta is not identified by J moments!
[dJ/d(constebeta)=0 for all J moments!]
ctrend is not identified by J moments!
[dJ/d(ctrend)=0 for all J moments!]

[cmap,crhopinf] are PAIRWISE collinear (with tol = 1.e-10) !
[cmaw,crhow] are PAIRWISE collinear (with tol = 1.e-10) ![/quote]

This would give you a horizontal blue line for the posterior mode.

ajpirzada · 5 July 2013 17:14

Thanks jpfeifer for such a quick reply. I have corrected some of the misspecifications and am now getting the following result for the identification:

[quote]==== Identification analysis ====

Testing prior mean

All parameters are identified in the model (rank of H).

WARNING !!!
The rank of J (moments) is deficient!

[cmap,crhopinf] are PAIRWISE collinear (with tol = 1.e-10) !
[cmaw,crhow] are PAIRWISE collinear (with tol = 1.e-10) !

Monte Carlo Testing

Testing MC sample

All parameters are identified in the model (rank of H).

All parameters are identified by J moments (rank of J)

==== Identification analysis completed ====[/quote]

When i use the mode_compute=9 optimization routine, i get better plots for the mode_check command with the vertical line intersecting the maximum of the log-post curves. However, it crashes after the first set of estimations giving the error of ‘matrix must be positive definite.’ Is this error due to the ‘pairwise’ collinearity between the parameters of the price and wage shock exogenous processes - as suggested by the identification test? and how do i solve this problem?

Also, when i got the plots using mode_compute=1, there was no log-post and log-lik kernal curve for some of the parameters. what was the reason for this? this is despite the log-likelihood being closer (-950) to what i get when using mode_compute=9.

Thanks!
usmodeladj.zip (170 KB)

jpfeifer · 5 July 2013 22:28

In future versions, you will get the warning message:

[quote]POSTERIOR KERNEL OPTIMIZATION PROBLEM!
(minus) the hessian matrix at the “mode” is not positive definite!
=> posterior variance of the estimated parameters are not positive.
You should try to change the initial values of the parameters using
the estimated_params_init block, or use another optimization routine.

The following parameters are at the prior bound: SE_ez, SE_eb, SE_eg, SE_eqs, SE_em, SE_epinf, SE_ew, crhoz, crhob, crhog, crhoqs
Some potential solutions are:

Check your model for mistakes.
Check whether model and data are consistent (correct observation equation).
Shut off prior_trunc.
Use a different mode_compute like 6 or 9.
Check whether the parameters estimated are identified.
Increase the informativeness of the prior.[/quote]

Those parameters being at the bound also explains why there are no kernels being displayed. For future versions, this should be fixed soon.

ajpirzada · 8 July 2013 15:57

it seems that my Hessian matrix is non-invertible because of the boundary constraints etc. and i will have to significantly respecify the model to get around this problem.
However, i was reading about the possibility of using ‘generalized inverse’ or ‘generalized Cholesky’ or other procedures to get around this problem without sacrificing the model specifications. is there anyway of doing this in Dynare? or is it still to be incorporated?

this is the piece of literature i was reading concerning this issue and it has considerable detail regarding the issue of non-invertability of the Hessian matrix and possible ways of dealing with it: gking.harvard.edu/files/gking/files/numhess.pdf

Thanks for your time. much appreciated!

jpfeifer · 8 July 2013 19:36

Something similar used to be implemented in Dynare. But people used it as a cheap way to get a positive definite Hessian instead of fixing their model. As far as I can see, there is no good economic reason why your model should hit so many boundary conditions at the same time. Thus, I would recommend looking more at the model in detail than thinking that this is just a numerical problem you could solve with a different way to compute the Hessian. My guess us that there is something systematic still wrong.

For example, why do you use a strict lower bound for the standard deviation of shocks of 1%? This is considerably higher than typical estimates.

ajpirzada · 9 July 2013 11:44

Thanks for pointing it out.

Regards.

ajpirzada · 25 July 2013 13:42

I have adjusted the priors to stop the model from hitting the boundary constraints in the attached mod file. the estimations run smoothly and the mode_check and convergence plots seem to be consistent. Thanks for this help.

Then i tried to change various prior settings to see what happens and im getting apparently inconsistent results. let me explain:

Firstly, i ran this exact .mod file for various mode_compute routines. incredibly i get almost identical parameter estimates across various optimization routines.
Secondly, i increase the sample size (change first_obs=147 to first_obs=71). this is when everything goes wrong: 1) the model hits the same problem as before - non positive definite matrix. this is somehow understandable because for each time period parameters are likely to have different values. finding it hard to readjust the upper and lower boundaries, I get rid of them and leave only the initial values, distribution function, prior mean and prior variance. The model worked; 2) I repeat step one but this time parameter estimates end up being very different across optimization routines. for example, elasticity of capital adjustment cost (csadjcost) parameter changes from 1.75 (under mode_compute=1) to 0.23 (under mode_compute=4) to 0.65 (under mode_compute=9). similarly, parameter ‘csigma’ changes from -0,04 to 2.02 to 1.63, following the same order. while i can comprehend some difference in the posterior results across different routines, i cannot understand why is the change so big for some parameter estimates? (it is interesting to see that csigma is lower when csadjcost is higher and vice versa).
Thirdly, i bring back the upper and lower boundaries (with some adjustment) to the second step. so far i can only make it work for optimization routine 1, and 9. now comparing few parameter estimates with those under Second step for mode_compute=1, csadjcost is now 0.20 while it was 1.75 before; csigma is now 0.23 while it was -0.04 before. other parameters change as well but not considerably. now even here posterior estimates for some parameters are different across routine 1 and 9 - similar to the second step. for example, csima changes from 0.23 (under routine 1) to 1.99 (under routine 9); and, constelab changes from -4.00 to -2.52, following the same order.
Forth, i change the sample size and everything goes wrong again. Then i have to repeat the whole process of adjusting the boundaries etc.

note also, i cannot make it any of the model settings work under mode_compute=3.

my questions are:

now i do not know what to make out of it. should a model be working under all optimization routines? can a model be this sensitive to priors? and should changing of the sample size really be this problematic? and how to decide which results to choose for the final paper? I remember working on the Smets and Wouters (2007) code and it seemed very stable to such changes. i didnt have to readjust priors for different optimization routines or sample sizes. This model is a the same except that this also has stochastic detrending rather than deterministic detrending only - based on section 2.1 of NEGRO & SCHORFHEIDE (2012).

i will be grateful for ur answer. this will help me with many of my confusions which have stalled my work.
usmodeladj11.mod (19.6 KB)

jpfeifer · 25 July 2013 15:45

The Smets/Wouters model is also not without problems, see _Chib/Ramamurthy (2010): Tailored randomized block MCMC methods with application to DSGE models (ideas.repec.org/a/eee/econom/v155y2010i1p19-38.html). The more data you have, the more complicated is becomes to find the mode. Different optimizers will produce different results when applied to nonlinear, global problems. For example, Newton type optimizers will search for a local maximum of the posterior likelihood while CMAES (9) will search for a global one. Finding the mode is a real pain. Fortunately, the posterior likelihood provides a measure of goodness of fit. The lower minus the posterior likelihood, the better.

ajpirzada · 25 July 2013 18:40

thanks a lot. i need to get myself used to the language of these papers directly dealing with algorithmic issues.

i stumbled at the appendix to Liu, Wang & Zha (2013) paper on land price dynamics. in it they have summarized my problem more precisely as their model seem to have behaved in a similar way when running the estimations in dynare:

“In our model with credit constraints, we ﬁnd that the posterior kernel is full of thin and winding ridges as well as local peaks. Finding the mode of the posterior distribution has proven a difﬁcult task.
To see how such difﬁculty arises, we ﬁrst use Dynare 4.2 to estimate our model. We choose many sets of reasonably calibrated parameters as different starting points, and the Dynare program has difﬁculty to converge. For quasiNewton based optimization methods (e.g., options mode_compute=1 to 5 in Dynare), we encounter the message “POSTERIOR KERNEL OPTIMIZATION PROBLEM! (minus) the Hessian matrix at the ‘mode’ is not positive deﬁnite!,” meaning that the results are unreliable. One method (with the option mode_compute=6 in Dynare), which triggers a Monte Carlo based optimization routine, is very inefﬁcient and seems to be able to converge to a local peak only.” then they go on to explain their results under different optimization routines. under all of the optimizations in dynare, they find that 'the program converges with ill-behaved Hessian matrix. Similarly they find that the solver under mode_compute=3 cannot converge and stops prematurely.

To deal with it, they have developed their own algrothim which apparently takes days to solve the model (not very convenient). They go on to say, “Once we complete the posterior mode estimation using our own program, we use the estimated results as a starting point for the Dynare optimization routine. The Dynare program converges instantly. We are currently working with Dynare to use their preprocessor and compile part of our C/C++ code into Dynare so that the general user will be able to use our estimation procedure.”

The purpose of quoting all this was to ask you regarding their last statement about them working with the Dynare team to incorporate their estimation software in the Dynare package. is it still an abstract idea or some tangible development could be expected in the next launch?

Great Regards!

link to the appendix: econometricsociety.org/ecta/ … aneous.pdf

jpfeifer · 26 July 2013 08:18

Any improvement on that front is still several months (>6) away. We are also working on adding the Chib/Ramamurthy algorithm, but these things are complex and need a lot of testing. If your model is correct (the most common reason for the numerical problems), your best shot seems to be a combination of different mode-finders. You can load previous runs with the mode_file option and then continue from there using a different mode-finder. Take the posterior likelihood as a guidance. Iterate until you don’t get changes in the posterior anymore. Note also that non-identifiability and parameters at a bound trivially result in non-positive definite Hessians and are different from standard convergence/mode-finding problems.

yc1122 · 1 October 2013 07:49

Hi all,

I find the discussions above very useful. Thanks a lot for that.

But I still have a question about the posterior mode. Since the state-space form derived from the model and data can be multimodal (say, even with hundreds of peaks), even if one can find out the “mode” using any optimization solver, it’s not necessarily the true one. And if we can, why not simply abandon Bayesian approach if the frequentist method delivers reasonable results? So my question is why one should care so much if we can find the “mode” in a Bayesian framework in which an “average” behaviour (mean/median of the posterior distribution of structural parameters) is a bigger concern?

As I understand it, in Schorfheide’s algorithm the start-up of posterior mode search is for initializing the MCMC step in a (relatively) high probability region to facilitate convergence and put estimates in more reasonable ranges with the help of our “reliable” knowledge (priors). What we do care about is the posterior distributions of the parameters, isn’t it?

Please correct me if I misunderstand anything. Thank you very much.

jpfeifer · 14 October 2013 09:30

What exactly is your point? The mode you are looking for corresponds to the maximum likelihood estimate (if we forget about the prior for a moment). The issue typically is that the likelihood function is not well behaved and finding a maximum is hard, regardless of whether you use Bayesian or frequentist techniques (one advantage of Bayesian techniques is that mulitplying the prior to the likelihood often results in a better behaved function to optimize.)

Regarding initialization: for any positive definite Hessian you will visit all regions of the posterior parameter space. It just might take infinite time. Thus, efficiency quickly becomes a concern.

yc1122 · 14 October 2013 11:20

Thank you very much for your reply. Actually, you’ve answered my question in the second part about initialization. I was asking about why we always want to start with the posterior mode, if any, instead of any region even close to be a mode (say, when we can’t find one using any solver) that seems already a good head start . And I do learn from the practice that efficiency is such a big concern!

I’m estimating a DSGE model with 60+ equations and 32 parameters, however, somehow I can’t find the posterior mode (combined different solvers in dynare, only result in singular covariance matrix). So I have to start with mode_compute=6 which delivers a log posterior likelihood of -4500 (not a bad one though, compared to the best I can find -3600 combined solvers repeatedly). However, the chain has run for days (4 million draws for now) and only half of the parameters reach convergence. In particular, two of them seem still a bit away from the ergodic mean.

Is it normal to run millions of draws to reach convergence? Or is that a bad signal indicating some potential conceptual problem in the model? I’m not well experienced in this field. Any ideas are gratefully appreciated.

jpfeifer · 14 October 2013 13:50

Poor convergence is a well-known problem and is common. The learn more about it, see for example the paper by Chib/Ramamurthy ideas.repec.org/a/eee/econom/v155y2010i1p19-38.html

yc1122 · 14 October 2013 14:27

I know that paper, just didn’t have time to read through it. But many thanks anyway for your kind help.

Btw, will the next version of Dynare include the mode solver by Zheng Liu, Pengfei Wang and Tao Zha (“Land-price dynamics and macroeconomic fluctuations”, Econometrica, 2013) and Chib-Ramamurthy algorithm?