Strange interaction b/w mode_compute = 4 and identification?

Dear all,

I seem to have run into a strange interaction between the *identification* command and the ability of the optimizer csminwel (*mode_compute = 4*) to find the posterior mode.

If I estimate 3 distinct AR(1) processes (see the attached mod-file and data file) **with** the *identification* command, *mode_compute = 4* leads to a positive definite Hessian. If I run it **without** the *identification* command, *mode_compute = 4* does not deliver the maximum, and the Hessian is **not** positive definite. Also, the values of the posterior are slightly different. The log data density is higher (!) in the case with the non-positive-definite Hessian, even though the plots produced by mode_check all look fine (see the attached plots 1 and 2)!
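
A note on why the higher log data density need not mean a better fit: as far as I understand it, the Laplace approximation is built from the Hessian at the mode. In the sketch below the symbols are my own notation, not Dynare output: $\theta^*$ is the reported mode, $k$ the number of estimated parameters, and $H$ the Hessian of minus the log posterior kernel at $\theta^*$.

```latex
\log \hat{p}(Y) \;=\; \frac{k}{2}\log(2\pi) \;+\; \frac{1}{2}\log\left|H^{-1}\right|
\;+\; \log p(Y \mid \theta^*) \;+\; \log p(\theta^*)
```

If $H$ is not positive definite, $H^{-1}$ is not a valid covariance matrix, so the determinant term is not meaningful and the two log data density figures cannot really be compared.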

Does anybody have an explanation for this observation?

I am thankful for any help,
all the best,
Niklas
plot_2.pdf (15.3 KB)
plot_1.pdf (11.7 KB)
data_mode_identification_interaction.xls (24.5 KB)
mode_identification_interaction.mod (2.9 KB)

You are running identification for the prior. Thus, after running identification, the parameter values are set to the prior mean. In contrast, when running estimation directly, the parameters are initialized to the specified starting values. This difference in starting values explains the different outcome when running a numerical optimizer.
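
To illustrate how the starting point alone can change what a derivative-based optimizer returns, here is a minimal Python sketch. It is not csminwel and not the posterior from this thread: `objective` is a made-up function with two local minima, and `gradient_descent`, the step size, and the two starting values are assumptions chosen for the illustration.

```python
def objective(x):
    # Toy objective with two local minima (near x = -1 and x = 2).
    # Hypothetical function, not the model's posterior.
    return (x + 1) ** 2 * (x - 2) ** 2 + 0.5 * x


def gradient(x):
    # Analytic derivative of the toy objective.
    return 2 * (x + 1) * (x - 2) ** 2 + 2 * (x + 1) ** 2 * (x - 2) + 0.5


def gradient_descent(x0, step=0.01, iterations=5000):
    # Plain fixed-step gradient descent: it only ever moves downhill locally.
    x = x0
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


mode_from_left = gradient_descent(-2.0)
mode_from_right = gradient_descent(3.0)
print("start -2.0 ->", mode_from_left, objective(mode_from_left))
print("start  3.0 ->", mode_from_right, objective(mode_from_right))
```

Both runs converge, but to different points with different objective values, so a comparison between two estimation runs is only informative if both start from the same values.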

Dear jpfeifer,

as always, thank you very much for your help!

I am not quite sure whether I understood your answer correctly. Does your answer imply that running the model without specifying any starting values (hence starting from the prior mean by default) should lead to the same outcome as having the identification command before running mode_compute = 4? This is, however, not the case (as you can see below, where I have added some excerpts from the Matlab output): the initial values of the log posterior are quite different, and the number of iterations also differs considerably. More importantly, in the case without identification, I get the error message of “not positive definite hessian”.

What I also really do not understand is why csminwel (and also mode_compute = 5, I checked it) has such a hard time finding the posterior mode. The problem is very simple (3 AR(1) processes without any connection), so it should never lead to a “not positive definite hessian”!? Do you have an explanation for this?

All the best!
Niklas

Run **without** the identification command, but with starting values at the prior mean:

Initial value of the log posterior (or likelihood): 567.0646
-----------------
-----------------
f at the beginning of new iteration,      -567.0645821956
Predicted improvement:     9394.239308087
lambda =          1; f =         -565.2445182
lambda =    0.33333; f =         -566.8751041
lambda =    0.11111; f =         -567.0475026
lambda =   0.037037; f =         -567.0637321
lambda =   0.012346; f =         -919.0702508
lambda =   0.023866; f =         -567.0644589
lambda =    0.01607; f =         -567.0645819
lambda =   0.012676; f =         -932.4183817
lambda =   0.014615; f =         -988.3939319
lambda =   0.016851; f =         -567.0645799
lambda =   0.015472; f =         -948.2675865
lambda =   0.016285; f =         -567.0645816
lambda =   0.015792; f =         -567.0645822
lambda =   0.015314; f =         -964.3574522
lambda =   0.015599; f =         -567.0645822
lambda =   0.015427; f =         -953.3961886
Norm of dx     1.3707
Cliff.  Perturbing search direction.
Predicted improvement:    14893.717837427
lambda =          1; f =         -562.2121069
lambda =    0.33333; f =         -566.5458539
lambda =    0.11111; f =         -567.0134807
lambda =   0.037037; f =         -567.0608057
lambda =   0.012346; f =         -567.0645224
lambda =  0.0041152; f =         -718.5544360
lambda =  0.0079555; f =         -921.7019034
lambda =   0.015379; f =         -567.0643882
lambda =   0.010356; f =         -567.0645680
lambda =   0.008168; f =         -567.0645822
lambda =  0.0080187; f =         -921.1032345
lambda =   0.008108; f =         -918.2936891
lambda =  0.0081982; f =         -567.0645822
Norm of dx     2.2239
Cliff again.  Try traversing
Predicted improvement: 102642132.493013650
lambda =          1; f =    203262438.5730386
lambda =    0.33333; f =     22584121.2386550
lambda =    0.11111; f =      2508812.8774944
lambda =   0.037037; f =       278242.9741065
lambda =   0.012346; f =        30408.5104982
lambda =  0.0041152; f =         2873.5601311
lambda =  0.0013717; f =         -185.1413799
lambda = 0.00045725; f =         -524.7512971
lambda = 0.00015242; f =         -562.4037969
lambda = 5.0805e-05; f =         -566.5600962
lambda = 1.6935e-05; f =         -567.0128166
lambda =  5.645e-06; f =         -567.0601198
lambda = 1.8817e-06; f =         -567.0643905
lambda = 6.2723e-07; f =         -670.9653593
lambda = 1.2125e-06; f =         -567.0645618
lambda = 8.1646e-07; f =         -710.3305928
lambda = 1.0351e-06; f =         -567.0645779
lambda = 8.9775e-07; f =         -567.0645820
lambda = 8.2424e-07; f =         -710.4171907
lambda = 8.6759e-07; f =         -567.0645822
lambda = 8.4132e-07; f =         -708.8439203
lambda = 8.5698e-07; f =         -704.1474898
lambda = 8.7295e-07; f =         -567.0645822
lambda = 8.6333e-07; f =         -567.0645822
lambda = 8.5383e-07; f =         -705.4318732
Norm of dx      14328
----
Improvement on iteration 1 =      421.329349745
back and forth on step length never finished
-----------------
-----------------
f at the beginning of new iteration,      -988.3939319404
Predicted improvement:    19510.472534793
lambda =          1; f =         -986.6782540
lambda =    0.33333; f =         -988.2038025
lambda =    0.11111; f =         -988.3729731
lambda =   0.037037; f =         -988.3916582
lambda =   0.012346; f =         -988.3936971
lambda =  0.0041152; f =         -988.3939112
lambda =  0.0013717; f =         -988.3939309
lambda = 0.00045725; f =         -991.6241260
Norm of dx     1.9754
----
Improvement on iteration 2 =        3.230194084


... (I deleted the iterations 3-30)


Improvement on iteration 31 =        0.000000247
back and forth on step length never finished
-----------------
-----------------
f at the beginning of new iteration,     -1023.4940295049
Predicted improvement:        0.000010945
lambda =          1; f =        -1023.4940195
lambda =    0.33333; f =        -1023.4940286
lambda =    0.11111; f =        -1023.4940295
lambda =   0.037037; f =        -1023.4940295
Norm of dx 8.0884e-05
----
Improvement on iteration 32 =        0.000000022
improvement < crit termination
Objective function at mode: -1023.494030
 
POSTERIOR KERNEL OPTIMIZATION PROBLEM!
 (minus) the hessian matrix at the "mode" is not positive definite!
=> posterior variance of the estimated parameters are not positive.
You should  try  to change the initial values of the parameters using
the estimated_params_init block, or use another optimization routine.
Warning: The results below are most likely wrong! 
> In dynare_estimation_1 at 480
  In dynare_estimation at 70
  In mode_identification_interaction at 228
  In dynare at 120 
 
MODE CHECK


Run with specified starting values (actually the mode itself) and **with** the *identification* command at the beginning (i.e. at the prior mean):

Initial value of the log posterior (or likelihood): 1023.4939
-----------------
-----------------
f at the beginning of new iteration,     -1023.4939473386
Predicted improvement:        0.391220249
lambda =          1; f =        -1023.4939470
lambda =    0.33333; f =        -1023.4939473
lambda =    0.11111; f =        -1013.0549904
lambda =   0.037037; f =        -1021.8991452
lambda =   0.012346; f =        -1023.2984405
lambda =  0.0041152; f =        -1023.4733102
lambda =  0.0013717; f =        -1023.4923285
lambda = 0.00045725; f =        -1023.4940044
lambda = 0.00015242; f =        -1023.4940331
lambda = 0.00029465; f =        -1023.4940529
Norm of dx  0.0088456
----
Improvement on iteration 1 =        0.000105571
-----------------
-----------------
f at the beginning of new iteration,     -1023.4940529095
Predicted improvement:        0.011833352
lambda =          1; f =        -1023.4940528
lambda =    0.33333; f =        -1018.6489259
lambda =    0.11111; f =        -1022.9332580
lambda =   0.037037; f =        -1023.4306419
lambda =   0.012346; f =        -1023.4871290
lambda =  0.0041152; f =        -1023.4933457
lambda =  0.0013717; f =        -1023.4939959
lambda = 0.00045725; f =        -1023.4940538
lambda = 0.00015242; f =        -1023.4940554
Norm of dx  0.0015614
----
Improvement on iteration 2 =        0.000002502
-----------------
-----------------
f at the beginning of new iteration,     -1023.4940554110
Predicted improvement:        0.000897340
lambda =          1; f =        -1023.4940554
lambda =    0.33333; f =        -1022.8271337
lambda =    0.11111; f =        -1023.4258643
lambda =   0.037037; f =        -1023.4867148
lambda =   0.012346; f =        -1023.4932614
lambda =  0.0041152; f =        -1023.4939724
lambda =  0.0013717; f =        -1023.4940478
lambda = 0.00045725; f =        -1023.4940551
lambda = 0.00015242; f =        -1023.4940556
Norm of dx 0.00043959
----
Improvement on iteration 3 =        0.000000150
-----------------
-----------------
f at the beginning of new iteration,     -1023.4940555607
Predicted improvement:        0.000002508
lambda =          1; f =        -1023.4902628
lambda =    0.33333; f =        -1023.4936388
lambda =    0.11111; f =        -1023.4940098
lambda =   0.037037; f =        -1023.4940506
lambda =   0.012346; f =        -1023.4940551
lambda =  0.0041152; f =        -1023.4940555
lambda =  0.0013717; f =        -1023.4940556
lambda = 0.00045725; f =        -1023.4940556
Norm of dx 1.6316e-05
----
Improvement on iteration 4 =        0.000000002
improvement < crit termination
Objective function at mode: -1023.494056
 
MODE CHECK
 
Fval obtained by the minimization routine: -1023.494056
 

From the initial value of the log posterior, you can see that the algorithm started from different values. I don’t really know why derivative-based solvers have such problems here, apart from the likelihood obviously having local maxima. My conjecture is that the problem is related to the finite-sample bias in AR estimation when the process is persistent (described in some Econometrica paper in the 1980s, I think).
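
The finite-sample bias mentioned above can be seen in a small Monte Carlo experiment. This is only a sketch of the bias itself and does not show that it causes the optimizer problems. The sample size of 55 and the shock standard deviation of 0.002 mirror the output posted in this thread; the true persistence of 0.95, the number of replications, and the function names are assumptions made for the illustration.

```python
import random


def simulate_ar1(rho, sigma, n_obs, rng, burn_in=200):
    # Simulate y_t = rho * y_{t-1} + e_t and discard a burn-in period.
    y = 0.0
    for _ in range(burn_in):
        y = rho * y + rng.gauss(0.0, sigma)
    series = []
    for _ in range(n_obs):
        y = rho * y + rng.gauss(0.0, sigma)
        series.append(y)
    return series


def ols_ar1(series):
    # OLS slope from regressing y_t on y_{t-1} with an intercept.
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


rng = random.Random(0)  # fixed seed so the experiment is reproducible
true_rho = 0.95
estimates = [ols_ar1(simulate_ar1(true_rho, 0.002, 55, rng)) for _ in range(2000)]
mean_estimate = sum(estimates) / len(estimates)
print(f"true rho = {true_rho}, mean OLS estimate = {mean_estimate:.3f}")
```

With a persistent process and only 55 observations, the average estimate falls below the true value, which is the downward bias the conjecture refers to.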

Dear jpfeifer,

again, thank you very much!

I still don’t know whether I understood you correctly. My feeling is that specifying starting values or not does not matter here. In the example below I use exactly the same starting values in both cases (the prior mean). In both cases, I arrive at approximately the same posterior mode (up to the fourth digit). However, when I run the *identification* command first, the Hessian is positive definite. When I do **not** run the *identification* command, the Hessian is not positive definite.
It is as if running the *identification* command improves the outcome of csminwel. Do you have an idea why this might be the case?

One side question, how do you know/see that there are local maxima…?

All the best,
Niklas

Run **with** the *identification* command:

==== Identification analysis ====
 
Testing prior mean
  
All parameters are identified in the model (rank of H).
 
 
All parameters are identified by J moments (rank of J)
 
 
==== Identification analysis completed ====
 
 
Loading 55 observations from data_mode_identification_interaction.xls

Initial value of the log posterior (or likelihood): 567.0646
-----------------
-----------------
f at the beginning of new iteration,      -567.0645821956
Predicted improvement:     9395.110504760
lambda =          1; f =         -565.2443467
lambda =    0.33333; f =         -566.8750856
lambda =    0.11111; f =         -567.0475008
lambda =   0.037037; f =         -567.0637320
lambda =   0.012346; f =         -919.0936673
lambda =   0.023866; f =         -567.0644588
lambda =    0.01607; f =         -567.0645819
lambda =   0.012676; f =         -932.4421725
lambda =   0.014615; f =         -988.3942773
lambda =   0.016851; f =         -567.0645799
lambda =   0.015472; f =         -948.1812928
lambda =   0.016285; f =         -567.0645816
lambda =   0.015792; f =         -567.0645822
lambda =   0.015314; f =         -964.2985128
lambda =   0.015599; f =         -567.0645822
lambda =   0.015427; f =         -953.3184351
Norm of dx     1.3708
Cliff.  Perturbing search direction.
Predicted improvement:    13747.664098287
lambda =          1; f =         -563.0339956
lambda =    0.33333; f =         -566.6355882
lambda =    0.11111; f =         -567.0229223
lambda =   0.037037; f =         -567.0616785
lambda =   0.012346; f =         -567.0645580
lambda =  0.0041152; f =         -703.1734998
lambda =  0.0079555; f =         -886.2257451
lambda =   0.015379; f =         -567.0644745
lambda =   0.010356; f =         -567.0645784
lambda =   0.008168; f =         -892.9637700
lambda =  0.0094179; f =         -567.0645813
lambda =  0.0086467; f =         -567.0645822
lambda =  0.0082147; f =         -893.9354671
lambda =  0.0084712; f =         -893.5366052
lambda =  0.0087358; f =         -567.0645821
lambda =  0.0085761; f =         -567.0645822
lambda =  0.0084193; f =         -894.6499136
lambda =   0.008513; f =         -892.1084386
lambda =  0.0086078; f =         -567.0645822
Norm of dx      2.029
Cliff again.  Try traversing
Predicted improvement: 265890718.828363960
lambda =          1; f =    521488220.8781180
lambda =    0.33333; f =     57942447.4042859
lambda =    0.11111; f =      6437484.2678855
lambda =   0.037037; f =       714751.5104909
lambda =   0.012346; f =        78905.9567007
lambda =  0.0041152; f =         8260.9978827
lambda =  0.0013717; f =          413.0737335
lambda = 0.00045725; f =         -458.4125987
lambda = 0.00015242; f =         -555.0760002
lambda = 5.0805e-05; f =         -565.7602403
lambda = 1.6935e-05; f =         -566.9286655
lambda =  5.645e-06; f =         -567.0522530
lambda = 1.8817e-06; f =         -567.0639071
lambda = 6.2723e-07; f =         -775.7610827
Norm of dx      23060
----
Improvement on iteration 1 =      421.329695144
back and forth on step length never finished
-----------------
-----------------
f at the beginning of new iteration,      -988.3942773400
Predicted improvement:    19548.357215532
lambda =          1; f =         -986.6880608
lambda =    0.33333; f =         -988.2051971
lambda =    0.11111; f =         -988.3734343
lambda =   0.037037; f =         -988.3920162
lambda =   0.012346; f =         -988.3940438
lambda =  0.0041152; f =         -988.3942568
lambda =  0.0013717; f =         -988.3942763
lambda = 0.00045725; f =         -991.6319344
Norm of dx     1.9773
----
Improvement on iteration 2 =        3.237657047

…

Improvement on iteration 30 =        0.000000335
-----------------
-----------------
f at the beginning of new iteration,     -1023.4940568420
Predicted improvement:        0.000000002
lambda =          1; f =        -1023.4940568
Norm of dx 1.7586e-06
----
Improvement on iteration 31 =        0.000000002
improvement < crit termination
Objective function at mode: -1023.494057
 
MODE CHECK
 
Fval obtained by the minimization routine: -1023.494057
 
 
RESULTS FROM POSTERIOR MAXIMIZATION
parameters
        prior mean     mode    s.d. t-stat prior pstdev

rho_einG_a   0.750   0.7958  0.0719 11.0761 beta  0.1000
rho_einG_b   0.750   0.8473  0.0636 13.3122 beta  0.1000
rho_etauw_a   0.750   0.9478  0.0208 45.5896 beta  0.1000
rho_etauw_b   0.750   0.9016  0.0378 23.8494 beta  0.1000
rho_eg_a    0.750   0.7940  0.0723 10.9888 beta  0.1000
rho_eg_b    0.750   0.9140  0.0400 22.8438 beta  0.1000
standard deviation of shocks
        prior mean     mode    s.d. t-stat prior pstdev

nua_einG    0.010   0.0019  0.0002 10.7694 invg  2.0000
nub_einG    0.010   0.0017  0.0002 10.7511 invg  2.0000
nua_etauw   0.010   0.0017  0.0002 10.4729 invg  2.0000
nub_etauw   0.010   0.0013  0.0001 10.5941 invg  2.0000
nua_ecG     0.010   0.0020  0.0002 10.7690 invg  2.0000
nub_ecG     0.010   0.0013  0.0001 10.6270 invg  2.0000
 
Log data density [Laplace approximation] is 963.396285.
 
Total computing time : 0h00m18s

Run **without** the *identification* command:

Loading 55 observations from data_mode_identification_interaction.xls

Initial value of the log posterior (or likelihood): 567.0646
-----------------
-----------------
f at the beginning of new iteration,      -567.0645821956
Predicted improvement:     9394.239308087
lambda =          1; f =         -565.2445182
lambda =    0.33333; f =         -566.8751041
lambda =    0.11111; f =         -567.0475026
lambda =   0.037037; f =         -567.0637321
lambda =   0.012346; f =         -919.0702508
lambda =   0.023866; f =         -567.0644589
lambda =    0.01607; f =         -567.0645819
lambda =   0.012676; f =         -932.4183817
lambda =   0.014615; f =         -988.3939319
lambda =   0.016851; f =         -567.0645799
lambda =   0.015472; f =         -948.2675865
lambda =   0.016285; f =         -567.0645816
lambda =   0.015792; f =         -567.0645822
lambda =   0.015314; f =         -964.3574522
lambda =   0.015599; f =         -567.0645822
lambda =   0.015427; f =         -953.3961886
Norm of dx     1.3707
Cliff.  Perturbing search direction.
Predicted improvement:    14893.717837427
lambda =          1; f =         -562.2121069
lambda =    0.33333; f =         -566.5458539
lambda =    0.11111; f =         -567.0134807
lambda =   0.037037; f =         -567.0608057
lambda =   0.012346; f =         -567.0645224
lambda =  0.0041152; f =         -718.5544360
lambda =  0.0079555; f =         -921.7019034
lambda =   0.015379; f =         -567.0643882
lambda =   0.010356; f =         -567.0645680
lambda =   0.008168; f =         -567.0645822
lambda =  0.0080187; f =         -921.1032345
lambda =   0.008108; f =         -918.2936891
lambda =  0.0081982; f =         -567.0645822
Norm of dx     2.2239
Cliff again.  Try traversing
Predicted improvement: 102642132.493013650
lambda =          1; f =    203262438.5730386
lambda =    0.33333; f =     22584121.2386550
lambda =    0.11111; f =      2508812.8774944
lambda =   0.037037; f =       278242.9741065
lambda =   0.012346; f =        30408.5104982
lambda =  0.0041152; f =         2873.5601311
lambda =  0.0013717; f =         -185.1413799
lambda = 0.00045725; f =         -524.7512971
lambda = 0.00015242; f =         -562.4037969
lambda = 5.0805e-05; f =         -566.5600962
lambda = 1.6935e-05; f =         -567.0128166
lambda =  5.645e-06; f =         -567.0601198
lambda = 1.8817e-06; f =         -567.0643905
lambda = 6.2723e-07; f =         -670.9653593
lambda = 1.2125e-06; f =         -567.0645618
lambda = 8.1646e-07; f =         -710.3305928
lambda = 1.0351e-06; f =         -567.0645779
lambda = 8.9775e-07; f =         -567.0645820
lambda = 8.2424e-07; f =         -710.4171907
lambda = 8.6759e-07; f =         -567.0645822
lambda = 8.4132e-07; f =         -708.8439203
lambda = 8.5698e-07; f =         -704.1474898
lambda = 8.7295e-07; f =         -567.0645822
lambda = 8.6333e-07; f =         -567.0645822
lambda = 8.5383e-07; f =         -705.4318732
Norm of dx      14328
----
Improvement on iteration 1 =      421.329349745
back and forth on step length never finished
-----------------
-----------------
f at the beginning of new iteration,      -988.3939319404
Predicted improvement:    19510.472534793
lambda =          1; f =         -986.6782540
lambda =    0.33333; f =         -988.2038025
lambda =    0.11111; f =         -988.3729731
lambda =   0.037037; f =         -988.3916582
lambda =   0.012346; f =         -988.3936971
lambda =  0.0041152; f =         -988.3939112
lambda =  0.0013717; f =         -988.3939309
lambda = 0.00045725; f =         -991.6241260
Norm of dx     1.9754
----
Improvement on iteration 2 =        3.230194084

…

Improvement on iteration 31 =        0.000000247
back and forth on step length never finished
-----------------
-----------------
f at the beginning of new iteration,     -1023.4940295049
Predicted improvement:        0.000010945
lambda =          1; f =        -1023.4940195
lambda =    0.33333; f =        -1023.4940286
lambda =    0.11111; f =        -1023.4940295
lambda =   0.037037; f =        -1023.4940295
Norm of dx 8.0884e-05
----
Improvement on iteration 32 =        0.000000022
improvement < crit termination
Objective function at mode: -1023.494030
 
POSTERIOR KERNEL OPTIMIZATION PROBLEM!
 (minus) the hessian matrix at the "mode" is not positive definite!
=> posterior variance of the estimated parameters are not positive.
You should  try  to change the initial values of the parameters using
the estimated_params_init block, or use another optimization routine.
Warning: The results below are most likely wrong! 
> In dynare_estimation_1 at 480
  In dynare_estimation at 70
  In mode_identification_interaction at 227
  In dynare at 120 
 
MODE CHECK
 
Fval obtained by the minimization routine: -1023.494030
 
 
RESULTS FROM POSTERIOR MAXIMIZATION
parameters
        prior mean     mode    s.d. t-stat prior pstdev

rho_einG_a   0.750   0.7958  0.0718 11.0853 beta  0.1000
rho_einG_b   0.750   0.8473  0.0635 13.3491 beta  0.1000
rho_etauw_a   0.750   0.9478  0.0199 47.6245 beta  0.1000
rho_etauw_b   0.750   0.9017  0.0367 24.5749 beta  0.1000
rho_eg_a    0.750   0.7939  0.0722 10.9989 beta  0.1000
rho_eg_b    0.750   0.9140  0.0391 23.3698 beta  0.1000
standard deviation of shocks
        prior mean     mode    s.d. t-stat prior pstdev

nua_einG    0.010   0.0019  0.0000  0.0000 invg  2.0000
nub_einG    0.010   0.0017  0.0000  0.0000 invg  2.0000
nua_etauw   0.010   0.0017  0.0000  0.0000 invg  2.0000
nub_etauw   0.010   0.0013  0.0000  0.0000 invg  2.0000
nua_ecG     0.010   0.0020  0.0000  0.0000 invg  2.0000
nub_ecG     0.010   0.0013  0.0000  0.0000 invg  2.0000
 
Log data density [Laplace approximation] is 965.284555.
 
Total computing time : 0h00m12s

Could you provide me with the code for this example? I have to look closely into this. You might be onto something.

Yes, of course! Here you are:
data_mode_identification_interaction.xls (24.5 KB)
mode_identification_interaction_05_11.mod (2.54 KB)

The reason is that identification turns on *analytic_derivation*.
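
As I understand the option, with analytic_derivation the gradient and the final Hessian at the mode are computed analytically rather than by finite differences. The Python sketch below is a toy comparison of the two approaches for a single AR(1) process. It is not Dynare's Hessian routine: the likelihood, the simulated data, the step sizes, and all function names are assumptions for illustration only.

```python
import math
import random


def neg_loglik(rho, sigma, series):
    # Conditional Gaussian AR(1) negative log-likelihood (additive constant dropped).
    n = len(series) - 1
    ssr = sum((b - rho * a) ** 2 for a, b in zip(series[:-1], series[1:]))
    return n * math.log(sigma) + ssr / (2.0 * sigma ** 2)


def analytic_hessian(rho, sigma, series):
    # Closed-form second derivatives of neg_loglik with respect to (rho, sigma).
    lagged, current = series[:-1], series[1:]
    n = len(lagged)
    s_xx = sum(a * a for a in lagged)
    s_ex = sum((b - rho * a) * a for a, b in zip(lagged, current))
    s_ee = sum((b - rho * a) ** 2 for a, b in zip(lagged, current))
    h_rr = s_xx / sigma ** 2
    h_rs = 2.0 * s_ex / sigma ** 3
    h_ss = -n / sigma ** 2 + 3.0 * s_ee / sigma ** 4
    return [[h_rr, h_rs], [h_rs, h_ss]]


def numerical_hessian(rho, sigma, series, step):
    # Central finite differences with the same step in both directions.
    def f(d_rho, d_sigma):
        return neg_loglik(rho + d_rho, sigma + d_sigma, series)

    f0 = f(0.0, 0.0)
    h_rr = (f(step, 0.0) - 2.0 * f0 + f(-step, 0.0)) / step ** 2
    h_ss = (f(0.0, step) - 2.0 * f0 + f(0.0, -step)) / step ** 2
    h_rs = (f(step, step) - f(step, -step) - f(-step, step) + f(-step, -step)) / (4.0 * step ** 2)
    return [[h_rr, h_rs], [h_rs, h_ss]]


def is_positive_definite(h):
    # Leading principal minors test for a symmetric 2x2 matrix.
    return h[0][0] > 0 and h[0][0] * h[1][1] - h[0][1] * h[1][0] > 0


# Simulated data: 55 observations, persistence 0.9, shock s.d. 0.002 (assumed values).
rng = random.Random(1)
series = [0.0]
for _ in range(54):
    series.append(0.9 * series[-1] + rng.gauss(0.0, 0.002))

# Closed-form maximum likelihood estimates for this toy problem.
lagged, current = series[:-1], series[1:]
rho_hat = sum(a * b for a, b in zip(lagged, current)) / sum(a * a for a in lagged)
sigma_hat = math.sqrt(sum((b - rho_hat * a) ** 2 for a, b in zip(lagged, current)) / len(lagged))

exact = analytic_hessian(rho_hat, sigma_hat, series)
print("analytic Hessian positive definite:", is_positive_definite(exact))
for step in (1e-5, 1e-8):
    approx = numerical_hessian(rho_hat, sigma_hat, series, step)
    print(f"step {step:g}: h_rr = {approx[0][0]:.4g} (analytic {exact[0][0]:.4g}), "
          f"positive definite: {is_positive_definite(approx)}")
```

At the exact maximum the analytic Hessian is positive definite by construction. The finite-difference version agrees with it for a moderate step, but with a very small step the rounding error in the objective can dominate the second differences, so its entries and even its definiteness are no longer reliable. Whether this is the mechanism at work in Dynare here is a guess on my part.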

I understand. Thank you very much, jpfeifer! As always, you are a great help to me!
Best
Niklas

Dear Professor jpfeifer, I have the same problem. The code runs well without the “identification” command. However, the Hessian matrix at the “mode” is not positive definite after adding the “identification” command. I wanted to read your reply here to find the answer, but the reply can no longer be seen. I hope you can help me with this problem. Thank you so much!

I fixed the post.