Dear Peter,
From Bayes' theorem, we know that the posterior density is equal to the likelihood times the prior density divided by the marginal density of the data:
p(\theta|\mathcal Y_T) = \frac{p(\mathcal Y_T| \theta)p(\theta)}{p(\mathcal Y_T)}
The numerator is the posterior kernel (it is not a density, because it does not integrate to 1). If you are only concerned with inference about the parameters (\theta), you do not need to worry about the denominator: all the information about the parameters is embodied in the posterior kernel. The mode_check option will return plots of the log posterior kernel and of the log likelihood as functions of one parameter, keeping the other parameters constant (equal to the estimated posterior mode). So if your model has two parameters \theta_1 and \theta_2, and if your estimate of the posterior mode is (\hat \theta_1, \hat\theta_2), you will have the plots for:
f(\theta_1)\equiv\log p(\mathcal Y_T | \theta_1, \hat\theta_2) + \log p(\theta_1, \hat\theta_2) \quad\text{and}\quad g(\theta_1)\equiv\log p(\mathcal Y_T | \theta_1, \hat\theta_2)
with \theta_1 taking values around \hat\theta_1, and
f(\theta_2)\equiv\log p(\mathcal Y_T |\hat \theta_1, \theta_2) + \log p(\hat\theta_1, \theta_2) \quad\text{and}\quad g(\theta_2)\equiv\log p(\mathcal Y_T | \hat\theta_1, \theta_2)
with \theta_2 taking values around \hat\theta_2. So these curves are not the full multi-dimensional log likelihood and log posterior kernel, but only cuts through these objects along one parameter at a time. Note also that, because the log prior density is the difference between the two objects, the log likelihood lies below the log posterior kernel whenever the log prior density is positive, i.e. the graph of the likelihood could sit below that of the posterior kernel and potentially outside the picture. For that reason, the likelihood is shifted upwards so that it has the same maximum as the posterior kernel, which makes the two curves easier to compare.
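If it helps, here is a minimal Python sketch of what such a cut looks like (my own illustration, not Dynare code; the toy model, the priors and the mode values theta1_hat, theta2_hat are all made up for the example):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Toy model (made up for the illustration): y_t ~ N(theta1, exp(theta2)^2),
# with independent N(0,1) priors on theta1 and theta2.
rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=100)

def loglik(theta1, theta2):
    return norm.logpdf(y, loc=theta1, scale=np.exp(theta2)).sum()

def logprior(theta1, theta2):
    return norm.logpdf(theta1, 0.0, 1.0) + norm.logpdf(theta2, 0.0, 1.0)

# Suppose (theta1_hat, theta2_hat) is the estimated posterior mode.
theta1_hat, theta2_hat = 1.0, 0.0

# Cut along theta1, holding theta2 fixed at its mode value.
grid = np.linspace(theta1_hat - 1.0, theta1_hat + 1.0, 200)
log_kernel = np.array([loglik(t, theta2_hat) + logprior(t, theta2_hat) for t in grid])
log_like = np.array([loglik(t, theta2_hat) for t in grid])

# Shift the likelihood so both curves share the same maximum, as in the mode_check plots.
log_like_shifted = log_like + (log_kernel.max() - log_like.max())

plt.plot(grid, log_kernel, label="log posterior kernel")
plt.plot(grid, log_like_shifted, "--", label="log likelihood (shifted)")
plt.xlabel("theta_1")
plt.legend()
plt.show()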
There is absolutely no reason to seek an estimation in which these cuts through the objects are identical. Indeed, if the prior brings some information to the inference about the parameters, the curves have to be different, and in general the mode of the posterior kernel (or density) does not match the mode of the likelihood.
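A textbook conjugate example (mine, not specific to Dynare) makes this plain: with a single observation y \sim N(\theta, \sigma^2) and a prior \theta \sim N(\mu_0, \tau^2), the log posterior kernel is maximised at
\hat\theta_{\text{post}} = \frac{y/\sigma^2 + \mu_0/\tau^2}{1/\sigma^2 + 1/\tau^2}
whereas the likelihood is maximised at y. The two modes coincide only if \mu_0 = y or if the prior is flat (\tau^2 \to \infty).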
The last statement about the importance of finding the posterior mode is wrong. The MCMC will converge even if it starts from an initial state different from the (global) posterior mode, as long as the posterior density is strictly positive at the initial state, the covariance matrix of the jumping (proposal) density is positive definite, and all the usual assumptions on the posterior density are satisfied. You will typically need a lot more iterations if the initial state is in a low posterior density region, but the MCMC will eventually converge to the posterior distribution. Actually, as long as mh_nblocks is greater than one (the default being two), Dynare does not start the MCMC from the estimated posterior mode. Rather, the initial state of each chain is chosen randomly to be overdispersed around the estimated posterior mode (the distance to the estimated posterior mode is controlled by the option mh_init_scale).
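To fix ideas, here is a minimal random-walk Metropolis sketch in Python (again my own illustration, not Dynare's implementation; the target log_kernel, the mode, Sigma and the tuning constants are all hypothetical stand-ins):

import numpy as np

def log_kernel(theta):
    # Hypothetical target: a standard bivariate normal posterior kernel.
    return -0.5 * theta @ theta

theta_mode = np.zeros(2)          # estimated posterior mode (assumed given)
Sigma = np.eye(2)                 # inverse Hessian at the mode (assumed given)

def run_chain(theta0, n_draws, scale, rng):
    chol = np.linalg.cholesky(scale**2 * Sigma)   # proposal covariance = scale^2 * Sigma
    draws = np.empty((n_draws, theta0.size))
    theta, lp = theta0.copy(), log_kernel(theta0)
    for i in range(n_draws):
        prop = theta + chol @ rng.standard_normal(theta.size)
        lp_prop = log_kernel(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject step
            theta, lp = prop, lp_prop
        draws[i] = theta
    return draws

rng = np.random.default_rng(1)
n_blocks, init_scale = 2, 2.0     # in the spirit of mh_nblocks and mh_init_scale
chains = []
for _ in range(n_blocks):
    # Overdispersed initial state drawn around the estimated mode.
    theta0 = theta_mode + init_scale * np.linalg.cholesky(Sigma) @ rng.standard_normal(2)
    chains.append(run_chain(theta0, n_draws=5000, scale=0.3, rng=rng))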
The form of the proposal (or jumping) distribution matters more than the initial state of the chain(s) for determining the number of iterations needed to ensure convergence. It is essentially in this respect that the estimation of the posterior mode is important, because we use the inverse Hessian at the estimated mode as an approximation of the posterior covariance matrix.
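Continuing the sketch above (still an illustration under the same hypothetical target, not Dynare code), the proposal covariance could be obtained as the inverse of a finite-difference Hessian of minus the log posterior kernel at the mode:

import numpy as np

def neg_log_kernel(theta):
    # Same hypothetical target as in the sketch above.
    return 0.5 * theta @ theta

def numerical_hessian(f, x, h=1e-5):
    # Central-difference approximation of the Hessian of f at x.
    n = x.size
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.zeros(n), np.zeros(n)
            e_i[i], e_j[j] = h, h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4.0 * h**2)
    return H

theta_mode = np.zeros(2)
Sigma_proposal = np.linalg.inv(numerical_hessian(neg_log_kernel, theta_mode))
# Sigma_proposal then plays the role of Sigma in the random-walk proposal above.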
Best,
Stéphane.