Dear Peter,
From Bayes' theorem, we know that the posterior density is equal to the likelihood times the prior density divided by the marginal density of the data:
p(\theta|\mathcal Y_T) = \frac{p(\mathcal Y_T| \theta)p(\theta)}{p(\mathcal Y_T)}
The numerator is the posterior kernel (it is not a density, because it does not integrate to 1). If you are only concerned with inference about the parameters (\theta), you do not need to worry about the denominator: all the information about the parameters is embodied in the posterior kernel. The mode_check option will return plots of the log posterior kernel and of the log likelihood as functions of one parameter, keeping the other parameters constant (equal to the estimated posterior mode). So if your model has two parameters \theta_1 and \theta_2, and if your estimate of the posterior mode is (\hat \theta_1, \hat\theta_2), you will have the plots for:
f(\theta_1)\equiv\log p(\mathcal Y_T | \theta_1, \hat\theta_2) + \log p(\theta_1, \hat\theta_2) \quad\text{and}\quad g(\theta_1)\equiv\log p(\mathcal Y_T | \theta_1, \hat\theta_2)
with \theta_1 taking values around \hat\theta_1, and
f(\theta_2)\equiv\log p(\mathcal Y_T |\hat \theta_1, \theta_2) + \log p(\hat\theta_1, \theta_2) \quad\text{and}\quad g(\theta_2)\equiv\log p(\mathcal Y_T | \hat\theta_1, \theta_2)
with \theta_2 taking values around \hat\theta_2. So these curves are not the full multi-dimensional log likelihood and log posterior kernel, but only cuts through these objects along one parameter at a time. Note also that, because the log prior density is the difference between the two objects, the log likelihood lies below the log posterior kernel whenever the log prior density is positive, i.e. the graph of the likelihood could sit below that of the posterior kernel and potentially outside the picture. For that reason, the likelihood is shifted upwards so that it has the same maximum as the posterior kernel, which makes the two curves easier to compare.
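If it helps, here is a minimal Python sketch of what such a cut looks like (my own illustration, not Dynare code; the toy model, the priors and the mode values theta1_hat, theta2_hat are all made up for the example):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Toy model (made up for the illustration): y_t ~ N(theta1, exp(theta2)^2),
# with independent N(0,1) priors on theta1 and theta2.
rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=100)

def loglik(theta1, theta2):
    return norm.logpdf(y, loc=theta1, scale=np.exp(theta2)).sum()

def logprior(theta1, theta2):
    return norm.logpdf(theta1, 0.0, 1.0) + norm.logpdf(theta2, 0.0, 1.0)

# Suppose (theta1_hat, theta2_hat) is the estimated posterior mode.
theta1_hat, theta2_hat = 1.0, 0.0

# Cut along theta1, holding theta2 fixed at its mode value.
grid = np.linspace(theta1_hat - 1.0, theta1_hat + 1.0, 200)
log_kernel = np.array([loglik(t, theta2_hat) + logprior(t, theta2_hat) for t in grid])
log_like = np.array([loglik(t, theta2_hat) for t in grid])

# Shift the likelihood so both curves share the same maximum, as in the mode_check plots.
log_like_shifted = log_like + (log_kernel.max() - log_like.max())

plt.plot(grid, log_kernel, label="log posterior kernel")
plt.plot(grid, log_like_shifted, "--", label="log likelihood (shifted)")
plt.xlabel("theta_1")
plt.legend()
plt.show()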
There is absolutely no reason to seek an estimation in which these cuts through the objects are identical. Indeed, if the prior brings some information to the inference about the parameters, the curves have to be different, and in general the mode of the posterior kernel (or density) does not match the mode of the likelihood.
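A textbook conjugate example (mine, not specific to Dynare) makes this plain: with a single observation y \sim N(\theta, \sigma^2) and a prior \theta \sim N(\mu_0, \tau^2), the log posterior kernel is maximised at
\hat\theta_{\text{post}} = \frac{y/\sigma^2 + \mu_0/\tau^2}{1/\sigma^2 + 1/\tau^2}
whereas the likelihood is maximised at y. The two modes coincide only if \mu_0 = y or if the prior is flat (\tau^2 \to \infty).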
The last statement about the importance of finding the posterior mode is wrong. The MCMC will converge even if it starts from an initial state different from the (global) posterior mode, as long as the posterior density is strictly positive at the initial state, the covariance matrix of the jumping (proposal) density is positive definite, and all the usual assumptions on the posterior density are satisfied. You will typically need a lot more iterations if the initial state is in a low posterior density region, but the MCMC will eventually converge to the posterior distribution. Actually, as long as mh_nblocks is greater than one (the default being two), Dynare does not start the MCMC from the estimated posterior mode. Rather, the initial state of each chain is chosen randomly to be overdispersed around the estimated posterior mode (the distance to the estimated posterior mode is controlled by the option mh_init_scale).
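To fix ideas, here is a minimal random-walk Metropolis sketch in Python (again my own illustration, not Dynare's implementation; the target log_kernel, the mode, Sigma and the tuning constants are all hypothetical stand-ins):

import numpy as np

def log_kernel(theta):
    # Hypothetical target: a standard bivariate normal posterior kernel.
    return -0.5 * theta @ theta

theta_mode = np.zeros(2)          # estimated posterior mode (assumed given)
Sigma = np.eye(2)                 # inverse Hessian at the mode (assumed given)

def run_chain(theta0, n_draws, scale, rng):
    chol = np.linalg.cholesky(scale**2 * Sigma)   # proposal covariance = scale^2 * Sigma
    draws = np.empty((n_draws, theta0.size))
    theta, lp = theta0.copy(), log_kernel(theta0)
    for i in range(n_draws):
        prop = theta + chol @ rng.standard_normal(theta.size)
        lp_prop = log_kernel(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject step
            theta, lp = prop, lp_prop
        draws[i] = theta
    return draws

rng = np.random.default_rng(1)
n_blocks, init_scale = 2, 2.0     # in the spirit of mh_nblocks and mh_init_scale
chains = []
for _ in range(n_blocks):
    # Overdispersed initial state drawn around the estimated mode.
    theta0 = theta_mode + init_scale * np.linalg.cholesky(Sigma) @ rng.standard_normal(2)
    chains.append(run_chain(theta0, n_draws=5000, scale=0.3, rng=rng))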
The form of the proposal (or jumping) distribution matters more than the initial state of the chain(s) for determining the number of iterations needed to ensure convergence. It is essentially in this respect that the estimation of the posterior mode is important, because we use the inverse Hessian at the estimated mode as an approximation of the posterior covariance matrix.
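Continuing the sketch above (still an illustration under the same hypothetical target, not Dynare code), the proposal covariance could be obtained as the inverse of a finite-difference Hessian of minus the log posterior kernel at the mode:

import numpy as np

def neg_log_kernel(theta):
    # Same hypothetical target as in the sketch above.
    return 0.5 * theta @ theta

def numerical_hessian(f, x, h=1e-5):
    # Central-difference approximation of the Hessian of f at x.
    n = x.size
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.zeros(n), np.zeros(n)
            e_i[i], e_j[j] = h, h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4.0 * h**2)
    return H

theta_mode = np.zeros(2)
Sigma_proposal = np.linalg.inv(numerical_hessian(neg_log_kernel, theta_mode))
# Sigma_proposal then plays the role of Sigma in the random-walk proposal above.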
Best,
Stéphane.