Comparing models by marginal likelihood

Dear all,

Smets and Wouters (2007) compare different model specifications using a pre-sample period as a training sample, following the suggestion in Sims (2003). This is claimed to make the marginal likelihoods of different models more comparable. In their Dynare code, the following commands calculate the marginal likelihood with a training sample:

// Calculation the marginal likelihood with training period (40 observations between '56 and '65)
// estimation(optim=('MaxIter',200),datafile=usmodel_data,mode_compute=0,mode_file=usmodel_hist_dsge_f19_7_31_mode ,first_obs=31,nobs=200,presample=4,lik_init=2,prefilter=0,mh_replic=0,mh_nblocks=2,mh_jscale=0.20,mh_drop=0.2);
// estimation(optim=('MaxIter',200),datafile=usmodel_data,mode_compute=0,mode_file=usmodel_hist_dsge_f19_7_3144_mode,first_obs=31,nobs=44,presample=4,lik_init=2,prefilter=0,mh_replic=0,mh_nblocks=2,mh_jscale=0.20,mh_drop=0.2);

According to their documentation, mode_file=usmodel_hist_dsge_f19_7_31_mode is for the sample 1956:1-1965:4, so why is nobs=200? Also, mode_file=usmodel_hist_dsge_f19_7_3144_mode is for the sample 1956:1-2004:4, so why is nobs=44?

My questions are:

  1. How should these commands be read?
  2. Does Dynare automatically take the training sample method into account? If not, how can this method be implemented in Dynare?

Thanks a lot!

I will need more time to look at this next week. Please remind me at the end of next week if you don’t hear anything by then.

OK. Thanks a lot, Dear Johannes!

There seems to be a typo in the Smets/Wouters (2007) readme.pdf. They state:

usmodel_hist_dsge_f19_7_31_mode : mode and hessian 1956:1 - 1965:4
usmodel_hist_dsge_f19_7_3144_mode : mode and hessian 1956:1 - 2004:4

It should presumably be the other way round. The file suffixed with 44 should be the mode file for the training sample ending in 1965:4 (nobs=44), and the one without the suffix the mode file for the full sample ending in 2004:4 (nobs=200). Given this, the rest makes sense.

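estimation(optim=('MaxIter',200),datafile=usmodel_data,mode_compute=0,mode_file=usmodel_hist_dsge_f19_7_3144_mode,first_obs=31,nobs=44,presample=4,lik_init=2,prefilter=0,mh_replic=0,mh_nblocks=2,mh_jscale=0.20,mh_drop=0.2);
computes the marginal data density for the 44 observations (nobs=44) starting with the first observation used, i.e. observation number 31 (first_obs=31). The dataset they use in their mat-file starts in 1947:3 (in contrast to the Excel file, which starts in 1947:1). Using first_obs=31 therefore implies starting in 1955:1.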


estimation(optim=('MaxIter',200),datafile=usmodel_data,mode_compute=0,mode_file=usmodel_hist_dsge_f19_7_31_mode ,first_obs=31,nobs=200,presample=4,lik_init=2,prefilter=0,mh_replic=0,mh_nblocks=2,mh_jscale=0.20,mh_drop=0.2);
computes the marginal data density for the 200 observations from 1955:1 to 2004:4.
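Finally, the presample option (presample=4) uses the first 4 observations only to initialize the Kalman filter, so the likelihood is evaluated from 1956:1 onwards. The setting prefilter=0 just means that the data are not demeaned before estimation.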
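The sample dates above can be checked with a little quarter arithmetic. What follows is a minimal Python sketch, not part of Dynare: the helper names `quarter_of_obs` and `sample_window` are made up for illustration, and it assumes, as stated above, that observation 1 of the mat-file is 1947:3 and that the presample observations are counted from first_obs.

```python
def quarter_of_obs(start_year, start_quarter, obs):
    """Return (year, quarter) of the 1-based observation `obs` in a
    quarterly dataset whose first observation is start_year:start_quarter."""
    index = start_year * 4 + (start_quarter - 1) + (obs - 1)
    return index // 4, index % 4 + 1


def sample_window(start_year, start_quarter, first_obs, nobs, presample):
    """Hypothetical helper: dates implied by first_obs, nobs and presample."""
    first = quarter_of_obs(start_year, start_quarter, first_obs)
    last = quarter_of_obs(start_year, start_quarter, first_obs + nobs - 1)
    # The first `presample` observations only initialize the filter.
    lik_start = quarter_of_obs(start_year, start_quarter, first_obs + presample)
    return {"first": first, "likelihood_from": lik_start, "last": last}


# Assumption taken from the post: the mat-file starts in 1947:3.
training = sample_window(1947, 3, first_obs=31, nobs=44, presample=4)
full = sample_window(1947, 3, first_obs=31, nobs=200, presample=4)
print(training)
print(full)
```

Under these assumptions the training run covers 1955:1 to 1965:4 with the likelihood evaluated from 1956:1, which matches the "40 observations between '56 and '65" in the original code comment, and the full run ends in 2004:4.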

Regarding the implementation: as shown in Sims (2003), equation 7, the posterior can be split up so that the training sample posterior serves as the prior. Now, given that the data density can be factorized as

$$p(Y_T) = p(Y_{T_1} \mid Y_{T_0})\, p(Y_{T_0})$$

where $T_0$ is the training sample, we can obtain the log data density for the sample $T_1$ as

$$\log p(Y_{T_1} \mid Y_{T_0}) = \log p(Y_T) - \log p(Y_{T_0})$$

that is, as the log marginal data density of the full sample minus that of the training sample. Using the numbers for the Laplace approximation of the marginal data density from the Smets/Wouters code, we get

which is the number reported at the bottom of their TABLE 2.
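One way to see why the factorization carries over from the likelihood to the marginal data density is to integrate the parameters out. The notation below ($\theta$ for the parameters, $Y_{T_0}$ for the training sample, $Y_{T_1}$ for the remaining observations, $Y_T$ for the full sample) is shorthand chosen here and not necessarily the notation in Sims (2003):

```latex
\begin{aligned}
p(Y_T) &= \int p(Y_{T_1} \mid Y_{T_0}, \theta)\, p(Y_{T_0} \mid \theta)\, p(\theta)\, d\theta \\
       &= p(Y_{T_0}) \int p(Y_{T_1} \mid Y_{T_0}, \theta)\, p(\theta \mid Y_{T_0})\, d\theta \\
       &= p(Y_{T_0})\, p(Y_{T_1} \mid Y_{T_0}),
\end{aligned}
\qquad \text{using} \qquad
p(\theta \mid Y_{T_0}) = \frac{p(Y_{T_0} \mid \theta)\, p(\theta)}{p(Y_{T_0})}.
```

The second line is where the training sample posterior $p(\theta \mid Y_{T_0})$ takes over the role of the prior. Taking logs and rearranging gives $\log p(Y_{T_1} \mid Y_{T_0}) = \log p(Y_T) - \log p(Y_{T_0})$.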


Dear Johannes,

This is great! Thank you so much for the clarification!

Dear Johannes,
I hope you are well. I am currently trying to understand the training sample approach used by Smets and Wouters and discovered this thread, which was really helpful; the calculation you present is very intuitive. There is one issue, however, on which I would be grateful for your clarification.
I understand that the marginal data density estimated by Dynare, which you use in your calculation, is defined as in the attached file.
That means the parameters theta are integrated out, so that we get the probability of the data given the model, independently of the parameter values.
But Sims seems to use the likelihood rather than the marginal data density (p. 10), i.e. the parameter vector is still there and not integrated out (which is why the prior p(theta) is still there). What am I missing? Or is it simply that the decomposition he suggests for the likelihood can also be applied to the marginal density of the data?
Many thanks,

The training sample idea is a way to obtain a proper prior distribution. It is not tied to any particular subsequent application.

Marginal densities are only a natural application where this idea comes in handy. Recall that the MCMC algorithm only requires a kernel, not a proper density, because the constant term renormalizing the kernel will cancel. This is different when computing the marginal density. Here we need proper densities that integrate to one. Using the training sample idea to get a proper prior is one way to do this.
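The point can be illustrated numerically in a toy setting. The sketch below is an illustration only and has nothing to do with the DSGE model or with Dynare's internals: it assumes i.i.d. normal data with known variance and a normal prior on the mean, and all names and numbers (`log_marginal_by_grid`, `posterior`, the data values) are made up. It checks that the marginal density of the remaining sample, computed with the proper training sample posterior as prior, equals the full-sample log marginal density minus the training-sample one.

```python
import math


def log_normal_pdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_marginal_by_grid(data, prior_mean, prior_var, noise_var, n=4001):
    """log of the integral of p(data | theta) p(theta) over theta,
    approximated with the trapezoid rule on a grid around the prior mean."""
    lo = prior_mean - 10.0 * math.sqrt(prior_var)
    hi = prior_mean + 10.0 * math.sqrt(prior_var)
    step = (hi - lo) / (n - 1)
    log_terms = []
    for i in range(n):
        theta = lo + i * step
        log_kernel = log_normal_pdf(theta, prior_mean, prior_var)
        log_kernel += sum(log_normal_pdf(y, theta, noise_var) for y in data)
        weight = 0.5 if i in (0, n - 1) else 1.0
        log_terms.append(math.log(weight * step) + log_kernel)
    peak = max(log_terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def posterior(data, prior_mean, prior_var, noise_var):
    """Conjugate normal update for the mean: returns (mean, variance)."""
    precision = 1.0 / prior_var + len(data) / noise_var
    mean = (prior_mean / prior_var + sum(data) / noise_var) / precision
    return mean, 1.0 / precision


# Made-up data and prior, purely for illustration.
data = [0.3, -0.1, 0.8, 0.5, 1.1, 0.2, 0.9, 0.4]
training, rest = data[:3], data[3:]
m0, v0, s2 = 0.0, 4.0, 1.0

log_ml_full = log_marginal_by_grid(data, m0, v0, s2)
log_ml_train = log_marginal_by_grid(training, m0, v0, s2)

# Training sample posterior, used as a proper prior for the rest.
m1, v1 = posterior(training, m0, v0, s2)
log_ml_rest_given_train = log_marginal_by_grid(rest, m1, v1, s2)

difference = log_ml_full - log_ml_train
print(difference, log_ml_rest_given_train)
```

The two printed numbers should agree up to numerical integration error. The check only works because the training sample posterior is a properly normalized density; an unnormalized kernel would shift the result by an unknown constant.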

Hi Johannes,
many thanks, that is really helpful.

Dear All,

I’m quite new to Dynare and I also have a question in this regard. I would like to replicate part of the Smets & Wouters (2007) model. I am trying to obtain out-of-sample forecasts with a rolling estimation window (over a number of years or quarters). I would like to take, say, 80 or 100 quarterly observations, estimate the model, obtain forecasts for the 7 observed variables, and then shift the estimation sample ahead.

Now, if I specify the “mode_file” in the estimation command, Dynare does not start the MH algorithm but uses the previously calculated mode as the posterior mode (am I right?). But I think this is not what I want. Instead, I specify mh_replic=100000 and Dynare starts the MH algorithm. So far that works, albeit not in all cases.

As Smets & Wouters stress (referring to Sims, 2003), it is important to use a training sample to specify the priors in order to make models comparable to each other. My question now is: how can I account for the training sample for the priors in the Dynare code? Can I simply estimate the model on an initial training sample and take the posterior mode and standard deviation as my prior specification for the subsequent estimation windows? In the end, I would like to compare the forecast performance with a Bayesian VAR and univariate models (random walk, AR(1)).

Thank you very much in advance,