Model Comparison Bayesian Estimation (again)

Hello,
I have read the relevant forum topics on this already, but I am still not 100% sure I am doing it right, so a few short questions remain:

I am basically comparing one model in 2 different versions; the second version differs only in one equation of the model block. I am using the same data for both models. I am using the same priors for the two models (although the second model contains additional parameters that did not have to be estimated in the first - (1) does that bias the posterior odds ratio/model comparison decision somehow?)

Based on the formula of the posterior odds ratio, since I am using the same data and priors, it is enough to compare marginal densities, right? (2)
(3) From the estimation results: “Log data density” was computed using the Modified Harmonic Mean estimator and “Log data density [Laplace approximation]” via the Laplace approximation, correct?

The decision rule for model comparison using marginal densities is: (4) higher = better, correct? I.e. model A: -950, model B: -1000 ==> choose model A.
Or do I need to make any transformation on these values first (like taking absolute values, …) ? (5)

Thank you for your help !!

Basically all your answers are in Koop’s 2003 textbook “Bayesian Econometrics” on pages 4-5.

  1. For Bayesian model comparison, models do not need to be nested, and there is a natural degrees-of-freedom correction. Hence, as long as you use the same data, having different parameters does not matter at all.
  2. Short answer: that is sufficient. Longer answer: what matters is not the prior over the parameters but the prior odds ratio over the models (see Koop (2003), p. 4). If you a priori assign equal probability to all models (0.5 for each of your two models), you can simply compare the marginal data densities.
  3. Exactly.
  4. You want the marginal data density to be as high as possible. The logarithm is a monotonic transformation, so you also want the log marginal densities to be as high as possible.
  5. No, you don’t. However, it is often easier to compare models using posterior model probabilities, see Koop (2003), p. 4.
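To make points 4 and 5 concrete: the log marginal data densities can be turned into posterior model probabilities directly. The sketch below is plain Python for illustration (not Dynare output); it assumes equal prior model probabilities unless others are supplied, and works in logs to avoid numerical underflow.

```python
import math

def posterior_model_probs(log_mdd, prior_probs=None):
    """Posterior model probabilities from log marginal data densities.

    log_mdd:     list of log marginal data densities, one per model.
    prior_probs: prior model probabilities (default: equal, e.g. 0.5 each
                 for two models, as discussed in Koop (2003), p. 4).
    """
    n = len(log_mdd)
    if prior_probs is None:
        prior_probs = [1.0 / n] * n
    # Posterior weight of each model in logs: log p(y|M_i) + log p(M_i).
    log_post = [lm + math.log(p) for lm, p in zip(log_mdd, prior_probs)]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post)
    w = [math.exp(lp - m) for lp in log_post]
    s = sum(w)
    return [x / s for x in w]

# Example from the question above: model A at -950, model B at -1000.
probs = posterior_model_probs([-950.0, -1000.0])
# probs[0] is essentially 1: model A is overwhelmingly preferred.
```

No transformation of the reported values is needed; the model with the higher (less negative) log data density wins, and the posterior probabilities just quantify by how much.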

A quick follow-up question regarding (1). If in addition to additional parameters, the second model also has additional shocks, then with common priors across models can the log data density still be used to compare across models?

thanks

Additional shocks do not matter. What matters is that you estimate those shocks’ standard deviations, but a standard deviation is just another parameter. Note also: you do not need the same priors, as you seem to suggest.

thanks a lot.

Hi together,

just one more question in the same context: If I estimate a model M1 and get

Log data density [Laplace approximation] is -480, then I estimate a different version M2 (change priors but same data) and get
Log data density [Laplace approximation] is -490

then M1 is preferred by the data by a Bayes factor of exp(10), correct?

Best, Peter

Exactly.
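The arithmetic behind that Bayes factor is just a difference of logs; a minimal Python illustration (not Dynare code, using the numbers from the question):

```python
import math

# Log data densities (Laplace approximation) reported by the estimation:
log_mdd_m1 = -480.0  # model M1
log_mdd_m2 = -490.0  # model M2 (different priors, same data)

# The Bayes factor of M1 over M2 is the ratio of marginal data densities,
# so in logs it is simply the difference:
log_bf = log_mdd_m1 - log_mdd_m2   # = 10.0
bf = math.exp(log_bf)              # = exp(10), roughly 22000
```

So a difference of 10 in log data density already corresponds to odds of roughly 22000:1 in favor of M1.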

Dear all,

I resuscitate this topic because I have a question about model comparison.

I am estimating a DSGE model with financial frictions over different frequency bands (Sala 2015, JAE, is the main reference).
The model is the same over all the frequency bands and it has the same priors.
Also, the observable variables are the same, except for the fact that they are filtered at different frequency bands.

My question is:

Is it still possible to rank the models using the Bayes factor, or does the fact that I am using different frequency bands compromise this possibility?

Thanks

Federico

That is hard to say, actually. My hunch is that the answer is no. Please take the following with a grain of salt, because this is not my home turf.
A crucial factor is whether the likelihood and the prior are proper, i.e. integrate to one. Given the same observables, selecting different frequency bands seems to set some parts of equation (7) in Sala (2015) to 0, meaning the density will not integrate to 1. For searching the mode we only need proportionality, so this is fine. But it is not OK for model comparison. The problem arising here seems similar to the case of implicit prior truncation. What do you think?

My previous post was not so clear, my fault.

  1. Regarding your answer
    I think you may be right about the density not integrating to 1 in Sala’s context (it would explain why the author did not perform any model comparison through the BF, even though it seemed to me a very natural exercise to do). I have to be honest, I didn’t think about the problem in this way. A proper answer deserves further investigation.

  2. Regarding my problem
    Actually, I am not trying to replicate Sala’s work, in the sense that I am not bringing the likelihood function into the frequency domain and cutting frequencies according to the parameter omega.
    Using different kinds of band-pass filters (so everything stays in the time domain), I am trying to isolate high, low, and business-cycle frequencies and estimate my DSGE model on each of them.

My question is:

Is it fair to compare the estimation results across different frequencies through the BF? My feeling is that the answer is NO.
In the standard model comparison

BF = p(y | M_1) / p(y | M_2)

In my context

BF = p(y_8_16 | M_1) / p(y_16_32 | M_2)

where y_8_16 is the vector containing the frequencies between 8 and 16 quarters and y_16_32 the vector containing the frequencies between 16 and 32 quarters.
Even if the vector of observable variables is the same, the way in which they are filtered differs, so I can’t do model comparison through the BF.

Am I correct ?

Thank you in advance for your reply

What matters here is which data enters your model. And that data is not the same across models: you use the same series, but filtered differently. It is not like in Ferroni’s work, where a model-endogenous filter is used and the same data enters each model. As you vary the data, you cannot compare the marginal data densities.

Thank you,
I suspected that

PS:
When you cite Ferroni’s work are you referring to
"Trend Agnostic One-Step Estimation of DSGE Models" ?

Yes, I was referring to that paper.

Dear Johannes,

I just came across a paper by Bhattarai, Lee and Park “Policy regimes, policy shifts, and U.S. business cycles” in the ReStat. They write in footnote 24:
“Moreover, note that although we estimate the model conditional on one policy regime at a time, it is possible to construct an unconditional posterior distribution of the parameters across all three policy regimes. This requires specifying a prior distribution over the policy regimes and then sampling from the posterior distribution of the parameters conditional on each policy regime, according to the posterior distribution over the policy regimes.”

I am not sure that I understand their procedure fully, especially how they computed the marginal likelihoods in Table 2. But I also do not understand it with respect to your answer above. Different policy regimes are just different priors - that is at least what I think they do, using different priors. Could you maybe put your remark in context or give a brief idea why Bhattarai et al. don’t simply compare the log data densities?

Thanks so much!

The problem typically is that, given the way the marginal data density is computed, you need the prior to integrate to 1. But often there is implicit prior truncation, e.g. due to the Blanchard-Kahn conditions not being satisfied. Estimating different regimes is a case like that, because you a priori impose a 0 prior density on parameter draws falling into other regimes when you reject these draws (instead of using the non-zero prior density given by your explicitly specified prior). When you want to compute the marginal data densities for the particular regimes, you therefore need to keep track of the prior mass in each regime and adjust for it not being equal to 1.
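To illustrate the bookkeeping: the prior mass of a regime can be estimated by Monte Carlo, sampling from the explicit prior and counting how often the draws fall into the regime. The sketch below is hypothetical (the N(1.5, 0.5) prior, the cutoff at 1, and the reported log data density are made-up numbers; this is not Dynare's internal procedure), but it shows the correction: dividing the truncated prior by its mass to make it proper means subtracting log(mass) from the regime-conditional log data density.

```python
import math
import random

def prior_mass_in_regime(sample_prior, in_regime, n_draws=100_000, seed=1):
    """Monte Carlo estimate of the prior probability mass of one regime.

    sample_prior: draws one parameter value/vector from the explicit prior.
    in_regime:    returns True if the draw falls into the regime of interest.
    """
    rng = random.Random(seed)
    hits = sum(in_regime(sample_prior(rng)) for _ in range(n_draws))
    return hits / n_draws

# Hypothetical example: inflation feedback coefficient with a N(1.5, 0.5)
# prior; the regime of interest is the region above 1.
sample_prior = lambda rng: rng.gauss(1.5, 0.5)
in_regime = lambda phi: phi > 1.0
mass = prior_mass_in_regime(sample_prior, in_regime)   # roughly 0.84

# Suppose the regime-conditional estimation reported this value while
# implicitly using the truncated (improper) prior:
log_mdd_truncated = -480.0   # hypothetical reported value

# Renormalizing the truncated prior (dividing by its mass) corresponds to
# subtracting log(mass) from the log data density:
log_mdd_adjusted = log_mdd_truncated - math.log(mass)
```

Since the mass is below 1, the adjustment raises the log data density, and only the adjusted values are comparable across regimes.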

Thanks!

Sorry to ask again; I thought it was clear yesterday: Isn’t that always the case? For instance, in every New Keynesian model we have some sort of Taylor principle and usually some indeterminacy cut-off, say at 1. So normal model comparison would also not work?

I think I still do not fully understand the specific nature of having different regimes. The following is from a paper that cite Sims (2003) “Probability models for monetary policy decisions” and the training sample method:
“This poses a challenge to our analysis because, as in Lubik and Schorfheide (2004), we determine which region is favoured by the data based on the posterior probability of each region. This statistic is influenced by the prior distribution”

So again, it is probably something with the regions but I still cannot make up my mind. Any helpful suggestions are appreciated!

Thanks again

Yes, that is always the case, but most of the time, we do not do model comparison. Also, for easy models we can use a prior distribution that excludes the indeterminacy region (like a uniform distribution from 1 to 5 for the inflation feedback coefficient, where indeterminacy happens for values below 1).

If you read the discussion to An/Schorfheide (2006), the discussants are worried about the implicit prior truncation and An/Schorfheide respond that their implicit prior only truncates 3% of the mass and argue that this does not affect their conclusions.

The problem is not having different regimes, but rather having a prior distribution that includes two regimes and then estimating the marginal data density for the regimes separately. Because in that case, you try to compare two “models” that now have an implicit prior that does not integrate to 1 (because the mass is split across the two regimes).

I think now I got it, thanks so much! Also for the reference!

Best,
Peter