Bimodal Posteriors

joepearlman1 · December 3, 2013, 1:01pm

I wanted to test whether Dynare could estimate a non-invertible MA process, in particular y=eps-theta*eps(-1), where theta>1. I generated this process, and then estimated the parameters using the modfile below. The process was generated for 1000 periods using theta=0.5 and var(eps)=1.

First of all, the result is that it did estimate the non-invertible process (although using different priors I could get Dynare to estimate the equivalent invertible process).

However, since I used a flat prior (the uniform distribution) for the standard error, I would have anticipated a bimodal posterior, with approximately equal likelihoods at theta and 1/theta. Instead the posterior was almost vertical at theta.
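To make the expectation of two equally likely modes concrete, here is a minimal Python sketch (not part of the original modfile). It assumes an exact Gaussian likelihood, which depends on the parameters only through the autocovariances, and checks that (theta, sigma) = (0.5, 1) and (1/theta, theta*sigma) = (2, 0.5) give the same value. The function names `ma1_autocov` and `ma1_loglik` and the seed are illustrative choices.

```python
import math
import random

def ma1_autocov(theta, sigma):
    """Autocovariances (gamma_0, gamma_1) of y_t = eps_t - theta*eps_{t-1}."""
    return (1.0 + theta**2) * sigma**2, -theta * sigma**2

def ma1_loglik(y, theta, sigma):
    """Exact Gaussian log-likelihood of an MA(1) via the innovations algorithm."""
    g0, g1 = ma1_autocov(theta, sigma)
    v = g0       # variance of the one-step-ahead forecast error
    pred = 0.0   # one-step-ahead forecast of the current observation
    ll = 0.0
    for obs in y:
        err = obs - pred
        ll += -0.5 * (math.log(2.0 * math.pi * v) + err**2 / v)
        coef = g1 / v           # innovations coefficient
        pred = coef * err       # forecast of the next observation
        v = g0 - coef**2 * v    # updated forecast-error variance
    return ll

# Simulate 1000 periods with theta = 0.5 and sigma = 1, as described in the post.
rng = random.Random(0)
eps = [rng.gauss(0.0, 1.0) for _ in range(1001)]
y = [eps[t] - 0.5 * eps[t - 1] for t in range(1, 1001)]

ll_invertible = ma1_loglik(y, theta=0.5, sigma=1.0)
ll_noninvertible = ma1_loglik(y, theta=2.0, sigma=0.5)  # (1/theta, theta*sigma)
print(ll_invertible, ll_noninvertible)
```

Under a flat prior on the standard error, the two parameter pairs can therefore differ in posterior density only through the prior on the MA coefficient.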

I have recently been wondering why it is that I so rarely see bimodal or indeed multimodal posteriors, and this adds to the mystery.

MODFILE:

//Simple MA(1) model

var y x;
varexo eps;

parameters rho;
rho = 0.90;

model;
x = eps;
y = -rho*x(-1)+eps;
end;

shocks;
var eps; stderr 1;
end;

steady;
check;

estimated_params;

//stderr eps, INV_GAMMA_PDF,1,200; // std. dev. of eps (alternative prior)
stderr eps, UNIFORM_PDF,0.001,5; // std. dev. of eps, flat prior
//rho, BETA_PDF, 0.4,0.2; // MA(1) coefficient (alternative prior)
rho, NORMAL_PDF, 1.25,2; // MA(1) coefficient

end;

//stoch_simul(irf=10) x y;
varobs y;

estimation(datafile=y,mode_compute=4,nobs=1000,
prefilter=0,mh_replic=5000,mh_nblocks=2,mh_jscale=0.40,mh_drop=0.2);

jpfeifer · December 4, 2013, 10:08am

There are results showing that the posterior of linear Gaussian processes is asymptotically normal (see e.g. Chib/Greenberg 1995).
Truly bimodal posterior distributions pose a separate problem as the two modes are often relatively disjoint. The standard MCMC has a hard time traversing the whole posterior in finite time. It can be done, but takes a long time. The problem typically is that people use the MCMC with the Hessian at one of the modes as the proposal. While this allows for an efficient evaluation around this mode, it makes an efficient evaluation of the second mode even more unlikely.

I attached a run of 100000 draws with acceptance rate 2% due to a wide identity matrix proposal density instead of the default inverse Hessian at the mode (requires the recent Dynare unstable snapshot or 4.4 later on). Many draws are rejected, but now the MCMC is able to jump to the second mode. The second mode is clearly visible. More draws will make it even more pronounced.
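The trade-off described here can be illustrated with a toy one-dimensional random-walk Metropolis sampler in Python. This is only a sketch under stated assumptions, not Dynare's sampler: the two-bump target, the proposal scales 0.3 and 8.0, and the names `log_target` and `rw_metropolis` are all made up for the illustration.

```python
import math
import random

def log_target(x):
    """Log of an unnormalized two-mode density: equal bumps at -5 and +5, sd 0.5."""
    a = -0.5 * ((x + 5.0) / 0.5) ** 2
    b = -0.5 * ((x - 5.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def rw_metropolis(n_draws, scale, start=-5.0, seed=0):
    """Random-walk Metropolis with a N(0, scale^2) proposal.

    Returns the list of draws and the acceptance rate."""
    rng = random.Random(seed)
    x, lp = start, log_target(start)
    draws, accepted = [], 0
    for _ in range(n_draws):
        cand = x + rng.gauss(0.0, scale)
        lp_cand = log_target(cand)
        # 1 - random() lies in (0, 1], so the log is always defined
        if math.log(1.0 - rng.random()) < lp_cand - lp:
            x, lp = cand, lp_cand
            accepted += 1
        draws.append(x)
    return draws, accepted / n_draws

def share_right(draws):
    """Fraction of draws in the right-hand mode (x > 0)."""
    return sum(v > 0 for v in draws) / len(draws)

narrow, acc_narrow = rw_metropolis(20000, scale=0.3)  # tuned to one mode
wide, acc_wide = rw_metropolis(20000, scale=8.0)      # wide proposal

print("narrow:", share_right(narrow), acc_narrow)
print("wide:  ", share_right(wide), acc_wide)
```

In this toy setting the narrow proposal accepts often but never leaves the mode it started in, while the wide proposal rejects most candidates yet visits both modes.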

Sidenote: Mode-Jumping MCMCs work better if you know there are multiple modes you are interested in.
MA.mod (469 Bytes)
MA_PriorsAndPosteriors1.pdf (12.1 KB)

joepearlman1 · December 5, 2013, 8:41am

Thanks for checking this. I’m pleased to see the two modes, but this raises two further questions:

1. What is the point of using MCMC if the full parameter space is not accessed? I have always been led to understand that MCMC is particularly useful because the first stage might only get you to a local maximum, but MCMC can point towards the global maximum.
2. If the structure of the system is such that the reduced form representation of the model contains MA terms, then there may be multiple local modes because of the observationally equivalent representation of MA processes as invertible and non-invertible. Should the search not be constrained to ignore non-invertible MA processes?

stepan-a · December 5, 2013, 10:51am

Hi,

Yes, but the choice of the proposal distribution is important to ensure that the MCMC will visit the whole parameter space and converge to its ergodic distribution in a reasonable (finite) number of iterations. Also, if the posterior density attains zero between the two modes, the MCMC will get stuck in one of the regions (unless the proposal distribution allows big jumps).

[quote=“joepearlman1”]
2. If the structure of the system is such that the reduced form representation of the model contains MA terms, then there may be multiple local modes because of the observationally equivalent representation of MA processes as invertible and non-invertible. Should the search not be constrained to ignore non-invertible MA processes?[/quote]

Not necessarily. The reduced form of a DSGE model is a VARMA model with nonlinear restrictions between the reduced form parameters. These restrictions often help to fix the identification issues (that is why it is far easier to estimate a DSGE model than an ad-hoc state space or VARMA model). We may check whether these restrictions solve the invertibility issue in a simple RBC model (whose reduced form for output is an ARMA(2,1))…

Best,
Stéphane.

ste_s · January 18, 2014, 12:01pm

Dear all,
I have a problem on a related example. I am estimating an MA(2) imposing the following restriction between the two MA parameters:

          theta_1+theta_2=1.5

I repeated the exercise generating data (samples of 160 observations) from two different DGPs.

nonfundamental DGP: theta_1=.3, theta_2=1.2
fundamental DGP: theta_1=.7, theta_2=.8

In both cases any alternative MA representation (i.e. any root flipping solution) is ruled out because it violates the restriction, so that Dynare should always end up finding the true DGP. Given that there is only one free parameter, either theta_1 or theta_2 could be estimated and the remaining parameter is found by imposing the restriction. So for the DGP 1) I should get a bimodal posterior with modes at .3 and 1.2, while in the second case I should get a bimodal density with modes at .7 and .8.
In the case 1) I got the correct result (but the two peaks of the density are at equal height only when the number of draws is as high as mh_replic=500000).
The case 2) is puzzling and I repeated the exercise for different samples. In most cases I got a unimodal posterior density with a mode lying in the middle (i.e. approx .75) between the true modes. In a few samples I got a weird posterior with modes at zero and 1.5.
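The claim that root flipping violates the restriction can be checked numerically. The Python sketch below assumes the sign convention y = eps + theta_1*eps(-1) + theta_2*eps(-2), which the post does not state; under that assumption the fundamental and nonfundamental labels match the post. The function names are illustrative.

```python
import cmath

def ma2_root_moduli(theta1, theta2):
    """Moduli of the roots of 1 + theta1*z + theta2*z**2 (assumed sign convention)."""
    disc = cmath.sqrt(theta1**2 - 4.0 * theta2)
    roots = [(-theta1 + disc) / (2.0 * theta2), (-theta1 - disc) / (2.0 * theta2)]
    return [abs(r) for r in roots]

def ma2_flip_both(theta1, theta2):
    """Coefficients after replacing both roots by their reciprocals.

    With a complex-conjugate pair of roots this is the only other
    real-valued representation with the same autocorrelations."""
    return theta1 / theta2, 1.0 / theta2

cases = {"nonfundamental": (0.3, 1.2), "fundamental": (0.7, 0.8)}
for name, (t1, t2) in cases.items():
    has_complex_roots = t1**2 - 4.0 * t2 < 0
    f1, f2 = ma2_flip_both(t1, t2)
    print(name, ma2_root_moduli(t1, t2), has_complex_roots, (f1, f2), f1 + f2)
```

Under this convention both DGPs have complex roots, and the flipped coefficients sum to about 1.083 and 2.125 respectively, so neither alternative satisfies theta_1+theta_2=1.5.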

Thank you very much for your help.

Best,

Stefano
Demo_ma2_F.mod (677 Bytes)

jpfeifer · January 23, 2014, 9:33am

Given the small number of time periods in your sample, the closeness of the two modes, and the small number of MCMC draws, I would be surprised if you got two clearly separated modes in case 2.

ZBCPA · April 2, 2017, 11:00am

Dear Johannes,

If model A and model B are estimated on the same observables, but model B has bimodal posteriors, can I still compare the log data densities of A and B to see which model the data favour? In other words, **a bimodal posterior does not influence model comparison**?

Many thanks in advance,
Huan

jpfeifer · April 3, 2017, 12:36pm

No, bimodality does not affect model comparison per se, provided the marginal data density is computed correctly. That condition, however, is tricky. You cannot use the Laplace approximation, because it assumes a normal distribution and thus cannot handle bimodality. The Geweke modified harmonic mean should be fine, but only if the MCMC correctly samples from both modes.
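A toy one-dimensional Python sketch of why the Laplace approximation fails here. It is not Dynare's computation: the kernel below is an invented stand-in for likelihood times prior, built from two well-separated bumps that each integrate to 1, so the true integral is 2.

```python
import math

def kernel(x):
    """Unnormalized 'posterior': bumps at -3 and +3 (sd 0.5), each integrating to 1."""
    s = 0.5
    norm = 1.0 / (s * math.sqrt(2.0 * math.pi))
    left = math.exp(-0.5 * ((x + 3.0) / s) ** 2)
    right = math.exp(-0.5 * ((x - 3.0) / s) ** 2)
    return norm * (left + right)

def laplace_integral(f, mode, h=1e-3):
    """Laplace approximation to the integral of f, expanded around `mode`.

    The curvature of log f is taken from a central finite difference."""
    logf = lambda x: math.log(f(x))
    curvature = -(logf(mode + h) - 2.0 * logf(mode) + logf(mode - h)) / h**2
    return f(mode) * math.sqrt(2.0 * math.pi / curvature)

def grid_integral(f, lo=-10.0, hi=10.0, n=20001):
    """Trapezoid rule on a fine grid, used as the reference value."""
    dx = (hi - lo) / (n - 1)
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * dx) for i in range(1, n - 1))
    return total * dx

reference = grid_integral(kernel)
laplace = laplace_integral(kernel, mode=3.0)
print(math.log(reference), math.log(laplace))
```

In this example the Laplace value captures only the mass of the mode it is centered on, so the log marginal density is understated by roughly log(2), about 0.69.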