Partial means test - tapering

Dear all,
I have a question regarding the Geweke partial means test. Specifically, I am trying to understand what the tapering option refers to exactly. If I understand correctly, it refers to the calculation of the numerical standard errors, which follows Newey-West. I had a look at the Geweke (2005) book (see the attached pdf). Is it correct that a given taper step (say 15) corresponds to a specific value of L in theorem 4.7.3 (p. 149)? If yes, what is the relationship?
Many thanks,
Ansgar

Contemporary_Bayesian_Econometrics_and_S.pdf (2.8 MB)

Yes, it is about correcting the standard error of the mean for the presence of autocorrelation. The higher the taper, the more strongly you correct.

Thank you! So a larger taper percentage corresponds to a larger L in equation 4.46? I was wondering if you could tell me the exact relationship between the taper percentage Dynare reports and L… I am sorry, I had a look at the code but I wasn’t able to get it from there.
What I understand from equation 4.46 in Geweke is that if L=2 or larger, the estimator of the population variance is no longer the sample variance, but instead the sample variance plus a weighted sum of the autocovariances up to lag L-1. How does this relate to, say, the “15% taper” I can choose in Dynare? It’s a percentage of what…?
Many thanks,
Ansgar

Or is the relationship between L and the tapering option not straightforward…?

The L is equal to the taper. If you use 15%, then L=15. The reason it is expressed as a percentage is that the number of autocovariances used is j=100.
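
In case it helps, here is a minimal numerical sketch of that reading (it assumes the Bartlett/Newey-West-type weights of eq. 4.46 and the fixed set of 100 autocovariances; the exact window used in the Dynare code may differ, and the function below is purely illustrative):

```python
import numpy as np

def tapered_nse(draws, taper_pct, n_cov=100):
    """Numerical standard error of the mean of an MCMC chain, with a
    Bartlett/Newey-West-type taper over the first L autocovariances.
    With n_cov=100 autocovariances, taper_pct=15 gives L=15."""
    x = np.asarray(draws, dtype=float)
    M = x.size
    L = int(round(taper_pct / 100 * n_cov))   # e.g. 15% of 100 lags -> L = 15
    xc = x - x.mean()
    # sample autocovariances c_0, ..., c_{L-1}
    c = np.array([xc[: M - s] @ xc[s:] / M for s in range(L)])
    # eq. 4.46-style long-run variance: c_0 + 2 * sum_{s=1}^{L-1} (1 - s/L) * c_s
    w = 1.0 - np.arange(1, L) / L
    long_run_var = c[0] + 2.0 * np.sum(w * c[1:])
    return np.sqrt(long_run_var / M)

# toy example: a persistent AR(1) chain, where the naive standard error
# of the mean badly understates the Monte Carlo uncertainty
rng = np.random.default_rng(0)
e = rng.standard_normal(50_000)
x = np.zeros_like(e)
for t in range(1, e.size):
    x[t] = 0.9 * x[t - 1] + e[t]

print("naive s.e.:", x.std(ddof=1) / np.sqrt(x.size))
print("4% taper  :", tapered_nse(x, 4))
print("15% taper :", tapered_nse(x, 15))
```

With weights of this kind, a 15% taper simply means that the first 15 autocovariances enter the correction, down-weighted linearly in the lag, which matches the L=15 reading above.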

Ah, I think I am beginning to see, thank you so much. I am not sure I understand your final sentence, though.

Is it that the maximum number of autocovariances that can be used for tapering in Dynare equals 100?

I also wonder how best to set L. Should one look at a correlogram of the draws to see over which horizon the autocorrelations die out? I guess it is not possible to conclude from the inefficiency factors what the “correct” L is (in my case they are between 300 and 600)? Or is it in practice simply safest to choose 15% (which I think you recommended in one post on the issue)?

Yes, that is what I typically do. I make sure that the 15% version cannot reject the null hypothesis at the 5% level and inspect the trace plots as well.
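
If you do want to inspect the correlogram of the draws yourself, something along these lines would do (purely illustrative; `draws` stands for the vector of posterior draws of one parameter, however you export it, and the file name is just a placeholder):

```python
import numpy as np
import matplotlib.pyplot as plt

def autocorrelations(draws, max_lag=200):
    """Sample autocorrelations of a single chain of draws up to max_lag."""
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    M = x.size
    c0 = x @ x / M
    return np.array([(x[: M - s] @ x[s:]) / M / c0 for s in range(max_lag + 1)])

# 'draws' should be a 1-D array with the posterior draws of one parameter;
# the file name below is just a placeholder for however you export them
draws = np.loadtxt("my_parameter_draws.txt")

rho = autocorrelations(draws)
# crude inefficiency factor: 1 + 2 * sum of the (positive) autocorrelations
ineff = 1.0 + 2.0 * rho[1:].clip(min=0).sum()
print("approximate inefficiency factor:", ineff)

plt.bar(range(rho.size), rho, width=1.0)
plt.xlabel("lag")
plt.ylabel("autocorrelation")
plt.title("Correlogram of the draws")
plt.show()
```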


But could it be that this test is extremely sensitive…? I have now run the same model and data once with one long chain (1,500,000 draws, with a 500,000 burn-in) and once with three chains of 500,000 draws each. With the long chain, I rejected the null at the 5% level for 17 out of 68 parameters. For the three chains, using the Brooks and Gelman measure, I could see only 3 parameters where there was clearly no convergence after, say, 300,000 draws.

Could you please provide a trace plot for the non-converged parameters in the long chain? I would be curious.

Hi Johannes,
thank you for your interest! I am attaching trace plots for the following parameters. The number following the parameter is the p-value of the partial means test with 15% taper:

crhob 0.696
csigl 0.030
SE_em 0.037
crhoqs 0.007
csadjcost 0.000

When I look at the trace plots, only for crhoqs and csadjcost do I see that something is wrong. By contrast, the plots for csigl and SE_em look to me very similar to the plot of crhob (where the null was not rejected). This is why I thought that maybe the test is very sensitive…?
traceplots.pdf (287.0 KB)

Many thanks,
Ansgar

If you look closely and know what you are looking for, you can see the reason. Your posterior distribution is bimodal, as is visible from crhoqs. The long chain actually spends some time at the second mode around iteration 6e5 and then switches back. That is the point where csigl and SE_em also take on their minimum values. For them, the move is a bit more subtle, but it is clearly there. This switch to the second mode makes the test indicate a failure to converge.
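
To see why such a temporary switch trips the test, here is a toy illustration. It is not Dynare's exact implementation, but it captures the idea that the test compares means computed over different parts of the chain, so a visit to a second mode pushes those partial means apart by far more than their Monte Carlo standard errors (the numbers and the batch-means standard error below are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(1)

# toy chain: persistent noise around a mode at 0, except for a stretch where
# it visits a second mode at 1 (mimicking the switch around iteration 6e5)
M = 100_000
level = np.zeros(M)
level[60_000:70_000] = 1.0            # temporary visit to the second mode
x = np.zeros(M)
e = 0.2 * rng.standard_normal(M)
for t in range(1, M):
    x[t] = level[t] + 0.9 * (x[t - 1] - level[t]) + e[t]

def batch_se(y, n_batches=50):
    """Crude Monte Carlo standard error of the mean via batch means
    (a stand-in for the tapered NSE the actual test would use)."""
    bm = np.array([b.mean() for b in np.array_split(y, n_batches)])
    return bm.std(ddof=1) / np.sqrt(n_batches)

early = x[: M // 5]    # an early part of the chain, e.g. the first 20%
late = x[M // 2:]      # a late part of the chain, e.g. the last 50%

z = (early.mean() - late.mean()) / np.sqrt(batch_se(early) ** 2 + batch_se(late) ** 2)
print("early mean :", early.mean())
print("late mean  :", late.mean())
print("z-statistic:", z)
# the late mean is pulled towards the second mode, |z| comes out large,
# and the equality-of-means test flags a convergence failure
```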

Ah, I see, thank you! If I wanted to try to get rid of the second mode by tightening some of the priors, would it still make sense to start with the parameters where the p-value is lowest, i.e., csadjcost and/or crhoqs, to see whether that gets rid of the convergence rejection for some of the other parameters as well?

The fundamental question is whether you want to get rid of the second mode or whether you consider it part of your results. Given your trace plots, I would not be too worried about convergence. But if you want to get rid of it, yes, more informative priors may be one way to go. And yes, I would start with the parameters that jump the most. Note, however, that this will in principle invalidate your estimation: you are adjusting your priors based on what you saw in the data.

Oh, I see. I was pretty worried based on the number of rejections… Actually, I am mainly interested in the marginal data density, since I am comparing different models. Is the second mode a problem for the Geweke estimator of the marginal data density…? As it happens, the Laplace approximation and the Geweke value are pretty close (only 3 log points apart…).

That may be a problem. The Laplace approximation around one mode is clearly inappropriate when there are multiple modes. The Geweke estimator should perform better in this case, as it uses all the draws.
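
As a stylised illustration of why: take a bimodal "posterior" whose normalising constant is known. A Laplace approximation built around one mode misses the mass at the other mode, whereas a harmonic-mean-type estimator in the spirit of Geweke's, which averages over all the draws, does not. This is toy code, not Dynare's implementation; the mixture, the truncation region and the weighting density are all chosen ad hoc for illustration:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)

# unnormalised "posterior kernel" k(theta) = Z * p(theta), with p a bimodal mixture
true_log_Z = 10.0
w1, m1, s1 = 0.7, 0.0, 0.5
w2, m2, s2 = 0.3, 3.0, 0.5

def log_kernel(theta):
    p = w1 * norm.pdf(theta, m1, s1) + w2 * norm.pdf(theta, m2, s2)
    return true_log_Z + np.log(p)

# --- Laplace approximation around the dominant mode only ---
res = minimize_scalar(lambda t: -log_kernel(t), bracket=(-1.0, 0.0, 1.0))
mode = res.x
h = 1e-4
hess = -(log_kernel(mode + h) - 2 * log_kernel(mode) + log_kernel(mode - h)) / h**2
log_Z_laplace = log_kernel(mode) + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(hess)

# --- modified-harmonic-mean-type estimator using draws from both modes ---
M = 200_000
comp = rng.random(M) < w1
draws = np.where(comp, rng.normal(m1, s1, M), rng.normal(m2, s2, M))

# weighting density f: normal fitted to the draws, truncated to a central region
lo, hi = np.quantile(draws, [0.005, 0.995])
mu, sd = draws.mean(), draws.std()
mass = norm.cdf(hi, mu, sd) - norm.cdf(lo, mu, sd)
inside = (draws >= lo) & (draws <= hi)
log_f = norm.logpdf(draws[inside], mu, sd) - np.log(mass)

# 1/Z = E[f(theta)/k(theta)]; draws outside the truncation region contribute zero
ratios = np.exp(log_f - log_kernel(draws[inside]))
log_Z_mhm = -np.log(np.sum(ratios) / M)

print("true log Z        :", true_log_Z)
print("Laplace (one mode):", log_Z_laplace)   # off by roughly log(0.7)
print("harmonic-mean type:", log_Z_mhm)       # close to the truth
```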