# Data transformation for estimation

Dear users,

I am trying to estimate a model that is written in log deviations from steady state. I want to estimate it with European data, using four observables: real GDP, inflation, the nominal interest rate, and real house prices. I took the time series from the ECB and Eurostat and performed the following transformations:

1. GDP - log transform and detrend with one-sided HP filter.

2. Inflation - take first difference of logs of HICP and then demean them.

3. Interest rate - r_obs = log(1+r_data/400) - mean ( log(1+r_data/400) )

4. House prices - deflate with HICP, then log transform and detrend with one-sided HP filter.
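The steps in items 2 to 4 can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names and the numbers are made up, the interest rate is assumed to be quoted in annualized percent, and the one-sided HP filter step is left out.

```python
import math

def demean(xs):
    """Subtract the sample mean from every element."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def inflation_obs(hicp):
    """Item 2: first difference of log HICP, then demeaned."""
    dlog = [math.log(b) - math.log(a) for a, b in zip(hicp, hicp[1:])]
    return demean(dlog)

def interest_obs(r_annual_pct):
    """Item 3: log(1 + r/400) minus its sample mean (r in annualized percent)."""
    return demean([math.log(1 + r / 400) for r in r_annual_pct])

def real_log_house_prices(nominal, hicp):
    """Item 4, before filtering: deflate with HICP and take logs."""
    return [math.log(p / d) for p, d in zip(nominal, hicp)]

# Hypothetical data, for illustration only
hicp = [100.0, 100.5, 101.1, 101.4, 102.0]
rates = [4.0, 3.5, 3.0, 2.5, 2.0]
house = [200.0, 203.0, 207.0, 209.0, 214.0]

pi_obs = inflation_obs(hicp)
r_obs = interest_obs(rates)
q_log = real_log_house_prices(house, hicp)
```

Note that taking first differences shortens the inflation series by one observation, so the other observables have to be aligned with it before estimation.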

Please, correct me if I am wrong!

The transformations look OK. But you still need to make sure that you match the data correctly to the model. That is, when using first differences in the data, you need to specify an observation equation linking them to the growth rate of the corresponding variable in the model. I guess you have already thought of this. If not, see Pfeifer (2013): “A Guide to Specifying Observation Equations for the Estimation of DSGE Models” sites.google.com/site/pfeiferecon/Pfeifer_2013_Observation_Equations.pdf for details.

Thank you!

Dear Johannes Pfeifer:

First of all my congratulations for your document “A Guide to Specifying Observation Equations for the Estimation of DSGE Models”. It is, indeed, a precious help.
Nevertheless I am afraid to have misunderstood the content of your Remark 26 (Demeaning growth rates of stationary variables).
Could you, please, confirm if the following observation equations are correct:

Pi_obs = Pi_tilde - Pi_tilde ; // matched to demeaned growth rate, inflation is a stationary variable

Pi_obs = Pi_tilde - Pi_tilde + mu_pi ; //matched to non-demeaned growth rate, inflation is a stationary variable and mu_pi corresponds to the inflation growth rate mean of our sample average.

If I understand correctly, in a linearized model, meaning Xhat = log(X) - log(Xbar), we have
Xobs = Xhat

A second question: in the case of annualized series, how can we convert them into quarterly ones?

[quote=“Oriana”]Dear Johannes Pfeifer:

First of all my congratulations for your document “A Guide to Specifying Observation Equations for the Estimation of DSGE Models”. It is, indeed, a precious help.
Nevertheless I am afraid to have misunderstood the content of your Remark 26 (Demeaning growth rates of stationary variables).
Could you, please, better explain what are the implications for our model if we have a data growth rates mean higher than zero and we do not demean them?[/quote]

[quote=“Oriana”]Dear Johannes Pfeifer:

First of all my congratulations for your document “A Guide to Specifying Observation Equations for the Estimation of DSGE Models”. It is, indeed, a precious help.
Nevertheless I am afraid to have misunderstood the content of your Remark 26 (Demeaning growth rates of stationary variables).
Could you, please, confirm if the following observation equation are correct:

Pi_obs = Pi_tilde - Pi_tilde ; // matched to demeaned growth rate, inflation is a stationary variable

Pi_obs = Pi_tilde - Pi_tilde ; //matched to non-demeaned growth rate, inflation is a stationary variable[/quote]

Sorry, but the piling up of your posts is confusing.
@Ravelomanana: that depends on the scaling of the variables. It should be detailed in Pfeifer(2013): “A Guide to Specifying Observation Equations for the Estimation of DSGE Models” sites.google.com/site/pfeiferecon/Pfeifer_2013_Observation_Equations.pdf. For the gross inflation rates we are talking about, you have to invert the geometric mean by using Pi^(1/4), which at first order corresponds to dividing the net inflation rate by 4.
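The first-order equivalence mentioned here can be checked numerically. The sketch below uses a made-up annualized net inflation rate of 2%; the function names are illustrative only.

```python
def quarterly_gross_from_annualized(pi_annual_gross):
    """Invert the geometric annualization: Pi_q = Pi_a^(1/4)."""
    return pi_annual_gross ** 0.25

def quarterly_net_first_order(pi_annual_net):
    """First-order approximation: divide the net rate by 4."""
    return pi_annual_net / 4

annual_net = 0.02  # hypothetical 2% annualized net inflation
exact = quarterly_gross_from_annualized(1 + annual_net) - 1
approx = quarterly_net_first_order(annual_net)
gap = approx - exact  # small and positive for moderate inflation rates
```

For inflation rates of this size the two conversions differ only in the fifth decimal place, which is why the division by 4 is usually good enough in a linearized model.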

@Oriana: What exactly is your most recent question?

The equation Pi_obs = Pi_tilde - Pi_tilde cannot be correct, as this would imply that Pi_obs is always 0. The idea is that the model predicts a steady-state growth rate of 0 for inflation. If the average sample growth rate of inflation in the data is 0.5%, you will force some inflationary shock to account for inflation being high on average.

Yes, a bit confusing indeed. So, thanks for your reply.

In the expression Pi_obs = Pi_tilde - Pi_tilde(-1), I meant Pi_obs to be the observed growth rate of inflation. I noticed, however, that my sample average growth rate of inflation is higher than zero. So, if I understand correctly, your advice is not to demean the growth rate of inflation. Additionally, I am not obliged to change the expression Pi_obs - Pi_obs(-1) = Pi_tilde - Pi_tilde(-1) in order to get a good match between observed and model variables, which would not be the case if we were working with first differences of a non-stationary variable like y.
I forgot to mention that I am working with a nonlinear model for log-linearization.

I think someone also asked how annualized series can be converted into quarterly ones.
I suggest using Denton if you have Stata (stata.com/support/ssc-installation/ & fmwww.bc.edu/RePEc/bocode/d/denton.html).

As always, the answer is: it depends. Say you have a secular decline in inflation in the US from high levels in the 1970s to much lower ones in the 1990s. This will show up as a sample mean of the inflation growth rate well below 0. If you want your model to explain this secular decline, you should not demean the growth rate. If you think this is a trend outside of your model, you should take it out by demeaning the growth rate. There is no real right or wrong here, as it is the choice of the model builder.
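A small numerical illustration of this point, with made-up numbers: a steady decline in inflation produces a negative sample mean of its first difference (what the thread calls the inflation growth rate), and that mean is just the total decline divided by the number of periods.

```python
# Hypothetical inflation path with a secular decline (illustrative values)
inflation = [0.030, 0.027, 0.025, 0.020, 0.018, 0.012, 0.010]

# Period-to-period change in inflation
growth = [b - a for a, b in zip(inflation, inflation[1:])]

# Sample mean of the changes; the sum telescopes to last minus first
mean_growth = sum(growth) / len(growth)

# Option A: keep `growth` as is, so the model has to explain the decline
# Option B: demean, treating the decline as a trend outside the model
demeaned = [g - mean_growth for g in growth]
```

Under option A the model's shocks have to generate the negative average; under option B that information is removed from the data before estimation.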

For annualized data DO NOT USE THE DENTON METHOD. INTERPOLATION IS WRONG!!! It does not preserve causality!
Regarding terminology: when you say annualized, you need to distinguish between data only available at annual frequency and quarterly data transformed to annual values by effectively multiplying them by 4. In the first case, you need a mixed-frequency observation equation, while in the second case you can simply invert the transformation.

Many thanks Prof. Johannes Pfeifer!

I now understand what you meant about demeaning the inflation growth rate.

Nevertheless, I am quite disappointed by your advice not to use DENTON for interpolation. What do you mean when you say "It does not preserve causality"?
I already used Denton to transform GDP data for many countries, only available at annual frequency, into quarterly data. I used OECD aggregate GDP and NAFTA aggregate GDP as indicators, and the results were more or less what I expected.

DO YOU ADVISE REDOING THE INTERPOLATIONS? WHAT SOFTWARE DO YOU SUGGEST?

Dear Johannes Pfeifer:

It seems quite surprising that you advise against using the Denton proportional method with Stata, as significant literature says quite the opposite, e.g. "An Empirical Review of Methods for Temporal Distribution and Interpolation in the National Accounts", Chen & Andrews (2008).

Could you, please, give me a better explanation about your viewpoint and the best alternatives?

It has to do with causality. Denton is best practice for national accounts if there is missing data between annual points. You are correct about that.
But you are not compiling national accounts. You are trying to estimate a DSGE model. The DSGE model assumes that there are stochastic shocks occurring at time t that drive your variables at time t. Typically, your model can be represented as a finite-order VAR in the observables. Now, there is a huge literature showing that unaccounted-for time aggregation/interpolation in VARs leads to spurious inference. Think about it this way: with Denton, output in the middle of the year, say June, will be some weighted average of output at the beginning of the year and at the end of the year. But the end of the year is in the future! This future value will be influenced by, e.g., shocks that occurred in December. But by definition, shocks that only occur in December cannot influence output in June, unless there is time travel. Interpolation implies exactly that. Thus, when using interpolation, the causality of events is destroyed.
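The timing problem can be made concrete with a toy example. Note that the sketch uses plain linear interpolation as a stand-in, not the Denton method itself, and the numbers are made up; the only point is that an interpolated mid-year value responds to something that happens at the end of the year.

```python
def linear_interpolate(annual, steps=4):
    """Fill `steps` sub-periods between consecutive annual points linearly."""
    out = []
    for a, b in zip(annual, annual[1:]):
        for k in range(steps):
            out.append(a + (b - a) * k / steps)
    out.append(annual[-1])
    return out

# Two hypothetical histories that differ only at the final annual point
baseline = [100.0, 102.0, 104.0]
shocked = [100.0, 102.0, 110.0]  # a shock hits only at the very end

q_base = linear_interpolate(baseline)
q_shock = linear_interpolate(shocked)

# Index 8 is the date of the shock; index 6 lies two quarters before it
mid_year_effect = q_shock[6] - q_base[6]
```

Here `mid_year_effect` is nonzero even though the two histories are identical up to the last annual observation, so the interpolated series attributes part of a future shock to an earlier quarter.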

Put differently, your VAR model actually becomes a VARMA process. But you do not account for these MA components in your model setup when you use Denton without telling the model that future values were used to construct today’s value. That is why a mixed-frequency approach is the right way to go.

Dear Johannes:
I really got your viewpoint. I think MIDAS (meaning Mi(xed) Da(ta) S(ampling)) from unc.edu/~eghysels/ is a good choice.
Thank you!

MIDAS is a regression framework. In terms of DSGE modeling in Dynare, you would use a similar mixed-frequency approach of the type outlined in Section “8.3 Mixed Frequency” of Pfeifer (2013): “A Guide to Specifying Observation Equations for the Estimation of DSGE Models” sites.google.com/site/pfeiferecon/Pfeifer_2013_Observation_Equations.pdf.
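One possible way to prepare the data for such an approach is sketched below. This is an assumption made for illustration, not a quote from the guide: the annual figure is placed in the fourth quarter, the other quarters are marked as missing, and the model-side counterpart of an annual flow is taken to be the sum of the last four quarterly values. All names and numbers are hypothetical.

```python
import math

def annual_to_quarterly_with_gaps(annual_values):
    """Put each annual value in Q4 and mark Q1 to Q3 as missing (NaN)."""
    out = []
    for v in annual_values:
        out.extend([math.nan, math.nan, math.nan, v])
    return out

def model_annual_aggregate(quarterly_flow):
    """Sum of the current and previous three quarters (NaN until available)."""
    return [
        sum(quarterly_flow[t - 3:t + 1]) if t >= 3 else math.nan
        for t in range(len(quarterly_flow))
    ]

# Hypothetical annual data and a quarterly model path that matches it
observed = annual_to_quarterly_with_gaps([400.0, 412.0])
model_path = [100.0] * 4 + [103.0] * 4
aggregate = model_annual_aggregate(model_path)
```

The observed series is only compared with the model aggregate in quarters where it is not missing, so no interpolated values enter the estimation.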

I see it is rather simpler to implement than I thought!

A last favour, please, if possible!
Is there some literature to help us analyse the figures of historical and smoothed variables?
There must be something wrong with my delta_logoilprice_jp_obs (please find enclosed the figures from Historical and smoothed variables).

My model is a log-linearized version. The relationships between my model variables and the observed data are:

deltalog_gdp_jp_obs = c1logdeltagdp + (c1mugdpss - 1);// matched to non-demeaned growth rate, useful to estimation purposes
deltalog_gdp_for_obs = c2logdeltagdp + (c2mugdpss - 1);//matched to non-demeaned growth rate, useful to estimation purposes
dpcexfe_log_jp_obs = c1dcore ; //matched to log level, d1core is the net inflation
rer_log_jp_obs = c1rer ; //matched to log level, c1rer is the log exchange rate
dcomp_log_jp_obs = log(c1omega) ;
//c1omega is the model gross inflation and dcomp_log_jp_obs the observed net inflation

fedfunds_jp_obs = c1rs ;// matched to net interest rates, fedfunds_jp_obs is the net interest rate observation in quarterly form
fixi_share_jp_obs = ((exp(c1rpipd) * exp(c1i)) / exp(c1yd)) / c1rngdpy;
pce_share_jp_obs = ( ( (exp(c1rpcpd) * exp(c1c)) / exp(c1yd)) /c1rngdpy ) ;
exp_share_jp_obs = ( exp(c1x) / exp(c1yd) ) / c1rngdpy ;
imp_share_jp_obs = ( exp(c1m) / exp(c1yd) ) / c1rngdpy;

oilimp_share_jp_obs = exp(c1rpopd) * ( exp(c1o) / exp(c1y) - exp(c1yo) / exp(c1y) ) * (1 / c1rngdpy );
oilprod_jp_deltalog_obs = log(c1oy) - log(c1oy(-1)) + ( c1grmuoss - 1 );//matched to non-demeaned growth rate, useful to estimation purposes
oilprod_for_deltalog_obs = log(c2oy) - log(c2oy(-1)) + ( c2grmuoss - 1 );//matched to non-demeaned growth rate, useful to estimation purposes
delta_logoilprice_jp_obs = c1rpopd - c1rpopd(-1) + ( c1grmuzoss * c1mugdpss - c1grmuzss ) / c1grmuzss ; //matched to non-demeaned growth rate, useful to estimation purposes
fig.5.pdf (8.46 KB)
fig.4.pdf (12.1 KB)
fig.3.pdf (7.31 KB)
fig.2.pdf (12 KB)
fig.1.pdf (11.4 KB)

Forget it, I already found what the problem was!