Data in decimal or percent form and estimation results

jgomezpi · December 27, 2023, 3:40pm

Hello Johannes,

Hope all is well.

We are trying to estimate a parameter in a univariate local linear trend model for US GDP at an annual frequency, starting in 1979.

If we use the data in approximate percent form, that is, with the first difference of real GDP defined as y_dif(t) = y(t) - y(t-1), where y(t) = 100 * ln(Y(t)), we get a prior mode of 0.6729 (line 159 of the log file “Model1_data_in_percent_form.log”). We wonder what we are missing, since we expected the prior mode to be 0.5: that is the prior mean, and the distribution is symmetric.

If we instead use the data in decimal form, that is, without the factor of 100, so that y(t) = ln(Y(t)) in the same first difference y_dif(t) = y(t) - y(t-1), we get a prior mode of 0.9900 (line 213 of the log file “Model1_data_in_decimal_form.log”), again different from 0.5. In addition, the log data density is NaN, so the parameter cannot be estimated.
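For reference, the two transformations differ only by a constant factor of 100. A minimal Python sketch, using made-up GDP levels (not the attached data), shows the two series side by side:

```python
import math

# Hypothetical real GDP levels, for illustration only (not the attached data).
gdp_levels = [100.0, 102.5, 104.1, 103.0, 106.2]

def first_differences(levels, scale):
    """Return scale * [ln(Y_t) - ln(Y_{t-1})] for t = 1..T-1."""
    logs = [scale * math.log(y) for y in levels]
    return [logs[t] - logs[t - 1] for t in range(1, len(logs))]

y_dif_percent = first_differences(gdp_levels, scale=100.0)  # approx. percent growth
y_dif_decimal = first_differences(gdp_levels, scale=1.0)    # decimal growth

for p, d in zip(y_dif_percent, y_dif_decimal):
    print(f"percent: {p:7.4f}   decimal: {d:8.6f}")
```

Because the percent series is exactly 100 times the decimal one, any model parameters measured in the units of the data (such as shock standard deviations) have to be rescaled by the same factor for the two versions to describe the same fit.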

We have also run other versions of the model that can estimate the parameter with the data in both percent and decimal form. In these cases, the reported prior and the estimated posterior depend on the form in which the data are used. We have also run other versions that estimate the parameter when the data are in decimal form but not when they are in percent form.

Please, what is our blind spot? Why is the reported mode different from the mean? Why does the model run with the data in one form and not in the other? Why does the estimated posterior depend on the form in which the data are used?
We are attaching the model, the executable code, the data, and two log files with the results.

Best regards and happy holidays,

Javier

Run_Model1.m (362 Bytes)
Model1.mod (1.1 KB)
Data_in_decimal_form.xls (29.5 KB)
Data_in_percent_form.xls (29.5 KB)
Model1_data_in_decimal_form.log (9.6 KB)
Model1_data_in_percent_from.log (8.4 KB)

agutieda · December 28, 2023, 9:25pm

Why is the reported mode different from the mean?
The lines of the .log files you mention report the posterior mode, not the prior mode, so they should differ from the prior mean if the data is informative about the parameter.
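To see why a symmetric prior centred at 0.5 can still produce a posterior mode elsewhere, here is a toy Python sketch. It does not reproduce the model's actual prior; it assumes a normal prior and a normal likelihood (all numbers hypothetical) so that the posterior mode has a closed form, the precision-weighted average of the two peaks:

```python
# Toy illustration: symmetric prior centred at 0.5, likelihood peaking elsewhere.
# All numbers are hypothetical and chosen only to make the point visible.
prior_mean, prior_sd = 0.5, 0.2
lik_peak, lik_sd = 0.8, 0.1   # where the data alone would put the parameter

def normal_posterior_mode(m0, s0, m1, s1):
    """Mode of the product of two normal densities (precision-weighted mean)."""
    w0, w1 = 1.0 / s0**2, 1.0 / s1**2
    return (w0 * m0 + w1 * m1) / (w0 + w1)

post_mode = normal_posterior_mode(prior_mean, prior_sd, lik_peak, lik_sd)
print(f"prior mode: {prior_mean:.4f}, posterior mode: {post_mode:.4f}")
```

Here the prior mode stays at 0.5, but the posterior mode is pulled toward the likelihood peak, and the more precise the likelihood (smaller `lik_sd`), the stronger the pull.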

Why is it that the model may run with the data in one way and not in the other?
You’re imposing upper and lower bounds on the values of “alphaa” on line 43 of Model1.mod, so that it must lie between 0 and 0.99. When you use the data in decimal form, the maximization routine stops exactly at the upper bound. The approximation of the log data density is NaN because it relies on derivatives of the posterior, and the posterior is not differentiable at that point of the parameter space. You can still sample from the posterior using mode_compute=6.
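The boundary problem can be illustrated with a one-parameter toy in Python. This is not Dynare's code: the log posterior below is hypothetical, with its unconstrained peak (1.2) placed above the upper bound of 0.99, and zero density outside [0, 0.99]. A bounded maximizer then ends on the bound, the slope there is not zero, and a central finite difference steps outside the admissible region, so curvature-based quantities are no longer well defined:

```python
import math

LOWER, UPPER = 0.0, 0.99  # bounds like those imposed on alphaa

def log_posterior(a):
    """Hypothetical log posterior whose unconstrained peak (1.2) lies above UPPER."""
    if not (LOWER <= a <= UPPER):
        return -math.inf  # zero density outside the bounds
    return -0.5 * (a - 1.2) ** 2 / 0.05

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal f on [lo, hi] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) < f(d):
            a = c  # maximum lies in [c, b]
        else:
            b = d  # maximum lies in [a, d]
    return (a + b) / 2

mode = golden_section_max(log_posterior, LOWER, UPPER)
h = 1e-4
slope_left = (log_posterior(mode) - log_posterior(mode - h)) / h
curvature = (log_posterior(mode + h) - 2 * log_posterior(mode)
             + log_posterior(mode - h)) / h**2
print(f"mode = {mode:.6f}")
print(f"left slope = {slope_left:.3f} (not zero: not an interior optimum)")
print(f"central second difference = {curvature}")  # -inf: step crosses the bound
```

A Laplace-type approximation assumes a smooth, interior maximum with a well-behaved second derivative; at a corner solution like this one, that assumption fails, which is consistent with the invalid log data density reported above.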

Why is it that the estimated posterior depends on the form the data is used?
Estimates should change when the data change. Estimated parameters in some models may be invariant to certain data transformations, but it’s not obvious why this should be the case for “alphaa,” considering that the model has five other parameters whose values remain fixed.

jpfeifer · December 31, 2023, 8:44pm

You are talking about the posterior, which is influenced by the model fit, i.e. the likelihood. Here, the data scaling clearly matters: you kept the shock standard deviations fixed when scaling the data by 100. It matters whether your data fluctuate by about 1 when the standard deviations are around 1, as opposed to fluctuating by about 0.01 when the standard deviation is still about 1.
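A small Python sketch makes this concrete. It is not the thread's local linear trend model; it assumes a hypothetical one-parameter model y_t ~ N(0, alpha*sd1^2 + (1 - alpha)*sd2^2), where alpha is bounded in [0, 0.99] like alphaa and the shock standard deviations sd1 and sd2 are fixed. The maximum likelihood estimate of alpha depends on the data scale unless the standard deviations are rescaled along with the data:

```python
def alpha_mle(data, sd1, sd2, lower=0.0, upper=0.99):
    """MLE of alpha in y_t ~ N(0, alpha*sd1^2 + (1 - alpha)*sd2^2), sd1 < sd2 fixed.

    The log likelihood is unimodal in the variance v, and v is linear in alpha,
    so the bounded maximizer is the unconstrained solution clipped to the bounds.
    """
    v_hat = sum(y * y for y in data) / len(data)   # ML estimate of the variance
    alpha = (sd2**2 - v_hat) / (sd2**2 - sd1**2)   # solve v(alpha) = v_hat
    return min(max(alpha, lower), upper)

percent = [1.0, -1.0, 2.0, 1.0, -1.0]   # hypothetical growth rates in percent
decimal = [y / 100 for y in percent]    # the same data in decimal form

a_percent = alpha_mle(percent, sd1=0.5, sd2=2.0)
a_decimal_fixed_sd = alpha_mle(decimal, sd1=0.5, sd2=2.0)      # sds not rescaled
a_decimal_scaled_sd = alpha_mle(decimal, sd1=0.005, sd2=0.02)  # sds divided by 100
print(a_percent, a_decimal_fixed_sd, a_decimal_scaled_sd)
```

With the standard deviations left unchanged, the decimal data are far less volatile than the model allows, so the estimate is pushed onto the upper bound of 0.99; once the standard deviations are divided by 100 as well, the estimate matches the percent-form result.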

jgomezpi · January 5, 2024, 9:57pm

Yes, lines 159 and 213 of the log files refer to the posterior mode, thank you!
Yes, the issue was that I was multiplying the data by some factor but not the standard errors, thank you very much!