MLE Loss Function

mlam303 · 3 February 2022 21:40

When I carry out maximum likelihood estimation (like that shown in the attached .mod file), what process is being used? i.e. what algorithm is used (and what loss function is being minimised)?

toy_model_MLE_estimation.mod (1.6 KB)

jpfeifer · 4 February 2022 11:00

You use a numerical minimizer (usually some type of Newton algorithm) to minimize minus the likelihood function. That is the very definition of ML: maximize the likelihood.
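As a toy illustration of "a Newton-type minimizer applied to minus the log-likelihood", here is a minimal sketch in Python. It is not what Dynare runs internally. The exponential model, the data, and the function names `neg_log_likelihood` and `newton_mle` are all assumptions chosen to keep the example one-dimensional.

```python
import math

def neg_log_likelihood(rate, data):
    """Minus the log-likelihood of i.i.d. exponential data with the given rate."""
    return -(len(data) * math.log(rate) - rate * sum(data))

def newton_mle(data, rate=0.5, tol=1e-12, max_iter=100):
    """Minimise neg_log_likelihood over the rate with plain Newton steps."""
    n, total = len(data), sum(data)
    for _ in range(max_iter):
        gradient = -n / rate + total   # first derivative of minus log-likelihood
        hessian = n / rate ** 2        # second derivative (positive for rate > 0)
        step = gradient / hessian
        rate -= step
        if abs(step) < tol:
            break
    return rate

# Hypothetical data, purely for illustration.
data = [0.5, 1.5, 1.0, 2.0, 0.8]
rate_hat = newton_mle(data)
# The closed-form ML estimate for this model is n / sum(data), printed for comparison.
print(rate_hat, len(data) / sum(data))
```

The starting value matters for plain Newton steps: in this toy problem the iteration only converges from a starting rate below 2n / sum(data), which is one reason practical optimizers add line searches or trust regions.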

mlam303 · 4 February 2022 15:26

Thanks Johannes.

I realise that minimising minus likelihood is the definition of MLE, but more specifically, am I correct in thinking we’re doing MLE in the context of an Errors-in-variables situation? And therefore the particular algorithm (and the loss function used) is different from the basic MLE algorithm commonly used in non-economic applications?

jpfeifer · 6 February 2022 16:27

What do you mean by “algorithm” in this context? The tricky part with DSGE models is dealing with unobserved states. This requires using a filter like the Kalman filter to construct the likelihood function.

mlam303 · 8 February 2022 17:56

Thanks Johannes.

I realise that Kalman filters etc are needed for estimation in models with some complexity, but I’m considering an extremely simple model, assumed to have no unobservable variables.

The full model is of the form:

y_t = -\lambda r_t + \theta a_t + \varepsilon_1 \\ \\ \pi_t = \pi_{t-1} + w y_t + \varepsilon_2 \\ \\ a_t = -\alpha r_t + \varepsilon_3 \\ \\ r_t = \phi_1 \pi_t + \phi_2 a_t + \varepsilon_4

where \varepsilon_i are i.i.d. Gaussian shocks. This model therefore has matrix form:

\begin{gather} \begin{bmatrix} y_t \\ \pi_t \\ a_t \\ r_t \end{bmatrix} = \begin{bmatrix} 0 & 0 & \theta & -\lambda \\ w & 0 & 0 & 0 \\ 0 & 0 & 0 & -\alpha \\ 0 & \phi_1 & \phi_2 & 0 \end{bmatrix} \begin{bmatrix} y_t \\ \pi_t \\ a_t \\ r_t \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} y_{t-1} \\ \pi_{t-1} \\ a_{t-1} \\ r_{t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \end{bmatrix} \end{gather}

i.e.

Y_t = A Y_t + B Y_{t-1} + E

For estimating this model, assume we have a 4 \times N matrix of time series data for y_t, \pi_t, a_t, r_t (call it D).

My question is, how do I fit this model using maximum likelihood estimation? (without using Kalman filters or Bayesian methods)

My understanding is that this is an “Errors-in-variables” problem, which in some cases requires Total Least Squares (TLS). However, the TLS methods I’ve seen apply to simpler scalar y=\beta x + \gamma problems, rather than VARs (or structural VARs).

jpfeifer · 8 February 2022 18:45

In that case, you can solve the model for any given parameter set to obtain the matrices A and B. Once you have those and you observe the entries of Y_t, you can use your equation to compute E. Knowing that E is multivariate Gaussian allows you to compute the density of E. ML involves finding the parameters that maximize the product of these densities or, equivalently, the sum of the log-densities.
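The recipe above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not Dynare's implementation: the four shocks are taken to be mutually independent with standard deviations estimated in logs, the rule for r_t is taken to respond to \pi_t (as in the matrix A), the likelihood conditions on the first observation, and the names `build_matrices` and `neg_log_likelihood` are hypothetical. Because Y_t appears on both sides of the equation, the density of the data differs from the density of E by the Jacobian factor |det(I - A)|, so the sketch includes that term.

```python
import numpy as np

def build_matrices(lam, theta, w, alpha, phi1, phi2):
    """Map the structural parameters into A and B of Y_t = A Y_t + B Y_{t-1} + E."""
    A = np.array([[0.0, 0.0, theta, -lam],
                  [w, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, -alpha],
                  [0.0, phi1, phi2, 0.0]])
    B = np.zeros((4, 4))
    B[1, 1] = 1.0  # pi_t loads on pi_{t-1} with coefficient 1 (scalar equation for pi_t)
    return A, B

def neg_log_likelihood(params, D):
    """Minus the conditional Gaussian log-likelihood.

    params: 6 structural parameters followed by 4 log standard deviations
            (assumption: independent shocks, diagonal covariance).
    D:      4 x N data matrix with rows y, pi, a, r.
    """
    A, B = build_matrices(*params[:6])
    sigma = np.exp(params[6:10])
    M = np.eye(4) - A
    Y, Y_lag = D[:, 1:], D[:, :-1]
    E = M @ Y - B @ Y_lag                    # implied shocks, 4 x (N-1)
    T = Y.shape[1]
    _, log_abs_det = np.linalg.slogdet(M)    # Jacobian of the map from Y_t to E
    z = E / sigma[:, None]                   # standardised shocks
    log_lik = (T * log_abs_det
               - 0.5 * T * 4 * np.log(2.0 * np.pi)
               - T * np.sum(np.log(sigma))
               - 0.5 * np.sum(z ** 2))
    return -log_lik
```

A function like this can be handed to any numerical minimizer, for example `scipy.optimize.minimize`, together with a starting parameter vector and the data matrix D. Whether all six structural parameters are identified from the data is a separate question that this sketch does not address.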

mlam303 · 8 February 2022 19:51

Great, thanks.

In practice, is there a particular loss function to be minimised, that corresponds to finding the maximum likelihood?

(An analogy would be that we minimise a “sum of square errors” loss function to get ML estimates in simple regressions. Presumably the loss function is different here?)

jpfeifer · 8 February 2022 20:12

Have a look at Chapter_1_VAR.pdf - Google Drive
We are essentially talking about slides 21 to 26
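
Independently of the linked slides, the objective implied by the recipe earlier in this thread can be written out for the four-variable model. This is a sketch under assumptions: \psi denotes the vector of structural parameters (my notation, not from the thread), \Sigma is the covariance matrix of the shocks, and the likelihood conditions on the first observation, so the sum runs over t = 2, ..., N.

E_t(\psi) = (I - A(\psi)) Y_t - B Y_{t-1}

-\log L(\psi, \Sigma) = \frac{4(N-1)}{2} \log(2\pi) + \frac{N-1}{2} \log\det\Sigma - (N-1) \log\left|\det(I - A(\psi))\right| + \frac{1}{2} \sum_{t=2}^{N} E_t(\psi)' \Sigma^{-1} E_t(\psi)

The last term is a weighted sum of squared errors, which is the analogue of the least-squares loss in a simple regression. The two log-determinant terms are what make this loss different: one comes from the unknown shock covariance, the other from Y_t appearing on both sides of the model equation.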