Computing the Maximum Likelihood Estimators


Consider the model written in linear state-space form:

y_t=M_1(\theta) s_t+M_0(\theta)+\epsilon_t

s_t=N_1(\theta) s_{t-1}+u_t
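
To make the notation concrete, the following Python sketch draws data from these two equations once the matrices have numerical values. It assumes Gaussian errors \epsilon_t \sim N(0, H) and u_t \sim N(0, Q), a zero initial state s_0, and arbitrary illustrative matrix values; the function name `simulate` is hypothetical.

```python
import numpy as np

def simulate(M1, M0, N1, Q, H, T, rng):
    """Draw T observations from
    y_t = M1 s_t + M0 + eps_t,   eps_t ~ N(0, H)
    s_t = N1 s_{t-1} + u_t,      u_t   ~ N(0, Q)
    starting from s_0 = 0 (an illustrative assumption)."""
    k, n = N1.shape[0], M1.shape[0]
    s = np.zeros(k)
    y = np.empty((T, n))
    for t in range(T):
        s = N1 @ s + rng.multivariate_normal(np.zeros(k), Q)             # transition equation
        y[t] = M1 @ s + M0 + rng.multivariate_normal(np.zeros(n), H)     # measurement equation
    return y

# Illustrative values: one state, two observables
rng = np.random.default_rng(0)
y = simulate(np.array([[1.0], [0.5]]), np.array([0.0, 1.0]),
             np.array([[0.9]]), np.array([[1.0]]), 0.1 * np.eye(2), T=200, rng=rng)
print(y.shape)  # (200, 2)
```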

When the error terms are Gaussian, the likelihood can be computed with the Kalman filter.
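
A minimal NumPy/SciPy sketch of that likelihood computation is below. It assumes \epsilon_t \sim N(0, H) and u_t \sim N(0, Q) are independent, and that N_1 is stable so the filter can start from the unconditional distribution of s_t (zero mean, covariance from the discrete Lyapunov equation). The function name `kalman_loglik` and the matrices H and Q are illustrative assumptions, not part of the original setup.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def kalman_loglik(y, M1, M0, N1, Q, H):
    """Gaussian log-likelihood of y (T x n) for
    y_t = M1 s_t + M0 + eps_t,  eps_t ~ N(0, H)
    s_t = N1 s_{t-1} + u_t,     u_t   ~ N(0, Q)
    """
    T, n = y.shape
    k = N1.shape[0]
    s = np.zeros(k)                         # E[s_0]: stationary mean is zero
    P = solve_discrete_lyapunov(N1, Q)      # Var[s_0]: solves P = N1 P N1' + Q
    loglik = 0.0
    for t in range(T):
        # Prediction step
        s = N1 @ s
        P = N1 @ P @ N1.T + Q
        # Innovation and its covariance
        v = y[t] - (M1 @ s + M0)
        F = M1 @ P @ M1.T + H
        Finv = np.linalg.inv(F)
        _, logdet = np.linalg.slogdet(F)
        loglik += -0.5 * (n * np.log(2 * np.pi) + logdet + v @ Finv @ v)
        # Update step
        K = P @ M1.T @ Finv
        s = s + K @ v
        P = P - K @ M1 @ P
    return loglik
```

Note that every argument is a numerical array: the filter evaluates the likelihood at one particular \theta, through whatever numerical M_1(\theta), M_0(\theta), N_1(\theta) are passed in.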

From a frequentist point of view, we usually estimate the ‘structural’ parameters by maximum likelihood estimation (MLE).

However, programs such as Sims (2002) need numerical parameter values as input in order to solve for the transition equation, and they then return a numerical solution. This doesn’t seem to allow the use of a numerical optimisation routine to find the MLE, since maximising the likelihood would seem to require it as a function of the parameters, not one already evaluated at particular values.

How does the frequentist proceed then?

P.S.: I’ve been stressing the word ‘frequentist’ because I don’t think the problem exists from a Bayesian perspective: MCMC algorithms always work with numerical parameter values, and the resulting draws form the histogram of the posterior distribution.
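
The MCMC view described here can be sketched with a random-walk Metropolis sampler. This assumes a hypothetical AR(1) model y_t = \rho y_{t-1} + e_t with unit shock variance and a flat prior on (-1, 1); the step size and chain length are arbitrary illustrative choices. Each iteration evaluates the log posterior at a single numerical value of \rho, which is the same kind of pointwise evaluation a numerical optimiser relies on.

```python
import numpy as np

def log_post(rho, y):
    """Log posterior (up to a constant) of an AR(1) coefficient: conditional Gaussian
    log-likelihood with unit shock variance (assumed) plus a flat prior on (-1, 1)."""
    if not -1.0 < rho < 1.0:
        return -np.inf
    e = y[1:] - rho * y[:-1]
    return -0.5 * np.sum(e**2)

def rw_metropolis(y, rho0, n_draws, step, rng):
    """Random-walk Metropolis: each step evaluates log_post at one numerical value of rho."""
    draws = np.empty(n_draws)
    rho, lp = rho0, log_post(rho0, y)
    for i in range(n_draws):
        prop = rho + step * rng.standard_normal()
        lp_prop = log_post(prop, y)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept with probability min(1, ratio)
            rho, lp = prop, lp_prop
        draws[i] = rho
    return draws

# Simulated data with rho = 0.7 (illustrative)
rng = np.random.default_rng(2)
y = np.zeros(400)
for t in range(1, 400):
    y[t] = 0.7 * y[t - 1] + rng.standard_normal()

draws = rw_metropolis(y, rho0=0.0, n_draws=5000, step=0.05, rng=rng)
# A histogram of draws[1000:] (after burn-in) approximates the posterior of rho
```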


I am not sure I understand the question. Numerically solving for the decision rules (conditional on the parameters) is always required, regardless of the estimation method. That is different from numerically optimizing an objective function (the likelihood or the posterior). Bayesians usually do a mode-finding step before running the MCMC, so with uniform priors you are essentially doing ML.
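
To illustrate the last point, here is a small sketch assuming a hypothetical AR(1) model y_t = \rho y_{t-1} + e_t with unit shock variance and a flat prior on (-1, 1). Numerically maximising the log posterior lands on the same value as the closed-form conditional ML estimate (the OLS coefficient), because a flat prior only adds a constant inside its support.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def log_lik(rho, y, sigma=1.0):
    """Conditional Gaussian log-likelihood of an AR(1), y_t = rho y_{t-1} + e_t."""
    e = y[1:] - rho * y[:-1]
    return -0.5 * np.sum(e**2) / sigma**2 - len(e) * np.log(sigma * np.sqrt(2 * np.pi))

def log_prior(rho):
    """Flat (uniform) prior on the stationary region (-1, 1)."""
    return 0.0 if -1.0 < rho < 1.0 else -np.inf

def log_post(rho, y):
    return log_lik(rho, y) + log_prior(rho)

# Simulated data with rho = 0.7 (illustrative)
rng = np.random.default_rng(1)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.7 * y[t - 1] + rng.standard_normal()

# Mode-finding step: maximise the log posterior (minimise its negative)
mode = minimize_scalar(lambda r: -log_post(r, y), bounds=(-0.99, 0.99), method="bounded").x
mle = np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)   # closed-form conditional MLE
print(mode, mle)  # the two agree up to optimiser tolerance
```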


In the frequentist perspective, we want to maximise the likelihood.
Under some assumptions (linearity and Gaussian errors), the likelihood is given by the Kalman filter.
To use the Kalman filter, we need the transition equation, i.e. the matrix N_1(\theta) above.
To know this matrix, we need to solve the system obtained from log-linearizing the equilibrium conditions.
The problem is that programs such as Sims (2002) only solve the linearized system numerically: they return N_1(\theta) evaluated at some particular \theta, not N_1(\theta) as an unevaluated function of \theta, which is what I think we would need for MLE.
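
A numerical solver that returns N_1(\theta) at a given \theta is still a function of \theta: it can be called at any value an optimiser proposes. The sketch below is a hypothetical stand-in for such a solver, not Sims' code. It assumes a toy scalar model x_t = a E_t[x_{t+1}] + b x_{t-1} + e_t; guessing x_t = \phi x_{t-1} + c e_t gives a \phi^2 - \phi + b = 0 and c = 1/(1 - a \phi), and the stable root is found numerically.

```python
import numpy as np

def solve_toy_model(theta):
    """Hypothetical toy model x_t = a E_t[x_{t+1}] + b x_{t-1} + e_t.
    Guessing x_t = phi x_{t-1} + c e_t gives a*phi**2 - phi + b = 0 and c = 1/(1 - a*phi).
    Returns the numerical transition matrix N1 and the shock loading, evaluated at theta."""
    a, b = theta
    roots = np.roots([a, -1.0, b])                  # both candidate values of phi
    stable = roots[np.abs(roots) < 1.0]
    if stable.size != 1 or abs(stable[0].imag) > 1e-12:
        raise ValueError("no unique stable real solution at this theta")
    phi = stable[0].real
    c = 1.0 / (1.0 - a * phi)
    return np.array([[phi]]), np.array([[c]])

N1, C = solve_toy_model((0.5, 0.3))
print(N1)  # approx [[0.3675]], i.e. 1 - sqrt(0.4)
```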

So, how do we maximize the likelihood when we only seem able to get L(\theta|Y) evaluated at particular values of \theta, and not as an unevaluated function?


As I wrote, you do numerical mode-finding, e.g. via a Newton algorithm. In a nutshell, you try different values of \theta and move in the direction in which the numerically computed density (likelihood or posterior) increases.
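
A minimal end-to-end sketch of that mode-finding step follows. It assumes a hypothetical scalar model x_t = a E_t[x_{t+1}] + b x_{t-1} + e_t with Var(e_t) = 1, observed with noise as y_t = x_t + \epsilon_t where the standard deviation of \epsilon_t is fixed at 0.5. The function `neg_loglik` numerically solves the model at whatever \theta = (a, b) the optimiser proposes, runs a scalar Kalman filter, and returns minus the log-likelihood, so no closed-form expression in \theta is ever needed. The sketch uses SciPy's derivative-free Nelder-Mead because the objective is set to +inf where the toy model has no unique stable solution; a quasi-Newton method such as `method="BFGS"` is closer to the Newton algorithm mentioned above, but would need a finite penalty instead of +inf.

```python
import numpy as np
from scipy.optimize import minimize

def solve_model(theta):
    """Hypothetical toy model x_t = a E_t[x_{t+1}] + b x_{t-1} + e_t, e_t ~ N(0, 1).
    Numerically finds x_t = phi x_{t-1} + c e_t at theta = (a, b);
    returns None if there is no unique stable real solution."""
    a, b = theta
    roots = np.roots([a, -1.0, b])
    stable = roots[np.abs(roots) < 1.0]
    if stable.size != 1 or abs(stable[0].imag) > 1e-12:
        return None
    phi = stable[0].real
    return phi, 1.0 / (1.0 - a * phi)

def neg_loglik(theta, y, sigma_eps=0.5):
    """Minus the Kalman-filter log-likelihood of y_t = x_t + eps_t, eps_t ~ N(0, sigma_eps^2)."""
    sol = solve_model(theta)               # numerical solution at this particular theta
    if sol is None:
        return np.inf                      # reject parameter values with no stable solution
    phi, c = sol
    q, h = c**2, sigma_eps**2              # Var(u_t) = c^2 Var(e_t), Var(eps_t)
    s, P = 0.0, q / (1.0 - phi**2)         # stationary initial condition
    ll = 0.0
    for yt in y:
        s, P = phi * s, phi**2 * P + q     # predict
        F = P + h                          # innovation variance
        v = yt - s                         # innovation
        ll -= 0.5 * (np.log(2 * np.pi * F) + v**2 / F)
        s, P = s + P / F * v, P - P**2 / F # update
    return -ll

# Simulated data; (0.5, 0.3) are illustrative "true" values
rng = np.random.default_rng(0)
phi0, c0 = solve_model((0.5, 0.3))
x, y = 0.0, np.empty(300)
for t in range(300):
    x = phi0 * x + c0 * rng.standard_normal()
    y[t] = x + 0.5 * rng.standard_normal()

# Numerical mode-finding: the optimiser only ever asks for neg_loglik at specific theta values
theta0 = np.array([0.2, 0.1])
res = minimize(neg_loglik, theta0, args=(y,), method="Nelder-Mead")
print(res.x, -res.fun)
```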


OK, I think I get it now. Thanks.