I am very new to Bayesian estimation of DSGE models; I have never done it before, even though I understand the concept well. However, I have two rather basic questions.

I know that posterior = likelihood * prior (up to a constant). I understand what the prior is, and I understand the resulting posterior. What I have trouble understanding is over which data the likelihood function (LF) is computed. I know it is L(y|theta), where y is the data and theta is a vector of parameters. What exactly is the data y? Artificial, model-generated data from the state-space representation of the model for a given theta? Or actual data (but then why given theta)? Or some combination of the two, such that the LF is a kind of minimum-distance estimator? I am getting confused there…

My second question is about identification, but this is more conceptual. Sometimes I read in papers that the number of shocks should not exceed the number of data series used in the estimation, or that a given shock is not identified. How do we know that exactly?

The data is empirically observed data. The likelihood is the likelihood of observing the actual data realizations given the current model (this is an implicit conditioning you are missing) and the current parameter vector. If you forget about Bayesian estimation for a second, the goal of maximum likelihood estimation would be to find the values for the parameters of the model that make observing the empirical data most likely. Consider an AR(1) process

x_t = rho*x_(t-1) + sigma_eps*epsilon_t

where epsilon_t is standard normal. The goal is to find rho and sigma_eps. Given any draw for theta=[rho;sigma_eps], the likelihood of observing the sample x can be evaluated as

L(x|theta) = prod_{t=2}^{T} (1/(sqrt(2*pi)*sigma_eps)) * exp(-(x_t - rho*x_(t-1))^2/(2*sigma_eps^2))

To evaluate that function, you need the parameter vector theta.
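To make this concrete, here is a minimal sketch of how that AR(1) likelihood can be evaluated in Python/NumPy. The function name and the simulation setup are my own, purely for illustration:

```python
import numpy as np

def ar1_loglik(x, rho, sigma_eps):
    """Gaussian log-likelihood of an AR(1) sample, conditioning on x[0]."""
    resid = x[1:] - rho * x[:-1]          # one-step-ahead forecast errors
    n = resid.size
    return (-0.5 * n * np.log(2 * np.pi)
            - n * np.log(sigma_eps)
            - 0.5 * np.sum(resid**2) / sigma_eps**2)

# Simulate data from known parameters, then evaluate the likelihood at
# candidate parameter vectors theta = [rho; sigma_eps]
rng = np.random.default_rng(0)
T, rho_true, sig_true = 500, 0.9, 1.0
x = np.zeros(T)
for t in range(1, T):
    x[t] = rho_true * x[t - 1] + sig_true * rng.standard_normal()

# The likelihood of the observed sample is higher near the true parameters
print(ar1_loglik(x, 0.9, 1.0) > ar1_loglik(x, 0.2, 1.0))
```

Maximum likelihood estimation would then search over (rho, sigma_eps) for the values that make this function largest; Bayesian estimation combines the same function with the prior.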

The only restriction is that you cannot have fewer shocks than observables, as the model would otherwise be stochastically singular. Having more shocks is generally not an issue.

Estimation means finding the maximum/mode of the posterior. Usually you do that by finding a point where the first derivative is 0 (first-order condition); this point should be the maximum. But there might be cases, for various reasons, where the derivative of the likelihood function w.r.t. a particular parameter is 0 not only at the maximum but over the whole parameter range. In that case the maximum cannot be found and the parameter is not identified (i.e. the zero derivative does not hold only at the true value). This can easily be checked by looking at the Jacobian of the model. The bigger problem is weak identification, where the likelihood function is not exactly horizontal in a particular parameter direction but is very flat. In that case it will be hard to find the actual maximum. You might want to take a look at the references pointed out here [Quick Help].
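A toy illustration of non-identification (my own construction, not from any particular model): suppose two structural parameters a and b enter the model's solution only through their product. Then the likelihood depends on the data only through a*b, its derivative along any direction that keeps a*b fixed is zero everywhere, and a and b are not separately identified:

```python
import numpy as np

# Suppose the reduced-form AR coefficient is the product a*b of two
# structural parameters; the likelihood sees the data only through a*b.
def loglik(x, a, b, sigma_eps=1.0):
    resid = x[1:] - (a * b) * x[:-1]
    n = resid.size
    return (-0.5 * n * np.log(2 * np.pi) - n * np.log(sigma_eps)
            - 0.5 * np.sum(resid**2) / sigma_eps**2)

rng = np.random.default_rng(1)
T = 300
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()

# Any (a, b) pair with the same product gives the same likelihood value
print(np.isclose(loglik(x, 0.8, 1.0), loglik(x, 0.4, 2.0)))   # True
```

Plotting the likelihood over (a, b) would show a ridge along a*b = constant; that flat direction is exactly what identification checks (e.g. rank conditions on the Jacobian) are designed to detect.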

Thanks a lot for your reply. A couple of issues to clarify a bit more.

So let me see if I get this straight. The solution of a (log-linearised) DSGE model is:

y = A(theta)*y(-1) + B(theta)*u

where the matrices A and B are functions of the underlying model parameters theta.

Talking in terms of likelihood (not Bayesian estimation), the goal is to get estimates of A and B, and consequently of the vector theta, that maximise the LF. This can be done using the Kalman filter, for example.
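For a state-space form like the one above, the Kalman filter delivers the likelihood through the prediction-error decomposition. Here is a minimal sketch in Python/NumPy; the function name and the simple setup without measurement error are my own assumptions for illustration:

```python
import numpy as np

def kalman_loglik(y, A, B, Z):
    """Log-likelihood of data y (T x n_obs) for the state-space model
       s_t = A s_{t-1} + B u_t,  u_t ~ N(0, I),  y_t = Z s_t,
       via the Kalman filter (prediction-error decomposition)."""
    n = A.shape[0]
    Q = B @ B.T                  # state innovation covariance
    s = np.zeros(n)              # filtered state mean
    P = np.eye(n)                # filtered state covariance (simple init)
    ll = 0.0
    for obs in y:
        # Prediction step
        s = A @ s
        P = A @ P @ A.T + Q
        # One-step-ahead forecast error and its covariance
        v = obs - Z @ s
        F = Z @ P @ Z.T
        ll += -0.5 * (len(obs) * np.log(2 * np.pi)
                      + np.log(np.linalg.det(F))
                      + v @ np.linalg.solve(F, v))
        # Update step
        K = P @ Z.T @ np.linalg.inv(F)
        s = s + K @ v
        P = P - K @ Z @ P
    return ll

# Scalar AR(1) as a state-space model: one state, observed directly
rng = np.random.default_rng(2)
A = np.array([[0.9]]); B = np.array([[1.0]]); Z = np.array([[1.0]])
T = 200
state = 0.0
y = np.empty((T, 1))
for t in range(T):
    state = 0.9 * state + rng.standard_normal()
    y[t, 0] = state

print(kalman_loglik(y, A, B, Z))
```

Evaluating this function at different theta (and hence different A(theta), B(theta)) and maximising it is exactly the maximum likelihood exercise; adding the prior on top gives the posterior mode.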

What exactly do you mean by "The only restriction is that you cannot have less parameters than observables as the model would be stochastically singular"? In general the parameters estimated are more numerous than the observables, right?
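To see what stochastic singularity looks like numerically, here is a small sketch of my own (not from the thread): with two observables driven by a single shock, the model-implied covariance matrix of the observables is singular, so a Gaussian density of the data cannot be evaluated:

```python
import numpy as np

# Two observables driven by a single shock: y_t = B * u_t with B a 2x1 matrix
B = np.array([[1.0],
              [0.5]])
Sigma_y = B @ B.T          # implied covariance of the observables

# The covariance matrix has rank 1 and zero determinant, so the joint
# Gaussian density of the two observables does not exist: the model is
# stochastically singular with one shock and two observables.
print(np.linalg.matrix_rank(Sigma_y))   # 1
print(np.linalg.det(Sigma_y))           # 0 (up to floating point)
```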