You still haven't grasped the main point: the MDD incorporates the fit of ALL moments, weighted by the precision with which they are estimated. So a model may score well partly because, say, the correlation between output and consumption at lag 200 is matched really well. While that is the most efficient way of estimating parameters, it is often not what researchers care about. Often we only care about the first two to four autocorrelations and cross-correlations.
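To make the contrast concrete, here is a minimal sketch of a moment-matching objective that targets only the first few autocorrelations and cross-correlations of output and consumption. The function names, the default `max_lag=4` and the diagonal weights are illustrative assumptions, not a standard API; one common choice for the weights is the inverse variance of each estimated moment.

```python
import math

def corr_at_lag(x, y, lag):
    """Sample correlation between x_t and y_{t-lag} (lag >= 0), using full-sample means and std devs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    cov = sum((x[t] - mx) * (y[t - lag] - my) for t in range(lag, n)) / n
    return cov / (sx * sy)

def selected_moments(output, consumption, max_lag=4):
    """Hypothetical target set: autocorrelations at lags 1..max_lag, cross-correlations at lags 0..max_lag."""
    m = []
    for lag in range(1, max_lag + 1):
        m.append(corr_at_lag(output, output, lag))
        m.append(corr_at_lag(consumption, consumption, lag))
    for lag in range(0, max_lag + 1):
        m.append(corr_at_lag(output, consumption, lag))
    return m

def moment_distance(data_moments, model_moments, weights=None):
    """Weighted sum of squared gaps between data and model moments (diagonal weighting assumed)."""
    if weights is None:
        weights = [1.0] * len(data_moments)
    return sum(w * (d - s) ** 2 for w, d, s in zip(weights, data_moments, model_moments))
```

Lags beyond `max_lag` never enter this objective, so the researcher decides which features the model must match. A Gaussian likelihood, and hence the MDD, depends on the whole autocovariance structure instead.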
Regarding estimation: you estimate a model either by moment matching or by full-information methods, but you pretty much never mix the two. When selecting features, i.e. doing model comparison, the MDD is the way to go.
The MDD penalizes complexity. Pure moment matching, by contrast, could always achieve a perfect fit just by adding enough features, and therefore parameters.
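A toy illustration of the complexity penalty, assuming the simplest possible setting rather than a DSGE model: i.i.d. data y_i ~ N(mu, 1). Model M0 fixes mu = 0; model M1 frees mu under an assumed prior mu ~ N(0, tau^2). The extra parameter always improves the best in-sample fit, but integrating it out over the prior, which is what the marginal data density does, costs a penalty of 0.5*log(1 + n*tau^2). The function names and the data are hypothetical.

```python
import math

def log_mdd_fixed_mean(y):
    """Log marginal data density of M0: y_i ~ N(0, 1); no free parameters, so it equals the log-likelihood."""
    n = len(y)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sum(v * v for v in y)

def log_mdd_free_mean(y, tau=1.0):
    """Log marginal data density of M1: y_i ~ N(mu, 1) with prior mu ~ N(0, tau^2).
    Integrating mu out gives y ~ N(0, I + tau^2 * 1 1'); the closed form uses
    Sherman-Morrison for the inverse and det = 1 + n * tau^2."""
    n = len(y)
    s = sum(y)
    quad = sum(v * v for v in y) - tau ** 2 * s ** 2 / (1 + n * tau ** 2)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * math.log(1 + n * tau ** 2)
            - 0.5 * quad)

def max_loglik_free_mean(y):
    """Best in-sample fit of M1: plug in the sample mean for mu."""
    n = len(y)
    ybar = sum(y) / n
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sum((v - ybar) ** 2 for v in y)

y = [0.4, -0.2, 0.1, -0.1]
print(max_loglik_free_mean(y) - log_mdd_fixed_mean(y))  # fit gain from the extra parameter: positive
print(log_mdd_free_mean(y) - log_mdd_fixed_mean(y))     # log Bayes factor of M1 vs M0: negative
```

With this data the free mean raises the maximized log-likelihood slightly, yet the MDD prefers the simpler model. The penalty grows with the sample size, much like the 0.5*log(n) per-parameter term in BIC.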