 See

Usually, you pick the moments that you are most interested in. That’s the difference to full information estimation.

It depends on whether nonlinearities are important. If higher order does not work, there is often a problem with the scaling of shocks (scaling by 100). Also, often
pruning
helps. 
To compute the standard deviations, you need a positive definite Hessian at the mode. Looking at the
mode_check
pictures, that is not the case in your results, most likely due to a corner solution. 
If your model is overidentified, the J test ist for testing whether the model is rejected by the data.

Enable
options_.debug=true;
to see where the dots comes from. But given that the autocorrelation coefficients go to 1 (unit root), you most likely have a problem with the observation equations. The mean of the data and the matched model variables most likely differ.