Parallel estimation in Linux gets stuck after mode

I set up the configuration file and ran the parallel_test step with satisfactory results (Test for Cluster computation, computer localhost … Passed!). However, when I actually run the estimation, Dynare gets stuck after computing the mode and just before starting the MCMC chains. The last messages on the screen are:

Estimation::mcmc: Write details about the MCMC… Ok!
Estimation::mcmc: Details about the MCMC are available in SOE_Estimation/metropolis/SOE_Estimation_mh_history_0.mat

Matlab shows it is “Busy”, but nothing happens afterwards (no window showing the progress of the MH algorithm appears). Without the parallel setup, the estimation runs fine. I connect remotely to a virtual machine to run Dynare 4.6.4. Any idea what could be going on?

Is the config file correctly set up, particularly with respect to the Dynare version?

Hi Johannes,

This is the config file:

Name = Local
Members = n1

Name = n1
ComputerName = localhost
CPUnbr = 12
NumberOfThreadsPerJob = 6

I would have presumed the parallel_test step would have picked up any mistakes, but it passes the test seamlessly. Thanks!

No, it does not test everything. Usually, you should also have

#path to matlab.exe; on Windows, the MATLAB bin folder is in the system path
#so we only need to provide the name of the exe file
#Dynare path you are using
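
A minimal sketch of how the full file might look with those two entries added, assuming the comments above refer to the MatlabOctavePath and DynarePath options of the Dynare configuration file. The [cluster] and [node] section headers are the standard ones for this file format and are not shown in the post above. Both path values are placeholders, not taken from this thread, and need to be adapted to the installation actually in use:

```
[cluster]
Name = Local
Members = n1

[node]
Name = n1
ComputerName = localhost
CPUnbr = 12
NumberOfThreadsPerJob = 6
# Placeholder: name of, or full path to, the MATLAB executable
MatlabOctavePath = matlab
# Placeholder: matlab subfolder of the Dynare version that runs the estimation
DynarePath = /path/to/dynare/matlab
```

The point of setting DynarePath explicitly is that the worker processes should load the same Dynare version as the session that launches the estimation.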

Thanks, Johannes. I added those lines, and about 30 minutes after the mode was computed, Matlab eventually showed the window with the progress of the MCMC chains. (I am not sure the added lines made a difference, because I had not waited that long before.) This happened when I used fmincon for the mode (mode_compute=1). Because I wanted to use mode_compute=6, I restarted the estimation with that setting. About 3 hours after successfully computing the mode, Matlab still had not shown the progress of the MCMC chains (this is the configuration under which I wrote the original forum post). This is puzzling.

That is indeed strange. Is your model big? The time between mode-finding and the MCMC is typically used to compute the Hessian when using optimizers that do not directly provide it. That can take some time for big models.

I guess you killed the Dynare run when nothing changed for a long time. Do you know in which function it was stuck?

The model has 30ish coefficients and 15 observable variables. I am not sure that counts as a big model. I tried providing the mode and the covariance matrix obtained with mode_compute=6 (single-core estimation) by setting mode_compute=0 and mode_file=model_name_mh_mode.mat, but Matlab spent the whole night stuck after reporting the mode. The message I get after stopping the program is the following:

Operation terminated by user during dos

In dynareParallelDelete (line 55)
        [~, ~] = system(['ssh ',ssh_token,username,Parallel(indPC).ComputerName,' ''/bin/bash --norc -c "rm -f ',directory,pname,fname,'"''']);

In masterParallel (line 205)

In posterior_sampler (line 150)
    [fout, nBlockPerCPU, totCPU] = masterParallel(options_.parallel, fblck, nblck,NamFileInput,'posterior_sampler_core', localVars, globalVars, options_.parallel_info);

In dynare_estimation_1 (line 474)

In dynare_estimation (line 105)

In SOE_Estimation.driver (line 1493)

In dynare (line 293)
evalin('base',[fname '.driver']) ;

So the code gets stuck when trying to delete the _output_*.mat-files. Could it be a problem with write permissions?

After experimenting a bit, we made the parallel estimation work in Linux by using Dynare 4.5.6. Thanks for the feedback.

That is strange. We experienced this issue in the past when trying to combine parallel estimation on different Dynare versions, i.e. the host had a different version than the respective nodes. That’s why the DynarePath was essential.

Correct. Actually, I was running on 4.5.6 and had forgotten to change the path in the configuration file (it was still 4.6.4). The results were bad. After aligning both to 4.5.6, things worked well.

I am having the same issue, the error is exactly the same as the one above. I only have 5.0 installed here. This is my conf file:



If I run parallel_test, I get the following error:

Testing computer -> localhost <- ...

Check on Local Variable ..... Ok!

Checking Hardware please wait ...
Hardware has 16 Cpu/Cores!
User requires 4 Cpu/Cores!
Warning! There are unused CPU's!

Test for Cluster computation, computer localhost ..... Passed!

AnalyseComputationalEnvironment returned with Error Code: 2.2

That is not a true error, just a notice that you are using fewer cores than are available.