Hi,
Thanks for the development work.
I’d like to raise an efficiency suggestion regarding parallelisation, though it might also be a configuration issue on my side.
When running MH chains, the parallel procedure is used again to compute posterior moments, and I’ve noticed that the workload distribution there may not be optimal. For example, I have 24 cores and a model with 44 parameters. What happens is that 23 cores are each assigned a single parameter, which they process very quickly, while one core ends up handling the remaining 21 parameters, which takes significantly longer.
Unless I’ve missed a setting, it seems the work could be distributed more evenly, for instance by ensuring each core handles at most two parameters, so that no single core is left with the bulk of the task. In my example, 20 cores would take two parameters each and 4 cores would take one, so the slowest core would process 2 parameters instead of 21.
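To illustrate what I mean, here is a minimal Python sketch. It is not Dynare code, and the function names are made up for illustration. I am assuming that the current split gives every core floor(n/k) parameters and hands the whole remainder to the last core, since that reproduces the 23 x 1 + 21 pattern I observe; the actual logic in Dynare may differ.

```python
def naive_split(n_tasks, n_workers):
    """Assumed current behaviour: every worker gets floor(n/k) tasks
    and the last worker additionally takes the whole remainder."""
    base, rem = divmod(n_tasks, n_workers)
    sizes = [base] * n_workers
    sizes[-1] += rem
    return sizes


def balanced_split(n_tasks, n_workers):
    """Suggested behaviour: spread the remainder one task at a time,
    so block sizes differ by at most one."""
    base, rem = divmod(n_tasks, n_workers)
    return [base + 1 if i < rem else base for i in range(n_workers)]


naive = naive_split(44, 24)        # 23 workers with 1 task, 1 worker with 21
balanced = balanced_split(44, 24)  # 20 workers with 2 tasks, 4 workers with 1

print(max(naive))     # 21
print(max(balanced))  # 2
```

Since the total runtime is driven by the slowest core, the largest block size is the number that matters here.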
Hopefully, this observation is useful.
I’m tagging @wmutschl, as I guess you are an expert in this :)