Big performance differences between x86-64 and ARM?

dh.murakami · October 4, 2024, 9:54am

Just a quick question for the Dynare team: could you double-check whether there are any large performance differences between the x86-64 and ARM versions of macOS+MATLAB during the mode-finding stage of estimation (with OccBin) using mode_compute=5?

I am well aware of the raw instructions-per-clock differences between Intel’s CPUs (especially pre-Alder Lake architectures) and Apple’s M-series CPUs. However, mode finding that takes 3-6 hours on my M1-based machine takes literally days on my Intel machine. Both have a similar number of CPU cores (4E+4P Apple M1 vs 10C/20T Intel Core i9-10910). Each core on the Intel CPU is slower than an M1 core, but the hardware gap alone isn’t this big.

I’m wondering whether the x86-64 version of Dynare suffers from some kind of suboptimal compilation. Has anyone else noticed this on their own computers?

jpfeifer · October 4, 2024, 10:57am

That seems like a question for @wmutschl and @sebastien

wmutschl · October 15, 2024, 7:43pm

I think the short answer is that the M chips are really good, much better than the Intel chips, and have full priority from Apple.

There could indeed be several factors at play here:

Compilation Optimization: Some of Dynare’s dependencies might not be as well optimized for the x86-64 architecture (Intel), especially on macOS. Since Apple has prioritized ARM (M1/M2/M3/M4) chips with its latest OS updates and compilers, it’s possible that the Intel version is not as optimized, particularly for mode-finding algorithms, which might be sensitive to compiler-level optimizations.
MATLAB and BLAS Libraries: MATLAB’s performance can vary between architectures, particularly because of the underlying math libraries it uses (e.g., Intel MKL vs Apple’s Accelerate framework). Depending on the library, the large matrix operations that are common in mode-finding routines may run noticeably slower on one platform than on the other.
Parallelism Issues: Although your Intel Core i9-10910 has more threads overall, it’s possible that the mode-finding process doesn’t utilize them efficiently. The M1’s architecture, with its unified memory and highly efficient cores, might allow MATLAB and Dynare to make better use of available hardware resources. On the other hand, threading and parallelism performance on older Intel architectures might not be as well-optimized, leading to higher latency.
Thermal Throttling: Intel chips, particularly older generations, are prone to thermal throttling under heavy loads. If your Intel machine is hitting thermal limits, it could significantly reduce its performance during sustained workloads like mode finding. The M chips, due to their efficiency, may avoid this issue and maintain consistent performance over long periods.
macOS vs Linux or Windows on Intel: If you are using macOS on both machines, it might be worth comparing the performance on an Intel machine running Linux, where optimizations for Intel’s architecture could be more robust.
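
One way to check the raw-speed and thermal-throttling points on your own machines is to time the same fixed workload repeatedly and see whether later rounds slow down. Below is a minimal Python sketch under stated assumptions, not anything taken from Dynare or MATLAB: the naive matrix product is only a stand-in workload, and the names `matmul`, `time_rounds`, and `slowdown_ratio` are made up for illustration. For a real comparison you would time the actual estimation run instead.

```python
import statistics
import time


def matmul(a, b):
    """Naive dense matrix product on lists of lists (stand-in workload)."""
    b_columns = list(zip(*b))  # transpose b once so columns are easy to iterate
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns]
            for row in a]


def time_rounds(n=60, rounds=10):
    """Run the same n-by-n product `rounds` times; return seconds per round."""
    a = [[(i * j) % 7 + 1.0 for j in range(n)] for i in range(n)]
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        matmul(a, a)
        timings.append(time.perf_counter() - start)
    return timings


def slowdown_ratio(timings):
    """Median time of the second half of the rounds over the first half."""
    half = len(timings) // 2
    return statistics.median(timings[half:]) / statistics.median(timings[:half])


if __name__ == "__main__":
    t = time_rounds()
    print(f"median round: {statistics.median(t):.4f}s, "
          f"late/early ratio: {slowdown_ratio(t):.2f}")
```

A late/early ratio well above 1 on the Intel machine but close to 1 on the M1 would point toward throttling, while similar ratios on both would suggest looking at the libraries or the compilation instead. A short run like this will barely warm up a CPU, so `n` and `rounds` would need to be raised considerably before the ratio says anything about sustained load.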

In short: the M chips are just great

dh.murakami · November 15, 2024, 10:13am

I think I will pull the trigger on one of the M4-based Macs!

One more thing @wmutschl: Do you have any general advice on parallel options for Bayesian estimation in Dynare? How do you personally like to set up the number of, say, RWMH chains relative to the number of CPU cores you have and the NumberOfThreadsPerJob setting in the parallel config file?