I have implemented the 3D Gauss-Seidel code from the OpenMP tutorial (Terboven & Klemm), and tested it with different compilers:
- gcc 10.2
- clang 11
- icc 2020 upd 4
clang and icc are slower with a single thread (this can probably be improved with some optimization flags). The scaling shows the behaviour expected for this kind of loop: no speed-up with 2 threads, but good speed-up with more threads relative to the 2-thread case.
Does anybody have similar experiences, or know how the different runtimes implement doacross loops?
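
For reference, here is a minimal sketch of the kind of doacross loop in question. It is not the tutorial's exact code: the function name `gs_sweep_doacross`, the `IDX` macro, the flat cubic grid, the 7-point stencil, and the `schedule(static, 1)` clause are all assumptions made for illustration. The i and j loops form the doacross iteration space via `ordered(2)`, and the k loop runs sequentially inside each (i, j) iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Flat index into an n*n*n grid, k being the fastest-varying dimension.
 * Hypothetical helper for this sketch. */
#define IDX(i, j, k, n) (((size_t)(i) * (size_t)(n) + (size_t)(j)) * (size_t)(n) + (size_t)(k))

/* One in-place Gauss-Seidel sweep over the interior points of u.
 * Illustrative only; the tutorial's code may be organised differently. */
void gs_sweep_doacross(double *u, int n)
{
    assert(n >= 3); /* need at least one interior point */

    #pragma omp parallel
    {
        /* ordered(2): the i and j loops carry the cross-iteration dependences.
         * schedule(static, 1) is an assumption of this sketch: it deals the
         * i-rows out round-robin, so neighbouring rows go to different threads. */
        #pragma omp for ordered(2) schedule(static, 1)
        for (int i = 1; i < n - 1; i++) {
            for (int j = 1; j < n - 1; j++) {
                /* Block until iterations (i-1, j) and (i, j-1) have posted
                 * their "source"; sinks outside the iteration space are ignored. */
                #pragma omp ordered depend(sink: i - 1, j) depend(sink: i, j - 1)
                for (int k = 1; k < n - 1; k++) {
                    u[IDX(i, j, k, n)] =
                        (u[IDX(i - 1, j, k, n)] + u[IDX(i + 1, j, k, n)] +
                         u[IDX(i, j - 1, k, n)] + u[IDX(i, j + 1, k, n)] +
                         u[IDX(i, j, k - 1, n)] + u[IDX(i, j, k + 1, n)]) / 6.0;
                }
                /* Signal that iteration (i, j) is complete. */
                #pragma omp ordered depend(source)
            }
        }
    }
}
```

This should build with `-fopenmp` on gcc and clang, or `-qopenmp` on icc. Without the OpenMP flag the pragmas are ignored and the function degenerates to the plain serial sweep, which is handy for checking that the parallel version produces the same values.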