simdlen considered harmful

Comments and discussion of the 4.1 OpenMP Draft specifications. Comment period ends September 30, 2015. See to download the specifications. (Read Only)
Posts: 73
Joined: Mon Jul 27, 2015 4:50 pm

simdlen considered harmful

Post by fewl9012 »

(harmful to program portability).
Consider a simple loop:

Code:

#pragma omp simd
for(i=0;i<n;++i) y[i] = a*x[i] + y[i];
where a is a float and x and y are float arrays. When writing this for x86 with SSE, I might use simdlen(4). But now we have AVX, so I should use simdlen(8). Next year, I'll have AVX512, so I should write simdlen(16). Maybe I also want to run this on ARM with NEON, which would be simdlen(2). I wonder what the use case was for adding this clause. Is there a case where a programmer would NOT want to use the longest available SIMD instructions? If the programmer wants to avoid AVX512 for portability, that's more of a compiler flag than a program option.

I'm sure someone will say that the exact SIMD chunk size is implementation-dependent. The Features History section claims "The simdlen clause was added to the simd construct to support specification of the exact number of iterations desired per SIMD chunk." So someone thinks the programmer wants to specify an exact number. I'm not seeing the motivation.

Posts: 12
Joined: Tue Jul 14, 2015 9:35 am

Re: simdlen considered harmful

Post by Spreis »

There are 2 uses for simdlen IMO:
1. To match #pragma omp declare simd. The simdlen for the latter is well-defined, and anyone who wants to strictly ensure that caller and callee agree on simdlen() should be able to do so.
2. If a loop operates on multiple types, one can decide which is better: under-utilization of registers for the shorter type or over-utilization for the longer type.
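Both uses can be sketched in C roughly as follows. This is only an illustration: scale_add, saxpy and accumulate are hypothetical names, not from the post or the spec, and the register arithmetic in the comments assumes 256-bit vector registers.

```c
#include <assert.h>

/* Use 1: ask for a vector variant of the callee that handles 8 lanes
   per call (hypothetical helper, for illustration only). */
#pragma omp declare simd simdlen(8)
static float scale_add(float a, float x, float y)
{
    return a * x + y;
}

/* The calling loop requests the same chunk size, so the 8-lane
   variant of scale_add is the one that matches. */
static void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0);
    #pragma omp simd simdlen(8)
    for (int i = 0; i < n; ++i)
        y[i] = scale_add(a, x[i], y[i]);
}

/* Use 2: a loop mixing float and double. Assuming 256-bit registers,
   simdlen(4) fills one register with doubles but only half of one
   with floats; simdlen(8) would fill a register with floats and need
   two registers for the doubles. The programmer picks the trade-off. */
static void accumulate(int n, const float *x, double *sum)
{
    assert(n >= 0);
    #pragma omp simd simdlen(4)
    for (int i = 0; i < n; ++i)
        sum[i] += (double)x[i];
}
```

Compiled without OpenMP support, the pragmas are simply ignored and the loops run as scalar code, so the results do not depend on the chosen simdlen.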

I agree that simdlen() creates a dependency on the target, which hurts portability. I personally would prefer having simdlen() defined by type rather than by exact value (something like simdlenfortype(float)), but this was not the case in OpenMP 4.0 (for omp declare simd), and the definition for loop constructs is just consistent with the previous one. The behavior I'd like to see can be emulated with a set of defines like the ones below:

Code:

/* Vector register width in bytes for the target being compiled. */
#if defined (__AVX512F__)
#define  VREG_SZ  64
#elif defined (__AVX__)
#define  VREG_SZ  32
#elif defined (__SSE2__)
#define  VREG_SZ  16
#elif (defined (__ARM_NEON__) && defined (__aarch64__))
#define  VREG_SZ  16
#elif defined (__ARM_NEON__)
#define  VREG_SZ  8
#else
/* Unknown target: expand to nothing and let the compiler choose. */
#define simdlenfortype(TYPE)
#endif

#ifndef simdlenfortype
#define simdlenfortype(TYPE)   simdlen(VREG_SZ/sizeof(TYPE))
#endif
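
For illustration, a loop using such a macro might look like the sketch below. It assumes the compiler macro-expands the tokens after #pragma omp (the OpenMP spec says they are subject to macro replacement) and accepts sizeof inside the simdlen expression. The ladder is trimmed to x86 to keep the example short, and saxpy is a hypothetical name.

```c
#include <assert.h>

/* Trimmed x86-only version of the register-width ladder (bytes). */
#if defined(__AVX512F__)
#define VREG_SZ 64
#elif defined(__AVX__)
#define VREG_SZ 32
#elif defined(__SSE2__)
#define VREG_SZ 16
#endif

#ifdef VREG_SZ
/* Known target: one full vector register of TYPE per SIMD chunk. */
#define simdlenfortype(TYPE) simdlen(VREG_SZ / sizeof(TYPE))
#else
/* Unknown target: no clause, the compiler picks the chunk size. */
#define simdlenfortype(TYPE)
#endif

/* With AVX this should expand to simdlen(32 / sizeof(float)), i.e.
   simdlen(8); with AVX512F to simdlen(16); with SSE2 to simdlen(4). */
static void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0);
    #pragma omp simd simdlenfortype(float)
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The source stays the same across targets; only the compiler flags that select the instruction set change the chunk size.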