### #pragma omp for simd not vectorization?

Posted: Wed Aug 01, 2018 2:45 am
Hi,

I have a question about #pragma omp for simd.

I am running this code:

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>

#define NPOINTS 2000
#define MAXITER 2000

struct complex {
    double real;
    double imag;
};

int main() {
    int i, j, iter, numoutside = 0;
    double area, error, ztemp;
    struct complex z, c;

    #pragma omp parallel default(none) private(i,j,c,z,ztemp,iter) reduction(+:numoutside)
    {
        #pragma omp for simd schedule(dynamic) collapse(2)
        for (i = 0; i < NPOINTS; i++) {
            for (j = 0; j < NPOINTS; j++) {
                c.real = -2.0 + 2.5 * (double)(i) / (double)(NPOINTS) + 1.0e-7;
                c.imag = 1.125 * (double)(j) / (double)(NPOINTS) + 1.0e-7;
                z = c;
                for (iter = 0; iter < MAXITER; iter++) {
                    ztemp = (z.real * z.real) - (z.imag * z.imag) + c.real;
                    z.imag = z.real * z.imag * 2 + c.imag;
                    z.real = ztemp;
                    if ((z.real * z.real + z.imag * z.imag) > 4.0e0) {
                        numoutside++;
                        break;
                    }
                }
            }
        }
    } // end of parallel region

    /*
     *  Calculate area and error and output the results
     */

    area = 2.0 * 2.5 * 1.125 * (double)(NPOINTS * NPOINTS - numoutside) / (double)(NPOINTS * NPOINTS);
    error = area / (double)NPOINTS;

    printf("Area of Mandelbrot set = %12.8f +/- %12.8f\n", area, error);

}
```
With simd added, the code runs even faster than with #pragma omp for alone. I therefore assume that the chunk of iterations assigned to each thread is being executed with SIMD instructions. Isn't that vectorization?

However, when I compile with this command in a Linux terminal:
$ gcc -fopenmp -g -O2 a.c -liomp5 -ftree-vectorize -fopt-info-vec-all -o a.out

the output reports that the code is not vectorized:
Analyzing loop at a.c:29
a.c:29:21: note: ===== analyze_loop_nest =====
a.c:29:21: note: === vect_analyze_loop_form ===
a.c:29:21: note: not vectorized: control flow in loop.
a.c:20:13: note: vectorized 0 loops in function.
a.c:20:13: note: ===vect_slp_analyze_bb===
a.c:20:13: note: === vect_analyze_data_refs ===
a.c:20:13: note: not vectorized: not enough data-refs in basic block.
a.c:20:13: note: === vect_analyze_data_refs ===
a.c:20:13: note: not vectorized: not enough data-refs in basic block.
a.c:20:13: note: ===vect_slp_analyze_bb===
a.c:20:13: note: ===vect_slp_analyze_bb===
a.c:20:13: note: === vect_analyze_data_refs ===
a.c:20:13: note: got vectype for stmt: .istart0.13_16 = .istart0.11;
vector(2) long int
a.c:20:13: note: got vectype for stmt: .iend0.14_18 = .iend0.12;
vector(2) long int
a.c:20:13: note: === vect_analyze_data_ref_accesses ===
a.c:20:13: note: not consecutive access .istart0.13_16 = .istart0.11;
a.c:20:13: note: not consecutive access .iend0.14_18 = .iend0.12;
a.c:20:13: note: not vectorized: no grouped stores in basic block.
a.c:20:13: note: ===vect_slp_analyze_bb===
a.c:20:13: note: ===vect_slp_analyze_bb===
a.c:26:55: note: === vect_analyze_data_refs ===
a.c:26:55: note: not vectorized: not enough data-refs in basic block.
a.c:26:55: note: ===vect_slp_analyze_bb===
a.c:26:55: note: === vect_analyze_data_refs ===
a.c:26:55: note: not vectorized: not enough data-refs in basic block.
a.c:26:55: note: ===vect_slp_analyze_bb===
a.c:26:55: note: ===vect_slp_analyze_bb===
a.c:26:55: note: ===vect_slp_analyze_bb===
a.c:26:55: note: ===vect_slp_analyze_bb===
a.c:32:11: note: === vect_analyze_data_refs ===
a.c:32:11: note: not vectorized: not enough data-refs in basic block.
a.c:32:11: note: ===vect_slp_analyze_bb===
a.c:28:11: note: === vect_analyze_data_refs ===
a.c:28:11: note: not vectorized: not enough data-refs in basic block.
a.c:28:11: note: ===vect_slp_analyze_bb===
a.c:33:20: note: === vect_analyze_data_refs ===
a.c:33:20: note: not vectorized: not enough data-refs in basic block.
a.c:33:20: note: ===vect_slp_analyze_bb===
a.c:33:20: note: === vect_analyze_data_refs ===
a.c:33:20: note: not vectorized: not enough data-refs in basic block.
a.c:33:20: note: ===vect_slp_analyze_bb===
a.c:33:20: note: ===vect_slp_analyze_bb===
a.c:33:20: note: === vect_analyze_data_refs ===
a.c:33:20: note: not vectorized: not enough data-refs in basic block.
a.c:33:20: note: ===vect_slp_analyze_bb===
a.c:33:20: note: === vect_analyze_data_refs ===
a.c:33:20: note: not vectorized: not enough data-refs in basic block.
a.c:20:77: note: === vect_analyze_data_refs ===
a.c:20:77: note: not vectorized: not enough data-refs in basic block.
a.c:20:77: note: === vect_analyze_data_refs ===
a.c:20:77: note: not vectorized: not enough data-refs in basic block.
a.c:15:9: note: vectorized 0 loops in function.
a.c:15:9: note: ===vect_slp_analyze_bb===
a.c:20:13: note: === vect_analyze_data_refs ===
a.c:20:13: note: got vectype for stmt: .omp_data_o.8.numoutside = 0;
vector(4) int
a.c:20:13: note: not vectorized: not enough data-refs in basic block.
/usr/include/x86_64-linux-gnu/bits/stdio2.h:104:10: note: === vect_analyze_data_refs ===
/usr/include/x86_64-linux-gnu/bits/stdio2.h:104:10: note: got vectype for stmt: numoutside_4 = .omp_data_o.8.numoutside;
vector(4) int
/usr/include/x86_64-linux-gnu/bits/stdio2.h:104:10: note: not vectorized: not enough data-refs in basic block.
a.c:15:9: note: === vect_analyze_data_refs ===
a.c:15:9: note: not vectorized: not enough data-refs in basic block.

So inserting #pragma omp for simd in the code does not necessarily result in vectorization? @.@

Could someone clarify this? Thanks.

Best Regards,
Claudia

### Re: #pragma omp for simd not vectorization?

Posted: Mon Aug 13, 2018 9:38 am
The inner iteration loop, with its data-dependent early exit (the break that GCC reports as "control flow in loop"), will likely prevent any vectorisation of the outer nest.
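
To illustrate that point: one possible workaround, sketched here under the assumption that it is acceptable to always run the full iteration count, is to replace the break with a flag so that every j iteration does the same amount of work. The names count_outside_row_scalar and count_outside_row_simd are made up for this example. Whether the compiler really vectorizes the j loop, and whether that outweighs losing the early exit, still has to be checked with -fopt-info-vec and by timing.

```c
#include <assert.h>

/* Early-exit form, as in the original post (one row i of the grid). */
static int count_outside_row_scalar(int i, int npoints, int maxiter)
{
    assert(npoints > 0 && maxiter > 0);
    const double cr = -2.0 + 2.5 * (double)(i) / (double)(npoints) + 1.0e-7;
    int count = 0;
    for (int j = 0; j < npoints; j++) {
        const double ci = 1.125 * (double)(j) / (double)(npoints) + 1.0e-7;
        double zr = cr, zi = ci;
        for (int iter = 0; iter < maxiter; iter++) {
            const double tr = (zr * zr) - (zi * zi) + cr;
            zi = zr * zi * 2 + ci;
            zr = tr;
            if ((zr * zr + zi * zi) > 4.0e0) {
                count++;
                break;              /* data-dependent exit: blocks vectorization */
            }
        }
    }
    return count;
}

/* Hypothetical branch-free variant: fixed trip count, no break. */
static int count_outside_row_simd(int i, int npoints, int maxiter)
{
    assert(npoints > 0 && maxiter > 0);
    const double cr = -2.0 + 2.5 * (double)(i) / (double)(npoints) + 1.0e-7;
    int count = 0;
    #pragma omp simd reduction(+:count)
    for (int j = 0; j < npoints; j++) {
        const double ci = 1.125 * (double)(j) / (double)(npoints) + 1.0e-7;
        double zr = cr, zi = ci;
        int escaped = 0;
        for (int iter = 0; iter < maxiter; iter++) {
            const double tr = (zr * zr) - (zi * zi) + cr;
            const double ti = zr * zi * 2 + ci;
            /* Freeze z once the point has escaped, so it cannot overflow
               and the result matches the early-exit version. */
            zr = escaped ? zr : tr;
            zi = escaped ? zi : ti;
            escaped |= ((zr * zr + zi * zi) > 4.0e0);
        }
        count += escaped;
    }
    return count;
}
```

Summing count_outside_row_simd(i, NPOINTS, MAXITER) over i inside a plain #pragma omp for with reduction(+:numoutside) would then split the work the same way as before: threads across rows, SIMD lanes across j. Without -fopenmp or -fopenmp-simd the pragma is ignored and the function simply runs as scalar code.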
I see no difference in performance with and without the simd directive (gcc 6.2.0 on Intel E5-2695, 1 OpenMP thread).
How are you measuring the execution time?
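
On the measurement question, a minimal sketch, assuming the region of interest can be wrapped in a function: take timestamps with omp_get_wtime() around it and keep the best of several runs. The names now_seconds, time_best_of and sample_work are hypothetical and exist only for this example. The clock() fallback is there so the code also builds without -fopenmp; note that it measures CPU time, not wall time.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds with OpenMP; CPU seconds via clock() otherwise. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / (double)CLOCKS_PER_SEC;
#endif
}

/* Run work() `repeats` times and return the shortest elapsed time,
   which is less sensitive to one-off noise than a single run. */
static double time_best_of(void (*work)(void), int repeats)
{
    assert(work != 0 && repeats > 0);
    double best = -1.0;
    for (int r = 0; r < repeats; r++) {
        const double t0 = now_seconds();
        work();
        const double dt = now_seconds() - t0;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}

/* Placeholder workload; the volatile sink keeps the loop from being
   optimized away. Replace with the Mandelbrot region being compared. */
static volatile double sink;
static void sample_work(void)
{
    double s = 0.0;
    for (int k = 0; k < 1000000; k++)
        s += (double)k * 0.5;
    sink = s;
}
```

Timing the whole program with the shell's time command also includes start-up and thread creation, so comparing for against for simd is usually more reliable when only the parallel region is measured, with the same thread count and compiler flags in both builds.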

### Re: #pragma omp for simd not vectorization?

Posted: Fri Aug 31, 2018 3:45 am
If you have an Intel CPU, the effect of -mavx in GCC on this code is very clear.
My i7-8700 was around 100% faster with -mavx.
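
Since -mavx only changes which instructions the compiler is allowed to emit, a quick sanity check is to ask the compiler which vector extensions the build targets. The sketch below uses the macros that GCC and Clang predefine on x86 (__SSE2__, __AVX__, __AVX2__, __AVX512F__). The helper name compiled_simd_level is hypothetical, and it reports what the build targets, not what the CPU supports.

```c
/* Report the widest x86 vector extension enabled at compile time. */
static const char *compiled_simd_level(void)
{
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none detected";
#endif
}
```

Printing the result from one build with -mavx and one without shows whether the flag took effect for that build.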