I have a question about `#pragma omp for simd`.

I ran the following code:

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>

#define NPOINTS 2000
#define MAXITER 2000

struct complex {
    double real;
    double imag;
};

int main() {
    int i, j, iter, numoutside = 0;
    double area, error, ztemp;
    struct complex z, c;

#pragma omp parallel default(none) private(i,j,c,z,ztemp,iter) reduction(+:numoutside)
    {
#pragma omp for simd schedule (dynamic) collapse (2)
        for (i = 0; i < NPOINTS; i++) {
            for (j = 0; j < NPOINTS; j++) {
                /* Sample point c on a grid over [-2, 0.5) x [0, 1.125) */
                c.real = -2.0 + 2.5 * (double)(i) / (double)(NPOINTS) + 1.0e-7;
                c.imag = 1.125 * (double)(j) / (double)(NPOINTS) + 1.0e-7;
                z = c;
                /* Escape-time iteration z = z*z + c; stop once |z|^2 > 4 */
                for (iter = 0; iter < MAXITER; iter++) {
                    ztemp = (z.real * z.real) - (z.imag * z.imag) + c.real;
                    z.imag = z.real * z.imag * 2 + c.imag;
                    z.real = ztemp;
                    if ((z.real * z.real + z.imag * z.imag) > 4.0e0) {
                        numoutside++;
                        break;
                    }
                }
            }
        }
    } // end of pragma

    /*
     * Calculate area and error and output the results
     * (factor 2.0: only the upper half-plane is sampled)
     */
    area = 2.0 * 2.5 * 1.125 * (double)(NPOINTS * NPOINTS - numoutside) / (double)(NPOINTS * NPOINTS);
    error = area / (double)NPOINTS;
    printf("Area of Mandelbrot set = %12.8f +/- %12.8f\n", area, error);
    return 0;
}
```
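Reading the OpenMP data-sharing rules, the code above also has a latent correctness problem that only matters once the `simd` part actually takes effect: `numoutside`, `c`, `z`, `ztemp` and `iter` are private to each thread, but inside the `for simd` construct they are shared by that thread's SIMD lanes, so `numoutside++` would race between lanes. A minimal sketch of a corrected variant follows, assuming the same algorithm: the reduction moves onto a combined `parallel for simd` construct (where it applies to both threads and lanes), and the per-point temporaries are declared inside the loop body so that every iteration gets its own copy. The function name `mandel_area` and its parameters are hypothetical, added only so the grid size and iteration limit can be varied.

```c
/* Same data layout as in the question. */
struct complex {
    double real;
    double imag;
};

/* Hypothetical helper: estimates the Mandelbrot area on an npoints x npoints
 * grid, allowing at most maxiter iterations per point. */
double mandel_area(int npoints, int maxiter)
{
    int numoutside = 0;

    /* On the combined construct the reduction gives every thread and every
     * SIMD lane its own partial count, combined at the end. */
#pragma omp parallel for simd collapse(2) schedule(dynamic) reduction(+:numoutside)
    for (int i = 0; i < npoints; i++) {
        for (int j = 0; j < npoints; j++) {
            /* Declared inside the body, so each iteration has private copies. */
            struct complex c, z;
            c.real = -2.0 + 2.5 * (double)i / (double)npoints + 1.0e-7;
            c.imag = 1.125 * (double)j / (double)npoints + 1.0e-7;
            z = c;
            for (int iter = 0; iter < maxiter; iter++) {
                double ztemp = (z.real * z.real) - (z.imag * z.imag) + c.real;
                z.imag = z.real * z.imag * 2 + c.imag;
                z.real = ztemp;
                if ((z.real * z.real + z.imag * z.imag) > 4.0e0) {
                    numoutside++;
                    break; /* data-dependent early exit, kept from the original */
                }
            }
        }
    }

    /* Factor 2.0: only the upper half-plane is sampled; the set is symmetric. */
    return 2.0 * 2.5 * 1.125 * (double)((long)npoints * npoints - numoutside)
           / (double)((long)npoints * npoints);
}
```

This variant still contains the early-exit `break`, so GCC's report would most likely still show `control flow in loop` for it; the change is about correctness once the loop really runs in SIMD lanes, not about making it vectorizable. Calling `mandel_area(NPOINTS, MAXITER)` should reproduce the area printed by the original program.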

With the `simd` clause, the program runs even faster than with `#pragma omp for` alone. So I assumed that the chunk of iterations assigned to each thread is executed with SIMD instructions. Isn't that vectorization?
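One way to test this reasoning is to time the two constructs back to back in the same binary. The sketch below is hypothetical: `wall_seconds`, `escapes`, `count_outside_for`, `count_outside_for_simd` and `time_variant` are names introduced here, and wall-clock time comes from C11 `timespec_get` (rather than `omp_get_wtime`) so that it also builds without `-fopenmp`. The two variants differ only in the `simd` keyword. A speed-up on its own does not prove that vector instructions were used, because GCC may also generate different scalar code for a `for simd` loop, so the timing is best read together with the vectorization report.

```c
#include <time.h>

/* Hypothetical helper: wall-clock time in seconds via C11 timespec_get. */
static double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1.0e-9 * (double)ts.tv_nsec;
}

/* Escape-time test for one point, same arithmetic as the question's loop.
 * Returns 1 if |z|^2 exceeds 4 within maxiter iterations, else 0. */
static int escapes(double cr, double ci, int maxiter)
{
    double zr = cr, zi = ci;
    for (int iter = 0; iter < maxiter; iter++) {
        double ztemp = zr * zr - zi * zi + cr;
        zi = zr * zi * 2 + ci;
        zr = ztemp;
        if (zr * zr + zi * zi > 4.0)
            return 1;
    }
    return 0;
}

/* Variant 1: worksharing only. */
int count_outside_for(int n, int maxiter)
{
    int numoutside = 0;
#pragma omp parallel for collapse(2) schedule(dynamic) reduction(+:numoutside)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            numoutside += escapes(-2.0 + 2.5 * i / n + 1.0e-7,
                                  1.125 * j / n + 1.0e-7, maxiter);
    return numoutside;
}

/* Variant 2: identical except for the simd part of the construct. */
int count_outside_for_simd(int n, int maxiter)
{
    int numoutside = 0;
#pragma omp parallel for simd collapse(2) schedule(dynamic) reduction(+:numoutside)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            numoutside += escapes(-2.0 + 2.5 * i / n + 1.0e-7,
                                  1.125 * j / n + 1.0e-7, maxiter);
    return numoutside;
}

/* Runs one variant and returns its wall-clock time; the count is stored
 * through *count so the two variants can also be checked for agreement. */
double time_variant(int (*variant)(int, int), int n, int maxiter, int *count)
{
    double t0 = wall_seconds();
    *count = variant(n, maxiter);
    return wall_seconds() - t0;
}
```

For example, run each variant once to warm up, then compare `time_variant(count_outside_for, 2000, 2000, &a)` with `time_variant(count_outside_for_simd, 2000, 2000, &b)` and check that `a == b`.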

However, when I compile it in a Linux terminal with

```
$ gcc -fopenmp -g -O2 a.c -liomp5 -ftree-vectorize -fopt-info-vec-all -o a.out
```

the compiler's vectorization report says that the code is not vectorized:

```
Analyzing loop at a.c:29
a.c:29:21: note: ===== analyze_loop_nest =====
a.c:29:21: note: === vect_analyze_loop_form ===
a.c:29:21: note: not vectorized: control flow in loop.
a.c:29:21: note: bad loop form.
a.c:20:13: note: vectorized 0 loops in function.
a.c:20:13: note: ===vect_slp_analyze_bb===
a.c:20:13: note: === vect_analyze_data_refs ===
a.c:20:13: note: not vectorized: not enough data-refs in basic block.
a.c:20:13: note: === vect_analyze_data_refs ===
a.c:20:13: note: not vectorized: not enough data-refs in basic block.
a.c:20:13: note: ===vect_slp_analyze_bb===
a.c:20:13: note: ===vect_slp_analyze_bb===
a.c:20:13: note: === vect_analyze_data_refs ===
a.c:20:13: note: got vectype for stmt: .istart0.13_16 = .istart0.11;
vector(2) long int
a.c:20:13: note: got vectype for stmt: .iend0.14_18 = .iend0.12;
vector(2) long int
a.c:20:13: note: === vect_analyze_data_ref_accesses ===
a.c:20:13: note: not consecutive access .istart0.13_16 = .istart0.11;
a.c:20:13: note: not consecutive access .iend0.14_18 = .iend0.12;
a.c:20:13: note: not vectorized: no grouped stores in basic block.
a.c:20:13: note: ===vect_slp_analyze_bb===
a.c:20:13: note: ===vect_slp_analyze_bb===
a.c:26:55: note: === vect_analyze_data_refs ===
a.c:26:55: note: not vectorized: not enough data-refs in basic block.
a.c:26:55: note: ===vect_slp_analyze_bb===
a.c:26:55: note: === vect_analyze_data_refs ===
a.c:26:55: note: not vectorized: not enough data-refs in basic block.
a.c:26:55: note: ===vect_slp_analyze_bb===
a.c:26:55: note: ===vect_slp_analyze_bb===
a.c:26:55: note: ===vect_slp_analyze_bb===
a.c:26:55: note: ===vect_slp_analyze_bb===
a.c:32:11: note: === vect_analyze_data_refs ===
a.c:32:11: note: not vectorized: not enough data-refs in basic block.
a.c:32:11: note: ===vect_slp_analyze_bb===
a.c:28:11: note: === vect_analyze_data_refs ===
a.c:28:11: note: not vectorized: not enough data-refs in basic block.
a.c:28:11: note: ===vect_slp_analyze_bb===
a.c:33:20: note: === vect_analyze_data_refs ===
a.c:33:20: note: not vectorized: not enough data-refs in basic block.
a.c:33:20: note: ===vect_slp_analyze_bb===
a.c:33:20: note: === vect_analyze_data_refs ===
a.c:33:20: note: not vectorized: not enough data-refs in basic block.
a.c:33:20: note: ===vect_slp_analyze_bb===
a.c:33:20: note: ===vect_slp_analyze_bb===
a.c:33:20: note: === vect_analyze_data_refs ===
a.c:33:20: note: not vectorized: not enough data-refs in basic block.
a.c:33:20: note: ===vect_slp_analyze_bb===
a.c:33:20: note: === vect_analyze_data_refs ===
a.c:33:20: note: not vectorized: not enough data-refs in basic block.
a.c:20:77: note: === vect_analyze_data_refs ===
a.c:20:77: note: not vectorized: not enough data-refs in basic block.
a.c:20:77: note: === vect_analyze_data_refs ===
a.c:20:77: note: not vectorized: not enough data-refs in basic block.
a.c:15:9: note: vectorized 0 loops in function.
a.c:15:9: note: ===vect_slp_analyze_bb===
a.c:20:13: note: === vect_analyze_data_refs ===
a.c:20:13: note: got vectype for stmt: .omp_data_o.8.numoutside = 0;
vector(4) int
a.c:20:13: note: not vectorized: not enough data-refs in basic block.
/usr/include/x86_64-linux-gnu/bits/stdio2.h:104:10: note: === vect_analyze_data_refs ===
/usr/include/x86_64-linux-gnu/bits/stdio2.h:104:10: note: got vectype for stmt: numoutside_4 = .omp_data_o.8.numoutside;
vector(4) int
/usr/include/x86_64-linux-gnu/bits/stdio2.h:104:10: note: not vectorized: not enough data-refs in basic block.
a.c:15:9: note: === vect_analyze_data_refs ===
a.c:15:9: note: not vectorized: not enough data-refs in basic block.
```
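The decisive lines in this report are `not vectorized: control flow in loop` and `bad loop form` for the loop at a.c:29; in this code the data-dependent `break` of the escape-time loop is the most likely source of that control flow. The remaining notes come from GCC's basic-block (SLP) vectorizer, which found nothing to combine either. A common way to make such a kernel vectorizable is to remove the early exit: iterate a batch of points in lock-step and, instead of breaking, freeze points that have already escaped with a select. The sketch below does this per grid row; `count_escaped` and `count_outside_rows` are hypothetical names, and whether a given GCC version really vectorizes the inner `j` loop (the ternaries must be if-converted into blends) is an assumption to verify with the same `-fopt-info-vec` flags.

```c
/* Hypothetical kernel: escape-time test for a batch of n points in lock-step.
 * The inner j loop has no break; escaped points are frozen with selects. */
int count_escaped(const double *cr, const double *ci, int n, int maxiter)
{
    double zr[n], zi[n];   /* VLAs: assumes n is modest (one grid row) */
    int escaped[n];
    int count = 0;

    for (int j = 0; j < n; j++) {
        zr[j] = cr[j];
        zi[j] = ci[j];
        escaped[j] = 0;
    }
    for (int iter = 0; iter < maxiter; iter++) {
#pragma omp simd
        for (int j = 0; j < n; j++) {
            double nr = zr[j] * zr[j] - zi[j] * zi[j] + cr[j];
            double ni = zr[j] * zi[j] * 2 + ci[j];
            int done = escaped[j];
            /* Keep the old value once a point has escaped. */
            zr[j] = done ? zr[j] : nr;
            zi[j] = done ? zi[j] : ni;
            escaped[j] = done | (nr * nr + ni * ni > 4.0);
        }
    }
    for (int j = 0; j < n; j++)
        count += escaped[j];
    return count;
}

/* Hypothetical driver: one grid row (fixed i) per loop iteration, with rows
 * distributed over threads; same grid as the question's code. */
int count_outside_rows(int npoints, int maxiter)
{
    int numoutside = 0;
#pragma omp parallel for schedule(dynamic) reduction(+:numoutside)
    for (int i = 0; i < npoints; i++) {
        double cr[npoints], ci[npoints];
        for (int j = 0; j < npoints; j++) {
            cr[j] = -2.0 + 2.5 * i / npoints + 1.0e-7;
            ci[j] = 1.125 * j / npoints + 1.0e-7;
        }
        numoutside += count_escaped(cr, ci, npoints, maxiter);
    }
    return numoutside;
}
```

The trade-off is that every point now runs all `maxiter` iterations, because there is no early exit; in this workload many points escape after a few iterations, so the extra work can outweigh the SIMD gain. Checking every few iterations whether all points of the row have escaped, and leaving the `iter` loop if so, is one way to recover part of the early exit.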

So does inserting `#pragma omp for simd` into the code not necessarily result in vectorization? @.@
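The `simd` construct is specified as stating that the loop's iterations may be executed concurrently in SIMD lanes; the compiler still has to be able to transform the loop, and an early exit can prevent that. A quick way to check that the `-fopt-info-vec-all` output itself can be trusted is to compile, with the same command, a loop that ought to vectorize. The function `dot` below is a hypothetical positive control: its body has no branches, the loads are unit-stride, and `reduction(+:s)` permits the floating-point sum to be reordered, which, as far as I know, GCC otherwise only does under flags such as `-ffast-math`.

```c
/* Hypothetical positive control: branch-free body, unit-stride loads, and a
 * sum whose reordering is explicitly permitted by reduction(+:s). */
double dot(const double *x, const double *y, int n)
{
    double s = 0.0;
#pragma omp simd reduction(+:s)
    for (int i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}
```

If the report for this loop does not contain a note saying that it was vectorized, the cause is more likely the flags or the compiler version than the Mandelbrot kernel.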

I would appreciate a clarification. Thanks.

Best Regards,

Claudia