#pragma omp for simd not vectorization?



Postby ClaudiaWhite » Wed Aug 01, 2018 2:45 am

Hi,

I have a question about #pragma omp for simd.

I am running this code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <omp.h>

    #define NPOINTS 2000   /* grid points along each axis */
    #define MAXITER 2000   /* points still bounded after this many iterations count as inside */


    struct complex {
      double real;
      double imag;
    };

    int main() {
      int i, j, iter, numoutside = 0;
      double area, error, ztemp;
      struct complex z, c;

    #pragma omp parallel default(none) private(i,j,c,z,ztemp,iter) reduction(+:numoutside)
      {
    #pragma omp for simd schedule(dynamic) collapse(2)
        for (i = 0; i < NPOINTS; i++) {
          for (j = 0; j < NPOINTS; j++) {
            c.real = -2.0 + 2.5*(double)(i)/(double)(NPOINTS) + 1.0e-7;
            c.imag = 1.125*(double)(j)/(double)(NPOINTS) + 1.0e-7;
            z = c;
            for (iter = 0; iter < MAXITER; iter++) {  /* trip count differs from point to point */
              ztemp = (z.real*z.real) - (z.imag*z.imag) + c.real;
              z.imag = z.real*z.imag*2 + c.imag;
              z.real = ztemp;
              if ((z.real*z.real + z.imag*z.imag) > 4.0e0) {
                numoutside++;   /* |z| > 2: the point escapes */
                break;          /* data-dependent early exit from the iteration loop */
              }
            }
          }
        }
      } // end of parallel region


    /*
     *  Calculate area and error and output the results
     */

      area = 2.0*2.5*1.125*(double)(NPOINTS*NPOINTS - numoutside)/(double)(NPOINTS*NPOINTS);
      error = area/(double)NPOINTS;

      printf("Area of Mandelbrot set = %12.8f +/- %12.8f\n", area, error);

    }
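The area estimate at the end of the program can be written as a formula. This is my reading of the code, not something stated in the post: the grid samples the rectangle [-2, 0.5] x [0, 1.125], i.e. only the upper half of the set, so the leading factor 2 relies on the set's symmetry about the real axis. Here N stands for NPOINTS and n_out for numoutside, and the "error" is simply the value the code computes, A/N, rather than a derived bound.

$$
A \approx 2 \times (2.5 \times 1.125) \times \frac{N^2 - n_{\mathrm{out}}}{N^2},
\qquad
\mathrm{error} = \frac{A}{N}
$$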


With the simd clause, the program runs even faster than with #pragma omp for alone. So I assumed that the chunk of iterations assigned to each thread is executed with SIMD instructions. Isn't that what vectorization means?

However, when I compile with this command in a Linux terminal:
$ gcc -fopenmp -g -O2 a.c -liomp5 -ftree-vectorize -fopt-info-vec-all -o a.out

the compiler reports that the code is not vectorized:
Analyzing loop at a.c:29
a.c:29:21: note: ===== analyze_loop_nest =====
a.c:29:21: note: === vect_analyze_loop_form ===
a.c:29:21: note: not vectorized: control flow in loop.
a.c:29:21: note: bad loop form.
a.c:20:13: note: vectorized 0 loops in function.
a.c:20:13: note: ===vect_slp_analyze_bb===
a.c:20:13: note: === vect_analyze_data_refs ===
a.c:20:13: note: not vectorized: not enough data-refs in basic block.
a.c:20:13: note: === vect_analyze_data_refs ===
a.c:20:13: note: not vectorized: not enough data-refs in basic block.
a.c:20:13: note: ===vect_slp_analyze_bb===
a.c:20:13: note: ===vect_slp_analyze_bb===
a.c:20:13: note: === vect_analyze_data_refs ===
a.c:20:13: note: got vectype for stmt: .istart0.13_16 = .istart0.11;
vector(2) long int
a.c:20:13: note: got vectype for stmt: .iend0.14_18 = .iend0.12;
vector(2) long int
a.c:20:13: note: === vect_analyze_data_ref_accesses ===
a.c:20:13: note: not consecutive access .istart0.13_16 = .istart0.11;
a.c:20:13: note: not consecutive access .iend0.14_18 = .iend0.12;
a.c:20:13: note: not vectorized: no grouped stores in basic block.
a.c:20:13: note: ===vect_slp_analyze_bb===
a.c:20:13: note: ===vect_slp_analyze_bb===
a.c:26:55: note: === vect_analyze_data_refs ===
a.c:26:55: note: not vectorized: not enough data-refs in basic block.
a.c:26:55: note: ===vect_slp_analyze_bb===
a.c:26:55: note: === vect_analyze_data_refs ===
a.c:26:55: note: not vectorized: not enough data-refs in basic block.
a.c:26:55: note: ===vect_slp_analyze_bb===
a.c:26:55: note: ===vect_slp_analyze_bb===
a.c:26:55: note: ===vect_slp_analyze_bb===
a.c:26:55: note: ===vect_slp_analyze_bb===
a.c:32:11: note: === vect_analyze_data_refs ===
a.c:32:11: note: not vectorized: not enough data-refs in basic block.
a.c:32:11: note: ===vect_slp_analyze_bb===
a.c:28:11: note: === vect_analyze_data_refs ===
a.c:28:11: note: not vectorized: not enough data-refs in basic block.
a.c:28:11: note: ===vect_slp_analyze_bb===
a.c:33:20: note: === vect_analyze_data_refs ===
a.c:33:20: note: not vectorized: not enough data-refs in basic block.
a.c:33:20: note: ===vect_slp_analyze_bb===
a.c:33:20: note: === vect_analyze_data_refs ===
a.c:33:20: note: not vectorized: not enough data-refs in basic block.
a.c:33:20: note: ===vect_slp_analyze_bb===
a.c:33:20: note: ===vect_slp_analyze_bb===
a.c:33:20: note: === vect_analyze_data_refs ===
a.c:33:20: note: not vectorized: not enough data-refs in basic block.
a.c:33:20: note: ===vect_slp_analyze_bb===
a.c:33:20: note: === vect_analyze_data_refs ===
a.c:33:20: note: not vectorized: not enough data-refs in basic block.
a.c:20:77: note: === vect_analyze_data_refs ===
a.c:20:77: note: not vectorized: not enough data-refs in basic block.
a.c:20:77: note: === vect_analyze_data_refs ===
a.c:20:77: note: not vectorized: not enough data-refs in basic block.
a.c:15:9: note: vectorized 0 loops in function.
a.c:15:9: note: ===vect_slp_analyze_bb===
a.c:20:13: note: === vect_analyze_data_refs ===
a.c:20:13: note: got vectype for stmt: .omp_data_o.8.numoutside = 0;
vector(4) int
a.c:20:13: note: not vectorized: not enough data-refs in basic block.
/usr/include/x86_64-linux-gnu/bits/stdio2.h:104:10: note: === vect_analyze_data_refs ===
/usr/include/x86_64-linux-gnu/bits/stdio2.h:104:10: note: got vectype for stmt: numoutside_4 = .omp_data_o.8.numoutside;
vector(4) int
/usr/include/x86_64-linux-gnu/bits/stdio2.h:104:10: note: not vectorized: not enough data-refs in basic block.
a.c:15:9: note: === vect_analyze_data_refs ===
a.c:15:9: note: not vectorized: not enough data-refs in basic block.

So inserting #pragma omp for simd into the code does not necessarily produce vectorized code?

Any clarification would be appreciated. Thanks.

Best Regards,
Claudia


Postby MarkB » Mon Aug 13, 2018 9:38 am

The inner iteration loop, whose data-dependent break makes it effectively a while loop, will likely prevent any vectorisation of the outer nest.
I see no difference in performance with and without the simd directive (gcc 6.2.0 on Intel E5-2695, 1 OpenMP thread).
How are you measuring the execution time?
Location: EPCC, University of Edinburgh


Postby Oldboy » Fri Aug 31, 2018 3:45 am

If you have an Intel CPU, the effect of -mavx in GCC on this code is very clear.
My i7-8700 ran it around 100% faster (roughly twice as fast) with -mavx.

