Collapse not working

General OpenMP discussion
The OpenMP Forums are now closed to new posts. Please visit Stack Overflow if you are in need of help.
Posts: 2
Joined: Thu Nov 26, 2020 2:38 pm

Collapse not working

Post by jmricher

I am facing a problem and I can't figure out why it is not working.
I am working under Ubuntu 20.04 with g++ 10.2.0.

I have implemented the Game of Life, and I have a function that I call 10000 times to let the board of cells evolve.
Basically, the function computes the output state of each cell of the board from the input board. The cells_in and cells_out arrays are then swapped for the next iteration. If a cell is alive, I set the corresponding point in the image to red. The image can be displayed, but display is not used for the problem I describe here.
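The double-buffering scheme described in the post (compute cells_out from cells_in, then swap the two arrays for the next call) can be sketched as follows. This is a minimal sketch, not the poster's code: it assumes CellType is an 8-bit integer holding 0 or 1 (its definition is not shown), it omits the image update, and the names step and run are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Assumption: CellType holds 0 (dead) or 1 (alive); the post doesn't define it.
using CellType = std::uint8_t;

// One Game of Life step on a size x size board. Border cells are skipped,
// as in the original kernel; the image update is omitted in this sketch.
void step(const CellType *in, CellType *out, int size) {
    for (int y = 1; y < size - 1; ++y) {
        for (int x = 1; x < size - 1; ++x) {
            int nbr = in[(y-1)*size + x-1] + in[(y-1)*size + x] + in[(y-1)*size + x+1]
                    + in[y*size + x-1]                          + in[y*size + x+1]
                    + in[(y+1)*size + x-1] + in[(y+1)*size + x] + in[(y+1)*size + x+1];
            int c11 = in[y*size + x];
            // alive next step if: alive with 2 neighbours, or exactly 3 neighbours
            out[y*size + x] = static_cast<CellType>((c11 & (nbr == 2)) | (nbr == 3));
        }
    }
}

// Hypothetical driver: runs `iterations` steps, swapping the roles of the two
// buffers after each one. Assumes the border cells of `board` are dead.
std::vector<CellType> run(std::vector<CellType> board, int size, int iterations) {
    assert(static_cast<int>(board.size()) == size * size);
    std::vector<CellType> next(board.size(), 0);  // border cells stay dead
    for (int i = 0; i < iterations; ++i) {
        step(board.data(), next.data(), size);
        std::swap(board, next);  // swaps the buffers without copying any cells
    }
    return board;
}
```

Swapping the vectors only exchanges their internal pointers, so the cost per iteration is the kernel itself, exactly as when swapping the cells_in and cells_out pointers.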


void kernel(CellType *cells_in, CellType *cells_out, ImagePoint *img, int size) {
	// don't take into account cells on the border
	int y, x;
	//#pragma omp parallel for num_threads(4) private(x) shared(cells_in,cells_out,img)
	//#pragma omp parallel for collapse(2) num_threads(4) shared(cells_in,cells_out,img)
	for (y=1; y<size-1; ++y) {
		for (x=1; x<size-1; ++x) {
			// input values
			// selects the 8 cells surrounding the current cell (c11)
			CellType c00, c01, c02; // line y-1
			CellType c10, c11, c12; // line y
			CellType c20, c21, c22; // line y+1

			c00 = cells_in[(y-1)*size + x-1];
			c01 = cells_in[(y-1)*size + x];
			c02 = cells_in[(y-1)*size + x+1];

			c10 = cells_in[(y)*size + x-1];
			c11 = cells_in[(y)*size + x];
			c12 = cells_in[(y)*size + x+1];

			c20 = cells_in[(y+1)*size + x-1];
			c21 = cells_in[(y+1)*size + x];
			c22 = cells_in[(y+1)*size + x+1];

			// compute number of alive cells around current cell
			int nbr = c00 + c01 + c02 + c10 + c12 + c20 + c21 + c22;

			// compute new state of current cell
			c11 = ((CellType)c11 & (nbr == 2)) | (nbr == 3);

			// set new state of current cell in cells_out
			cells_out[y*size + x] = c11;

			// update image with red (ALIVE) or black(DEAD)
			int offset = y*size+x;
			img[offset].red = c11 * 255;
			img[offset].green = 0;
			img[offset].blue = 0;
			img[offset].alpha = c11 * 255;
		}
	}
}

  • Without OpenMP, the code takes 22 seconds to execute.
  • With "#pragma omp parallel for num_threads(4)" on the
    outer loop (y), it takes 5.5 seconds.
  • But with the collapse version, "#pragma omp parallel for collapse(2) num_threads(4)",
    it takes 36 seconds!
I was expecting approximately the same time as with the parallelisation of the outer loop.

So my question is: why does the collapse version take so long? Am I missing something?

I have checked cache references and misses with 'perf', and they are the same for all
three implementations.
However, the collapse version that runs in 36 seconds executes 1846 × 10^9 instructions, while the other two execute only 158 × 10^9 instructions.

Best regards,

Posts: 2
Joined: Sat Feb 21, 2015 2:20 pm
Location: Portland, OR

Re: Collapse not working

Post by JeffHammond

Can you provide an MCVE, i.e. something that can be compiled and run to reproduce your results? It's not practical to debug performance issues from the source of a single function that depends on code we can't see.

The performance of the collapse clause in OpenMP is counterintuitively bad in many cases, because mapping the single collapsed iteration number back to the original loop indices requires the generation of div+mod instructions. I don't know if that's the issue here.
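To illustrate the div+mod point: collapse(2) conceptually flattens the y and x loops of kernel into one iteration space of (size-2)² iterations numbered k, and each thread then has to recover y and x from k. The sketch below is a model under that assumption, not the code GCC actually generates; it shows two ways to lower a chunk [begin, end) of that space: a division and a modulo on every iteration, or a single division and modulo at the start of the chunk followed by incrementing x and wrapping to the next row. The function names are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Model (not any compiler's actual output) of how collapse(2) can map the
// flattened iteration number k of kernel's loop nest back to (y, x).
// Both loops run from 1 to size-2, so there are n*n iterations, n = size - 2.

// Lowering A: a division and a modulo on every iteration.
template <typename Body>
void collapsed_divmod(int size, long begin, long end, Body body) {
    assert(size >= 3);
    const long n = size - 2;
    for (long k = begin; k < end; ++k)
        body(1 + static_cast<int>(k / n), 1 + static_cast<int>(k % n));
}

// Lowering B: one division and modulo at the start of the chunk, then
// advance x and wrap to the next row (a branch inside the hot loop).
template <typename Body>
void collapsed_incremental(int size, long begin, long end, Body body) {
    assert(size >= 3);
    const long n = size - 2;
    if (begin >= end) return;
    int y = 1 + static_cast<int>(begin / n);
    int x = 1 + static_cast<int>(begin % n);
    for (long k = begin; k < end; ++k) {
        body(y, x);
        if (++x == size - 1) { x = 1; ++y; }
    }
}

// Reference: the order in which the original nested loops visit the cells.
std::vector<std::pair<int, int>> nested_order(int size) {
    std::vector<std::pair<int, int>> order;
    for (int y = 1; y < size - 1; ++y)
        for (int x = 1; x < size - 1; ++x)
            order.emplace_back(y, x);
    return order;
}
```

In a parallel region each thread would run one of these over its own [begin, end) chunk. In both forms the inner loop no longer walks a fixed row, which may get in the way of the strength reduction and vectorization a compiler can apply to the plain nested loops; whether that accounts for the extra instructions reported in the first post is not established by this sketch.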

Posts: 808
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh

Re: Collapse not working

Post by MarkB

I presume that the single-thread performance of the collapsed version is also bad? It's quite rare, but I do see this kind of thing once in a while: the OpenMP directives cause the compiler's optimisation to fail in some way. In this case it might be the address computations that are the problem. Sometimes you can fix it with relatively minor changes, such as making size a constant.
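MarkB's suggestion of making size a constant can be tried by turning the board size into a template parameter, so that every index computation (and any div/mod introduced by the collapse) is by a compile-time constant, which compilers can usually reduce to cheaper multiply and shift sequences. This is a minimal sketch, assuming an 8-bit CellType and leaving out the image update; kernel_fixed and step_fixed are hypothetical names, and whether this recovers the performance of the outer-loop-only version was not tested in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: CellType holds 0 (dead) or 1 (alive); the post doesn't define it.
using CellType = std::uint8_t;

// Hypothetical variant of the kernel with the board size fixed at compile time.
// Same loop nest, collapse clause and indexing as the original (no image update).
template <int SIZE>
void kernel_fixed(const CellType *cells_in, CellType *cells_out) {
    static_assert(SIZE >= 3, "the board needs at least one interior cell");
    #pragma omp parallel for collapse(2)
    for (int y = 1; y < SIZE - 1; ++y) {
        for (int x = 1; x < SIZE - 1; ++x) {
            int nbr = cells_in[(y-1)*SIZE + x-1] + cells_in[(y-1)*SIZE + x] + cells_in[(y-1)*SIZE + x+1]
                    + cells_in[y*SIZE + x-1]                                + cells_in[y*SIZE + x+1]
                    + cells_in[(y+1)*SIZE + x-1] + cells_in[(y+1)*SIZE + x] + cells_in[(y+1)*SIZE + x+1];
            int c11 = cells_in[y*SIZE + x];
            // alive next step if: alive with 2 neighbours, or exactly 3 neighbours
            cells_out[y*SIZE + x] = static_cast<CellType>((c11 & (nbr == 2)) | (nbr == 3));
        }
    }
}

// Convenience wrapper: one step on a SIZE x SIZE board stored in a vector.
template <int SIZE>
std::vector<CellType> step_fixed(const std::vector<CellType> &cells_in) {
    assert(cells_in.size() == static_cast<std::size_t>(SIZE) * SIZE);
    std::vector<CellType> cells_out(cells_in.size(), 0);  // border cells stay dead
    kernel_fixed<SIZE>(cells_in.data(), cells_out.data());
    return cells_out;
}
```

For a 2048 × 2048 board this would be called as kernel_fixed<2048>(cells_in, cells_out). Without -fopenmp the pragma is ignored and the loops run serially, which is a convenient way to compare single-thread behaviour with and without the collapse.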

Posts: 2
Joined: Thu Nov 26, 2020 2:38 pm

Re: Collapse not working

Post by jmricher

Here is code that you can compile on Linux.

All explanations are in the README.txt file:


Under Ubuntu 20.04, please install the following packages:

sudo apt install freeglut3 freeglut3-dev libopengl0 libopengl-dev


Compile with:

> make

Modify the source code in 'gol_cpu.cpp' at line 57 or 58 to use either

#pragma omp parallel for

or

#pragma omp parallel for collapse(2)


Test with

time ./gol.exe -t -s 2048 -m 10000

with each variant (results on an AMD Ryzen 5 3600):
- no OpenMP               : 21.92 s
- OpenMP, no collapse     :  5.45 s (total 21.72 s)
- OpenMP with collapse(2) : 35.19 s (total 2m20s)
(Attachment: 6.92 KiB)