constant memory allocation to each thread

Post by OmriH » Wed Jun 27, 2018 2:46 am

Hi everyone,

I'm trying to improve the performance of my OpenMP code (only a 2.5x speedup on 20 cores =/ ).
One suspect for the poor performance is the construction and destruction of temporary arrays at every time step.

The code looks like:


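for (t = 0; t < Tmax; t = t + dt) {

    /* Do some stuff here */

    #pragma omp parallel
    {
        /* Each thread allocates its own temp array on every time step */
        double *arr_local1 = new double[10000];

        /* Do a lot of stuff here */

        delete[] arr_local1;  /* array form of delete, matching new[] */
    }

    /* Do some stuff here */

}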

Is there any way to make each thread "remember" arr_local1 across t iterations, instead of reallocating it every iteration?

From my experience, if I use "padding", i.e. I define one big shared "arr1" for all of the threads (adding one more dimension for the threads), the performance gets even worse (probably because all of the threads are trying to access the array at once).
A possibly better solution, in my opinion, is to define "arr_local1" for each thread, but only once (outside the for t loop). Is it possible to make a private array for each thread that keeps its contents after the parallel region has (temporarily) ended?

Thanks!
OmriH
 

Re: constant memory allocation to each thread

Post by MarkB » Sun Jul 08, 2018 9:01 pm

There are several options here.

You could simply declare the array in place as

double arr_local1[10000];

This allocates the memory on each thread's stack, which is essentially zero cost. With C99 variable-length arrays, this still works even if the size is not known at compile time (standard C++ has no variable-length arrays, though some compilers accept them as an extension). If the arrays are big enough, there is a risk of stack overflow, so you may need to increase the stack size for both the master thread (e.g. with ulimit -s) and the worker threads (e.g. with the OMP_STACKSIZE environment variable).
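
As a rough illustration of the in-place declaration, here is a minimal C++ sketch. The function name run_with_stack_scratch, the constant kN and the toy workload (fill the array, then sum it in a worksharing loop) are assumptions made for the example, not part of the original code.

```cpp
constexpr int kN = 10000;  // array length taken from the original post

// Hypothetical time-step loop: each thread gets a scratch array on its own
// stack, fills it, and the team then sums the entries in a worksharing loop.
double run_with_stack_scratch(int steps) {
    double total = 0.0;
    for (int step = 0; step < steps; ++step) {
        #pragma omp parallel reduction(+ : total)
        {
            // Declared inside the parallel region, so it is private to the
            // thread and lives on that thread's stack: no new/delete needed.
            double arr_local1[kN];

            // Every thread fills its own copy (stand-in for the real work).
            for (int j = 0; j < kN; ++j) arr_local1[j] = j;

            // Iterations are shared out; each thread reads its own copy.
            #pragma omp for
            for (int i = 0; i < kN; ++i) total += arr_local1[i];
        }
    }
    return total;
}
```

Each array here is 10000 doubles (about 80 KB), which is normally well within default stack limits; much larger arrays are where the stack size settings start to matter.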

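The one big shared array approach should also be fine (if a bit ugly), provided you get the dimension ordering correct, so that each thread's 10000 elements are contiguous in memory and neighbouring threads do not keep writing to the same cache lines, i.e.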

arr1[numthreads][10000]

and not

arr1[10000][numthreads]

You can use the omp_get_max_threads() function to safely determine how much memory to allocate for the subsequent parallel region.
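
A minimal sketch of the shared-array variant, assuming a flat std::vector laid out as arr1[numthreads][10000] and allocated once before the time loop. The names run_with_shared_rows and kN and the toy workload are hypothetical; the serial fallbacks are only there so the sketch also builds when OpenMP is not enabled.

```cpp
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds without OpenMP enabled.
inline int omp_get_max_threads() { return 1; }
inline int omp_get_thread_num() { return 0; }
#endif

constexpr int kN = 10000;  // array length taken from the original post

double run_with_shared_rows(int steps) {
    // Allocate once, outside the time loop: one row of kN doubles per thread.
    const int numthreads = omp_get_max_threads();
    std::vector<double> arr1(static_cast<std::size_t>(numthreads) * kN);

    double total = 0.0;
    for (int step = 0; step < steps; ++step) {
        #pragma omp parallel reduction(+ : total)
        {
            // Thread index is the slow dimension, so this thread's kN
            // elements are contiguous: arr1[thread][0..kN-1].
            double* row = arr1.data()
                + static_cast<std::size_t>(omp_get_thread_num()) * kN;

            // Every thread fills its own row (stand-in for the real work).
            for (int j = 0; j < kN; ++j) row[j] = j;

            // Iterations are shared out; each thread reads its own row.
            #pragma omp for
            for (int i = 0; i < kN; ++i) total += row[i];
        }
    }
    return total;
}
```

With the opposite ordering, arr1[10000][numthreads], consecutive threads would own adjacent elements, which is the layout to avoid.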

A further option is to make the array global and declare it in a #pragma omp threadprivate directive, which gives each thread its own copy.
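
A minimal sketch of the threadprivate variant follows. The function name run_with_threadprivate, the constant kN and the toy workload are assumptions for illustration. The OpenMP specification guarantees that threadprivate data persists between consecutive parallel regions only under certain conditions, notably that the regions are not nested, the number of threads is the same in both, and dynamic thread adjustment is off, which is why the sketch calls omp_set_dynamic(0).

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback so the sketch also builds without OpenMP enabled.
inline void omp_set_dynamic(int) {}
#endif

constexpr int kN = 10000;  // array length taken from the original post

// File-scope array; the directive gives every thread its own copy.
static double arr_local1[kN];
#pragma omp threadprivate(arr_local1)

double run_with_threadprivate(int steps) {
    // Keep the team size fixed so each thread's copy persists between regions.
    omp_set_dynamic(0);

    // Fill every thread's copy once, before the time loop.
    #pragma omp parallel
    {
        for (int j = 0; j < kN; ++j) arr_local1[j] = j;
    }

    double total = 0.0;
    for (int step = 0; step < steps; ++step) {
        #pragma omp parallel reduction(+ : total)
        {
            // Each thread reads the copy it filled before the loop.
            #pragma omp for
            for (int i = 0; i < kN; ++i) total += arr_local1[i];
        }
    }
    return total;
}
```

Of the three options, this is the one that lets a thread "remember" its array contents from one time step to the next, as long as the persistence conditions hold.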
MarkB
 

