Thread pool

Post by djo35 » Wed May 28, 2008 8:41 am

Earlier, ejd said that control of thread creation / destruction is implementation specific and that OpenMP has no control over it - so my apologies if this is just rehashing old ground!

I am trying to optimise functions which are repeatedly called from a shared library. Each call to a function is expensive because OpenMP creates and destroys its threads on every call. Unfortunately I can't hoist the #pragma omp parallel a level higher (out of the function), as each function in the library we call is well specified (and called from another language - Python).
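To make the situation concrete, here is a minimal sketch of the kind of library function being described. The name dot_product and its signature are made up for illustration and are not from the actual library; the point is only that the parallel region lives inside the exported function, so it is entered and left on every call.

```c
#include <stddef.h>

/* Hypothetical library entry point (name and signature are
   illustrative only). The caller, e.g. Python, sees one plain
   function; the OpenMP region cannot be moved out of it. */
double dot_product(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    long i;

    /* The parallel region opens and closes on every call. Whether
       threads are created anew or taken from a pool at this point
       is up to the OpenMP implementation. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long)n; i++)
        sum += a[i] * b[i];

    return sum;
}
```

Built with OpenMP enabled (for example gcc -fopenmp) the loop is shared among threads; without it the pragma is ignored and the function runs serially with the same result.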

I imagine that if we could persuade OpenMP to 'pool' the threads for re-use at the end of omp parallel instead of destroying them then we would save a lot of time.

As I understand it, OpenMP gives us no individual control over thread creation / destruction. I've been through the GNU gcc manual and the Intel icc manual with a fine-tooth comb and found no control there either. Does anyone know of an implementation which does have this control (or even better, can someone point out where I'm being dumb....)?

Many thanks, djo

Re: Thread pool

Post by ejd » Wed May 28, 2008 9:20 am

Many implementations use "pools of threads", so the first parallel region takes a bit longer than the following regions. This is because the threads are actually acquired when the first region is encountered; after that they are reused. Allowing threads to be reused is one reason that the OpenMP spec has not had an OMP_SET_STACKSIZE call added to it. This of course is not the only way that OpenMP can be implemented.
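One rough way to check whether a given implementation pools its threads is to time the first parallel region against the one right after it. The following is only a sketch: the helper names time_empty_region and report_first_vs_second are invented here, and the timings will vary with the machine and the runtime.

```c
#include <stdio.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds; falls back to clock() when built without OpenMP. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Time one nearly empty parallel region (hypothetical helper). */
double time_empty_region(void)
{
    int touched = 0;
    double start = now_seconds();

    #pragma omp parallel reduction(+:touched)
    {
        touched += 1;   /* trivial work so every thread does something */
    }

    double elapsed = now_seconds() - start;
    return touched > 0 ? elapsed : -1.0;
}

/* Print the cost of the first region and of the one right after it. */
void report_first_vs_second(void)
{
    double first = time_empty_region();
    double second = time_empty_region();

    printf("first region:  %.6f s\n", first);
    printf("second region: %.6f s\n", second);
}
```

If the runtime uses a pool, the second figure should typically come out well below the first, since the threads already exist by then.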

In most cases the user has no way of knowing how an OpenMP implementation works internally. While the GNU manual doesn't state how it is done, the code is available to look at (though I have never done so). As for Intel, I believe they use a pool approach (from my conversations with Intel engineers in the past). You can look at the various literature on the web for more information (do a Yahoo search on something like "+openmp +intel +pool"):

(see header "Thread Pooling" on the following page:) ... atform.htm
(search for "pool" in the following document:)

This seems to indicate that the Intel compiler does what you want.

I do know that Sun's implementation of OpenMP uses "thread pools" (since I work on it).

Re: Thread pool

Post by djo35 » Wed May 28, 2008 10:25 am

Thanks for that, that was what I was after. I'd traced a performance hit with gcc to creation / destruction (around 30-40% of a loop execution time) but upon further investigation icc takes a mere ~5% creation / destruction time. Win Intel!

Now back to the battle... Thanks again, djo

Re: Thread pool

Post by jakub » Tue Jun 03, 2008 3:34 am

GCC also uses a pool of threads for non-nested parallel regions (given that it uses TLS for threadprivate, it even has to), so if you don't change the number of threads between two non-nested parallel regions, no new threads are created. Only gomp-3_0-branch (or the various backports of the speedup changes) lets you select between passive and active waiting, though (via the OMP_WAIT_POLICY and GOMP_SPINCOUNT env vars), so what you are seeing is probably just the overhead of waking all the threads that were sleeping on a futex.
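To see how much of the per-call cost is wake-up overhead, one could time many back-to-back regions with the thread count left unchanged, then rerun under a different wait policy. This is a sketch only: mean_region_seconds and report_wait_policy_cost are hypothetical helpers, and the result depends on whether the libgomp in use honours the variables mentioned above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds; falls back to clock() when built without OpenMP. */
static double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Mean seconds per region over `calls` back-to-back parallel regions.
   No num_threads clause is used, so the team size stays the same and
   a pooling runtime should not need to create new threads. */
double mean_region_seconds(int calls)
{
    long visits = 0;
    double start, elapsed;
    int c;

    if (calls <= 0)
        return 0.0;

    /* Warm-up region so thread start-up is not included in the timing. */
    #pragma omp parallel
    { }

    start = wall_seconds();
    for (c = 0; c < calls; c++) {
        #pragma omp parallel reduction(+:visits)
        {
            visits += 1;   /* trivial work per thread */
        }
    }
    elapsed = wall_seconds() - start;

    (void)visits;
    return elapsed / calls;
}

/* Print the mean cost next to the wait policy taken from the environment. */
void report_wait_policy_cost(int calls)
{
    const char *policy = getenv("OMP_WAIT_POLICY");

    printf("OMP_WAIT_POLICY=%s: %.9f s per region\n",
           policy ? policy : "(unset)", mean_region_seconds(calls));
}
```

Running the same binary once with OMP_WAIT_POLICY=passive and once with OMP_WAIT_POLICY=active should give a feel for how much of the overhead comes from waking sleeping threads rather than from creating them.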
