Is it better to parallelize the outer loop?

Postby daviddoria » Fri Sep 07, 2012 6:57 am

I have a nested loop like this:

float DistanceFunction(object, object)
  sum = 0;
  for(some loop conditions)
    sum += something;
  return sum;

#pragma omp parallel for
//    for(ForwardIteratorType current = first; current != last; ++current) // OpenMP 3 doesn't allow != in the loop ending condition
    for(ForwardIteratorType current = first; current < last; ++current)
      float d = DistanceFunction(*current, query);

      #pragma omp critical // Without this there are weird crashes

Is it best practice (i.e. fastest) to parallelize (add "#pragma omp parallel for" to) only this outer loop (the for loop with the iterator)? Or would it help to *also* parallelize the loop inside DistanceFunction? Or should I parallelize *only* the loop inside DistanceFunction?



Re: Is it better to parallelize the outer loop?

Postby MarkB » Mon Sep 10, 2012 3:17 am

Hi David,

In general it should be fastest to parallelise the outer loop only, provided there is sufficient parallelism to keep the threads busy and load balanced. Parallelising only the inner loop adds an overhead for every parallel region encountered, which (although dependent on the implementation and the number of threads) is typically of the order of tens of microseconds. (On the other hand, it avoids the overhead of entering/exiting the critical region, and likely also some cache invalidations of data in outputQueue, but this is unlikely to outweigh the parallel region overhead.) Parallelising both loops is unlikely to be the most efficient solution, except maybe in some corner cases.
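To make the two variants concrete, here is a minimal sketch rather than the original poster's actual code. It assumes the objects are plain float vectors, that DistanceFunction computes a squared Euclidean distance, and that outputQueue is a std::vector<float>; the names Point, DistancesOuterParallel and DistanceInnerParallel are hypothetical. Loop indices are plain ints so that the "<" comparison is valid in OpenMP 3.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the "object" type of the original post.
using Point = std::vector<float>;

// Serial distance function (assumed here: squared Euclidean distance).
float DistanceFunction(const Point& a, const Point& b)
{
  float sum = 0.0f;
  for (std::size_t k = 0; k < a.size(); ++k) {
    const float diff = a[k] - b[k];
    sum += diff * diff;
  }
  return sum;
}

// Variant 1: parallelise the outer loop only.
// One parallel region in total; each thread processes whole points.
std::vector<float> DistancesOuterParallel(const std::vector<Point>& points,
                                          const Point& query)
{
  std::vector<float> outputQueue;  // assumed container, shared by all threads
  const int n = static_cast<int>(points.size());

  #pragma omp parallel for
  for (int i = 0; i < n; ++i) {
    const float d = DistanceFunction(points[i], query);

    // push_back on a shared vector is not thread-safe, hence the critical
    // section. The order of the results is not deterministic.
    #pragma omp critical
    outputQueue.push_back(d);
  }
  return outputQueue;
}

// Variant 2: parallelise the inner loop only.
// A parallel region is entered on every call, so its overhead is paid
// once per point instead of once overall.
float DistanceInnerParallel(const Point& a, const Point& b)
{
  float sum = 0.0f;
  const int n = static_cast<int>(a.size());

  #pragma omp parallel for reduction(+:sum)
  for (int k = 0; k < n; ++k) {
    const float diff = a[k] - b[k];
    sum += diff * diff;
  }
  return sum;
}
```

Compiled without OpenMP support, the pragmas are ignored and both variants run serially. With OpenMP enabled (for example -fopenmp with GCC), timing both on realistic data is the only reliable way to confirm which is faster on a given machine.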

Hope that helps,
Location: EPCC, University of Edinburgh

Re: Is it better to parallelize the outer loop?

Postby mwolfe » Thu Sep 20, 2012 4:20 pm

Sometimes, if the outer loop doesn't have much parallelism but the inner loop does, another option is to put the #pragma omp parallel outside the outer loop, and the #pragma omp for before the inner loop. This amortizes the overhead of creating the parallelism outside the outer loop, runs the outer loop 'redundantly' on all threads, and work-shares the iterations of the inner loop, which can be better than parallelizing either the outer or the inner loop alone.
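A minimal sketch of this pattern, under assumptions that are not from the thread: there are only a few queries (outer loop) but many points (inner loop), and each distance is written to its own slot of a table, so no critical section or reduction is needed. The names Point, SquaredDistance and DistanceTable are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical point type: a plain vector of coordinates.
using Point = std::vector<float>;

// Serial helper (assumed here: squared Euclidean distance).
float SquaredDistance(const Point& a, const Point& b)
{
  float sum = 0.0f;
  for (std::size_t k = 0; k < a.size(); ++k) {
    const float diff = a[k] - b[k];
    sum += diff * diff;
  }
  return sum;
}

// Fills table[q][p] with the distance between queries[q] and points[p].
std::vector<std::vector<float>> DistanceTable(const std::vector<Point>& queries,
                                              const std::vector<Point>& points)
{
  const int numQueries = static_cast<int>(queries.size());
  const int numPoints = static_cast<int>(points.size());
  std::vector<std::vector<float>> table(
      queries.size(), std::vector<float>(points.size()));

  // The thread team is created once, outside the outer loop.
  #pragma omp parallel
  {
    // Every thread runs all iterations of the outer loop ('redundantly');
    // q is declared inside the region, so each thread has its own copy.
    for (int q = 0; q < numQueries; ++q) {
      // The inner iterations are shared out among the threads of the team.
      #pragma omp for
      for (int p = 0; p < numPoints; ++p) {
        table[q][p] = SquaredDistance(queries[q], points[p]);
      }
      // Implicit barrier at the end of the omp for: all threads finish
      // row q before any of them starts row q + 1.
    }
  }
  return table;
}
```

All threads must reach the #pragma omp for the same number of times, which holds here because the outer loop bounds are identical on every thread.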
