
Schedule and Chunk Size

Posted: Mon Mar 10, 2008 7:41 am
by howa

Are there any best practices for which type of schedule and chunk size should be used in certain situations?

Are there any resources or articles I should read on tuning these factors?


Re: Schedule and Chunk Size

Posted: Mon Mar 10, 2008 8:39 am
by ejd
I have never seen an article that really discusses this. The problem is that it really depends on your workload and what you are trying to do.

A static schedule is good if the work can be divided up fairly evenly, so that all the processors you are using are busy for about the same amount of time. Most of the overhead of static scheduling is incurred at compile time.

Dynamic is good if your workload is uneven, since one or more processors might otherwise be busy for a long time. It allows the processors with small workloads to pick up other chunks of work and, hopefully, balance the work across processors. It has a larger overhead at runtime, since work has to be taken off a queue.

Guided scheduling is good if you know the workload and can divide it up into pieces so as to balance the workload across the processors in accordance with the chunk sizes. Since the chunks of work decrease in size, it would be good to use when the initial chunks have a small amount of work per iteration and later chunks have a larger amount of work per iteration. The idea, again, is that all the processors being used are busy for (hopefully) the same amount of time. Like dynamic, there is more overhead at runtime.

So basically, the idea of having various schedules and chunk sizes is to keep all of the processors busy for about the same amount of time, so that you maximize parallelism. If the compiler could determine the amount of work being done in a loop, it could divide the work up and the user wouldn't have to play with it at all. Unfortunately, only so much can be done at compile time, so this is something the user may have to experiment with to get the best performance.

Some compilers have other schedules as well, to allow the user even more flexibility depending on the workload distribution across the iterations. The OpenMP V3 spec adds the AUTO schedule type, which says that the compiler and runtime should do the best that they can. Initially, implementations will most likely map this to one of the schedules above. Later on, this will hopefully look at the workload distribution at runtime and change the schedule dynamically to whatever is "best" (using feedback from one run to influence the next).

I know this hasn't answered your question, but it is the best that I can do. Hopefully someone else has some helpful hints.

Re: Schedule and Chunk Size

Posted: Mon Mar 10, 2008 8:07 pm
by howa
Hi ejd,

Thanks for your reply; it is very well explained.

So assuming each processor shares the same amount of work, then static would be good enough?

So how should I adjust the chunk size? Is it related to the input size?

For example, Matrix multiplication: ... ode30.html

Is the optimal chunk size related to the size of matrix to be calculated?

Thanks again.

Re: Schedule and Chunk Size

Posted: Wed Mar 12, 2008 7:26 am
by ejd
Personally I have not seen a benefit to changing the chunk size in most cases.

The default for static, if no chunk size is specified, is to divide the iteration space into chunks that are approximately equal in size, so that each thread is assigned one chunk. The idea here is that the workload given to each thread is hopefully the same. This, of course, depends on whether the same amount of work is being done in each iteration. You will also notice that since each thread gets one chunk, threads don't have to go back and get another chunk, so the overhead is kept low.

For dynamic, if no chunk size is specified, it defaults to 1. This is a pretty low value and almost guarantees more overhead from threads going back for more work. On the other hand, you would use dynamic when the workload varies from iteration to iteration, so that the work can be distributed more evenly. Here, increasing the chunk size might be of benefit depending on the workloads, since you could decrease the overhead of going back for more work. However, you could also cause an imbalance if you are not careful - the same imbalance that you are trying to avoid by using dynamic.

For guided, chunk size means something different. Here the chunks are proportional to the number of unassigned iterations divided by the number of threads, and the chunk size sets the minimum size of a chunk. As such, it affects the proportion and has a definite impact. In fact, many vendors allow some other "tweaking" of this calculation through an implementation-specific environment variable that affects how chunk sizes are computed.

For your specific question about matrix multiply, I would think that using a static schedule with the default chunk size would be about the best, since the work is about the same for each iteration (though it wouldn't be the first time I was wrong). What would have a greater impact, depending on the size of the array in question, is array blocking. This is not part of OpenMP, but is used to take advantage of caching and has been discussed in other posts in this forum.

As a side note, the one time I have really seen specifying chunk size help was in code like this:


#pragma omp parallel for ordered schedule(static)
for (i = 0; i < N; i++) {
  ... some large amount of work
  #pragma omp ordered
  {
    ... some small amount of work
  }
}
Each chunk had several iterations, and the "next thread" had to wait for the previous thread to finish its iterations before it could really do anything (other than the first large amount of work). By changing the chunk size to 1 in this case, the user saw a very large speedup.

Re: Schedule and Chunk Size

Posted: Wed Mar 12, 2008 8:49 am
by howa
Hey ejd,

Thank you very much for your reply; it is very useful and detailed.