scheduling with arbitrary non-contiguous partitioning

General OpenMP discussion

Postby ykwon » Tue Oct 23, 2018 9:57 pm

Hello,

I am wondering how to partition loop iterations in an arbitrary, non-contiguous order in a parallel for. It would be great if there is a prototype that supports such scheduling. As far as I know, there is no reference implementation for it except the user-defined schedule [1], which was recently proposed and is still under discussion.

More specifically, for my research, I'd like to evaluate different scheduling (more partitioning than assignment) policies based on information given by an application or performance profiled from hardware. The information could be expressed simply as a vector, with each position representing an iteration and each value specifying the thread that should run that iteration. For example, [1] [2] [2] [1] says the first and last iterations go to thread 1 while the rest go to thread 2.

I think the only way to evaluate this is to modify the runtime system, so I am now reviewing the runtime implementations of both LLVM/Intel and GCC. However, it seems a bit challenging and I feel somewhat overwhelmed.

For those who have thought about this kind of thing, any advice on where to begin would be greatly appreciated.

[1] https://sites.google.com/site/userdefschedopenmp/
ykwon
 
Posts: 3
Joined: Tue Oct 23, 2018 8:06 pm

Re: scheduling with arbitrary non-contiguous partitioning

Postby JimCownie » Wed Oct 24, 2018 3:29 am

I think the only way to evaluate this is to modify the runtime system, so I am now reviewing the runtime implementations of both LLVM/Intel and GCC.

Indeed, as you have already observed, the OpenMP standard does not support this, so you will need to implement something yourself.
That will clearly be non-standard!

For detailed discussion of how to do this in the LLVM/Intel runtime you can use the OpenMP development list at LLVM (http://lists.llvm.org/mailman/listinfo/openmp-dev).
In short, what I would do is:
  • Use the existing schedule(dynamic) syntax on the loop.
  • Add your own new runtime library call to pass in the set of iterations associated with each thread. I would make this a call that is required in each thread, where it passes in its set of iterations. The call then saves that information in thread state. This function should NOT be called omp_ anything (since it's not standard) but should have some suitable name prefix of your choice.
  • Modify the code in __kmpc_dispatch_init_* and __kmpc_dispatch_next_* to notice that the new state has been saved, and then return those iterations in dispatch_next.

You can then add these new calls to your test code above a dynamically scheduled loop and have everything work.

This avoids the need for any compiler changes.
JimCownie
 
Posts: 8
Joined: Thu Jun 06, 2013 6:06 am

Re: scheduling with arbitrary non-contiguous partitioning

Postby ykwon » Wed Oct 24, 2018 2:54 pm

Thank you very much for the confirmation and the guidelines to begin with. That will save me a lot of time.

It seems you suggest implementing this on top of the LLVM implementation. I am now wondering what the differences are between the GCC and LLVM implementations of the OpenMP runtime system.

As I mentioned earlier, I am digging into both implementations, which are compatible with OpenMP version 4.0, and I am now seeing that they differ considerably in their implementation specifics. Are there any major differences I should know about before choosing a baseline runtime, or am I free to pick whichever works better for this specific evaluation?

I could have asked the OpenMP development list at LLVM, but first I'd like to understand the differences across the runtime systems from different vendors.

Thank you again for the quick response.
ykwon
 
Posts: 3
Joined: Tue Oct 23, 2018 8:06 pm

Re: scheduling with arbitrary non-contiguous partitioning

Postby JimCownie » Thu Oct 25, 2018 1:25 am

As I mentioned earlier, I am digging into both implementations, which are compatible with OpenMP version 4.0, and I am now seeing that they differ considerably in their implementation specifics. Are there any major differences I should know about before choosing a baseline runtime, or am I free to pick whichever works better for this specific evaluation?

The two implementations are unrelated. I have not read the GCC implementation, so cannot comment on its internals.

However, one critical point must be made nonetheless: the interface to the runtime used by the GCC compilers is different from that used by the LLVM/Intel runtime, and it is the only interface the GCC runtime provides. Therefore, if you modify the GCC runtime, you can only run experiments that you have compiled with one of the GCC compilers.

On the other hand, although the LLVM/Intel runtime uses a native interface which is different from that used by GCC, it also provides shim routines so that the GCC interface is available. It is therefore possible to run code compiled by any of the GCC compilers, as well as Intel compilers, and LLVM compilers with the LLVM/Intel runtime.

Therefore, if you want the ability to test your runtime with code compiled by any of these compilers (rather than only GCC), making your changes in (a branch of) the LLVM/Intel runtime seems the better choice.
JimCownie
 
Posts: 8
Joined: Thu Jun 06, 2013 6:06 am

Re: scheduling with arbitrary non-contiguous partitioning

Postby MarkB » Thu Oct 25, 2018 2:04 am

Do you really need to implement this in a compiler to get the results you need, or would it be OK just to implement the scheduling policy in user code and explicitly outline the loop body yourself?
MarkB
 
Posts: 779
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh

Re: scheduling with arbitrary non-contiguous partitioning

Postby ykwon » Fri Oct 26, 2018 12:12 pm

On the other hand, although the LLVM/Intel runtime uses a native interface which is different from that used by GCC, it also provides shim routines so that the GCC interface is available. It is therefore possible to run code compiled by any of the GCC compilers, as well as Intel compilers, and LLVM compilers with the LLVM/Intel runtime.


This sounds interesting. I hadn't thought about compilation, since I didn't plan to change the compilation process at all as long as I could avoid it. I will keep this in mind while looking into both versions.

Do you really need to implement this in a compiler to get the results you need, or would it be OK just to implement the scheduling policy in user code and explicitly outline the loop body yourself?


If you ask me, I don't intend to implement this in a compiler, as I described above. Instead I am working on the runtime. I could explicitly manage my loop body if I were interested in only a few simple applications. However, I'd like to evaluate quite a few complicated ones with several variants of scheduling algorithms, which makes it hard to instrument the user code.

Thanks again,
ykwon
 
Posts: 3
Joined: Tue Oct 23, 2018 8:06 pm

