How to parallelize a recursive function

PSK
Posts: 5
Joined: Tue Apr 28, 2020 3:17 pm

How to parallelize a recursive function

Post by PSK »

This is a question about how to parallelize a recursive function using the "task" construct. I'd appreciate it if you could help me solve the problem by sharing your ideas. The issue is:

In the piece of code below, a recursive function named func() is called somewhere inside main(). As you can see, there are two variables inside the recursive function (var1 and var2) that control the number of successive recursive calls. Normally the value of var1 is large and var2 is rarely set to 0 by boo(), so func() will be called many times in succession before it finally stops and returns a true/false value. That is why we need to parallelize it with OpenMP to reduce its running time.

What I'm trying to apply is the "#pragma omp task" directive, since I know it's the recommended OpenMP way to parallelize recursive functions. But this example seems rather different from textbook recursive functions like Fibonacci, where fib() is called twice inside itself (i.e. x = fib(n-1) and y = fib(n-2) inside fib(n)) and we can simply define two tasks and assign each recursive call to one of them (I've sketched that textbook pattern after my code below, for reference). In the example below it isn't clear to me how to define the tasks. Can I define N tasks inside the loop, one before each new call to func()? How? Of course, another method (rather than tasks) that efficiently parallelizes this piece of code is also very welcome:

Code: Select all

bool func()
{
     if(var1 == 0) return true;
     else
     {
          // Do a few things, then:
          for(int i=0; i<N; i++)
          {
               // Do a few things, then:
               boo();    // var2 can change from 0 to 1 and vice versa here after running boo().
               var1--;   // var1 is a global variable with a positive initial value; it is decremented in every iteration of every call until it reaches 0 and stops the recursion.
               if(!var2)
                    if(func())
                         return true;
               var1++;
               // Do a few more things
          }
          // Do a few things, then:
          return false;
     }
}

int main()
{
     // some initialization
     // a few things to do, then:
     func();
     // a few things to show the results
}

MarkB
Posts: 808
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh

Re: How to parallelize a recursive function

Post by MarkB »

Given the use of global variables to control recursion and the possibility of returning from inside the for loop, the code does not look parallelisable in its current state. Are you convinced that the underlying algorithm is parallelisable?

PSK
Posts: 5
Joined: Tue Apr 28, 2020 3:17 pm

Re: How to parallelize a recursive function

Post by PSK »

MarkB wrote:Given the use of global variables to control recursion and the possibility of returning from inside the for loop, the code does not look parallelisable in its current state. Are you convinced that the underlying algorithm is parallelisable?
Thanks @MarkB, yes, I see the points you noted. I should add a bit more explanation about the intention of the code. There is a shared 2D matrix whose elements our algorithm fills with integer values. The function boo() changes the values of some elements in the matrix and at the end checks whether all elements have been filled. If they have, it sets var2 to 0; otherwise it leaves var2 at 1 (so var2 is 1 most of the time and only becomes 0 when the matrix is completely filled). But it's a kind of puzzle game and our algorithm can fill the matrix wrongly. In that case it has to go back to the previous state, restore the variable and the elements to their previous values, and then try another value for that element (as you can see in the code, the var1++ and the later "// Do a few more things" are there for this purpose).

What I'm thinking of doing is to put the three successive lines boo(), var1-- and if(!var2) inside a "#pragma omp critical" block, so that no other thread can enter and change the matrix element values while one thread is still working on them. I don't know how much speed-up this will leave after parallelization, but there is a more important problem I run into: there are two return statements inside the "#pragma omp parallel" block, which cause a compile error. I don't know how to rearrange my code so that the two "return" statements end up outside the parallel block. The parallelized form of the code looks something like this:

Code: Select all

// First problem: how to apply a critical section in the presence of the if() statement below.
// Second: how to pull the two return statements inside the #pragma omp parallel out of the parallel region.


bool func()
{
    if(var1 == 0) return true;
    else
    {
        // Do a few things
        #pragma omp parallel
        {
            #pragma omp for
                for(int i=0; i<N; i++)
                {
                    // Do a few things
                    #pragma omp critical
                    {
                        boo();
                        var1--;
                        if(!var2)
                    }   // In my opinion the critical section should end here, but is it logical/feasible to close it right after the if() condition without including the if() body?
                            if(func())
                                    return true;
                        var1++;
                    // Do a few more things
                }
                // Do a few things, then:
                return false;
        }

    }
}

int main()
{
     func();
}

MarkB
Posts: 808
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh

Re: How to parallelize a recursive function

Post by MarkB »

PSK wrote:I don't know how to rearrange my code so that the two “return” statements end up outside the parallel block.
Depends a bit on what you want to happen - if one thread reaches "return true", do you want the other threads to keep executing their parallel loop iterations, or to exit the parallel region as soon as possible?

For the first case, you can set a flag and check after the loop if any thread reached the condition (you can make the flag a reduction variable). For the second case, you can use the cancel (and cancellation point) directives to terminate the parallel region early.

Recursively creating parallel regions can easily result in creating way too many threads - you might want to use the OMP_THREAD_LIMIT environment variable to prevent this.
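
A minimal sketch of the first case could look like the code below (hit() and the loop body are just placeholders, not your real code):

Code: Select all

#include <cstdio>

// Placeholder for the per-iteration test that would have triggered "return true".
bool hit(int i) { return i == 37; }

bool any_hit(int n)
{
    bool found = false;
    // Each thread works on a private copy of found; the copies are OR-ed
    // together when the worksharing loop finishes.
    #pragma omp parallel for reduction(||:found)
    for (int i = 0; i < n; i++)
    {
        if (hit(i))
            found = true;   // record the result instead of returning
    }
    // Single return point, safely outside the parallel region.
    return found;
}

int main()
{
    std::printf("%d\n", any_hit(100));   // prints 1
}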

PSK
Posts: 5
Joined: Tue Apr 28, 2020 3:17 pm

Re: How to parallelize a recursive function

Post by PSK »

MarkB wrote:
PSK wrote:I don't know how to rearrange my code so that the two “return” statements end up outside the parallel block.
if one thread reaches "return true", do you want the other threads to keep executing their parallel loop iterations, or to exit the parallel region as soon as possible?
A thread reaches "return true" only when all the elements in the matrix have been filled with values and there is no empty element left to fill. So there is no need for the other threads to continue their execution (the puzzle has been filled in correctly and the job is done). Therefore I'm going to try the cancel (and cancellation point) directives to terminate the parallel regions, and check the speed-up of the code. But I still don't know whether I'll be able to get rid of the "return" statements by using cancellation points (since I haven't used the cancel directives before). Shall I follow the idea of using flags again here (along with the cancellation directives)?

MarkB
Posts: 808
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh

Re: How to parallelize a recursive function

Post by MarkB »

PSK wrote:Shall I follow the idea of using flags again here (along with the cancellation directives)?
Yes, you will still need to use flags - it's not permitted to jump out of a parallel region.
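
For example, a rough sketch combining a flag with cancellation of the worksharing loop could look like the code below (work() is a placeholder, and cancellation only takes effect if OMP_CANCELLATION=true is set in the environment):

Code: Select all

#include <cstdio>

// Placeholder for the per-iteration test.
bool work(int i) { return i == 1234; }

bool search(int n)
{
    bool found = false;
    #pragma omp parallel shared(found)
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
        {
            // Threads poll here and skip to the end of the loop once
            // cancellation has been activated by some thread.
            #pragma omp cancellation point for
            if (work(i))
            {
                #pragma omp atomic write
                found = true;
                #pragma omp cancel for
            }
        }
    }
    // Back on the calling thread: the only legal place to return.
    return found;
}

int main()
{
    std::printf("%d\n", search(1 << 20));
}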

PSK
Posts: 5
Joined: Tue Apr 28, 2020 3:17 pm

Re: How to parallelize a recursive function

Post by PSK »

MarkB wrote:
PSK wrote:Shall I follow the idea of using flags again here (along with the cancellation directives)?
Yes, you will still need to use flags - it's not permitted to jump out of a parallel region.
Trying to replace the return statements with flags, I succeeded in the case of "return true", but got stuck on "return false". To make clear what exactly the problem is, let me first simplify the code even further:

Code: Select all

bool func()
{
	if(...) return true;
	else
	{
		// ...
		// few things to do
		// ...
		#pragma omp parallel
		{
			// ...
			// ...
			// I don't need to parallelize the loop below
			// so, I don't use #pragma omp for
			for(int i=0; i < values.size(); i++)  // each thread has its own local values list
			{
				// ...
				// ...
				if(!var2)
					if(func())
						return true;
				job1();
				job2();
			}
			job3();
			return false;
		
		} // end point of parallel zone
	}
}
This is the way I need my code to be parallelized (for example, I don't need an "omp parallel for"; sharing the workload among the threads happens somewhere above the "for" loop).

Now, to get rid of the return statements inside the parallel scope, I use a flag named succ_filled as below. It works correctly for the "return true" case, since the if(func()) condition becomes true only when the job has finished and all grids in the matrix have been filled successfully (this is reached only once during the run). So when one of the threads reaches the point where "return true" used to be, it sets the flag, breaks out of its loop and goes to the implicit barrier at the end of the parallel region to wait for the other threads to stop and join. The execution of the other threads breaks off in the middle (caused by the flag), and when all threads have joined, "true" is returned in the first lines after the parallel region:

Code: Select all

bool func()
{
	if(...) return true;
	else
	{
		// ...
		// few things to do
		// ...
		bool succ_filled = false;
		#pragma omp parallel
		{
			// ...
			// ...
			// I don't need to parallelize the loop below
			// so, I don't use #pragma omp for
			for(int i=0; i < values.size() && !succ_filled; i++)  // each thread has its own local values list
			{
				// ...
				// ...
				if(!var2)
					if(func())
						succ_filled = true;
				
				if(!succ_filled)
				{
					job1();
					job2();
				}
			}
			if(!succ_filled)
			{
				job3();
			}
			
		} // end point of parallel zone
		
		if(succ_filled)
			return true;
		else return false;
	}
}
But the problem comes with the "return false" statement. In a situation where (for a thread) if(!var2) never becomes true for any index i, and the loop ends without ever reaching a recursive call to func(), the program should, after doing job3, return "false" to the previous step (since the problem has to be solved in one of the previous calls). In the code above using a flag, the thread does job3 correctly and then waits at the implicit barrier for the other threads to finish and join it. But the other threads' work depends directly on this thread: they will never reach the "return true" statement (they can't finish assigning values to all grids of the matrix) if that thread doesn't return and revise its value. In this way the code ends up in a dead-end.

The only solution that came to my mind was to put a "nowait" clause on the parallel region so that the thread does not have to wait for the others to join before returning the false value. But I get an error, since "nowait" is not allowed on "#pragma omp parallel". So I was wondering how else I can use a flag to let a thread get outside the parallel scope and return a value without waiting for the other threads to join. There may be other directives for such a situation that I don't know about. I'd appreciate your ideas.

MarkB
Posts: 808
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh

Re: How to parallelize a recursive function

Post by MarkB »

PSK wrote:But the other threads' work depends directly on this thread: they will never reach the "return true" statement (they can't finish assigning values to all grids of the matrix) if that thread doesn't return and revise its value. In this way the code ends up in a dead-end.
Sorry, I don't really understand - "revise its value" of what?
Once the parallel region has ended, only the thread that called this instance of func will execute the return statement.

PSK
Posts: 5
Joined: Tue Apr 28, 2020 3:17 pm

Re: How to parallelize a recursive function

Post by PSK »

MarkB wrote:
PSK wrote:But the other threads' work depends directly on this thread: they will never reach the "return true" statement (they can't finish assigning values to all grids of the matrix) if that thread doesn't return and revise its value. In this way the code ends up in a dead-end.
Sorry, I don't really understand - "revise its value" of what?
Once the parallel region has ended, only the thread that called this instance of func will execute the return statement.
The job is to assign a correct integer value to each element (grid) of a 2D vector (a game puzzle). The grids of the vector have been divided among the threads. By following some maths that comes from the rules of the game, only a short list of possible integer values is left for each grid (e.g. one grid may take one of the values 1,2,3 while another grid may take one of 2,3,5,7,8,9). A thread randomly selects one of the possible values for the first grid (e.g. "2") and goes on to value the second grid. In the second grid, however, it may find that (due to the game rules) it can't select any of the possible values (2,3,5,7,8,9). This shows that the algorithm has chosen a wrong value for one of the previous grids (the first grid in our example). So it has to go back to the previous call (by returning false), select another value from the remaining values in the list 1,2,3 (e.g. this time it selects "3" for the first grid), and then go to the second grid again to check the situation. If this time it can select one of (2,3,5,7,8,9) for the second grid, it goes on to the third grid. Each thread continues this process until all of its assigned grids have values. When the puzzle is complete, one of the threads will be the first to notice it (when the if(...) in the first line of the code returns true) and jumps to the end of the parallel region (correctly making the other threads do the same).
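
To make this concrete, a toy sequential sketch of the backtracking I described could look like the code below (the names and the tiny "differ from the left neighbour" rule are only illustrative, not my real data structures):

Code: Select all

#include <cstdio>
#include <vector>

const int N = 8;
std::vector<int> cells(N, -1);                    // -1 means "not yet filled"

// The short list of values a grid is still allowed to take.
std::vector<int> candidates(int cell)
{
    std::vector<int> out;
    for (int v = 0; v < 2; v++)                   // tiny value range on purpose
        if (cell == 0 || cells[cell - 1] != v)    // toy rule: differ from the left neighbour
            out.push_back(v);
    return out;
}

bool solve(int cell)
{
    if (cell == N) return true;                   // every grid filled: puzzle solved
    for (int v : candidates(cell))
    {
        cells[cell] = v;                          // tentatively pick one candidate
        if (solve(cell + 1))                      // try to fill the next grid
            return true;
        cells[cell] = -1;                         // dead end: undo and try the next value
    }
    return false;                                 // no candidate fits: backtrack further
}

int main()
{
    if (solve(0))
        for (int v : cells) std::printf("%d ", v);
    std::printf("\n");
}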

The problem occurs when one of the threads wants to return "false" (to revise the values of previous grids). It jumps to the end of the parallel region (using the flag) and waits for the other threads to join it (since there is no nowait clause). But everything may be going well for the other threads and they don't want to return any "false" value, so they won't jump to the end of the parallel region to join that thread. Nor will the other threads ever reach the "return true" case. Why? Because "return true" is only reached when all the grids of the puzzle have valid values, and the thread that is stuck at the "false" return prevents this (i.e. at least one unfilled grid is under the control of that thread).
