
How to parallelize this fortran code by openmp

PostPosted: Fri Jan 18, 2008 11:02 pm
by gonski
Dear all,

I have a Fortran code that calculates particle motion. It always has many big loops. Someone suggested I use OpenMP to speed it up. The following is the pseudo code.

do i = 1, ni
   a(i) = c1
   c2 = 0
   ks = lfirst(i)
   ke = last(i)
   do k = ks, ke
      c2 = c2 + f(k)
      c3 = f1 + f2
      ................
      c4 = f + f3
   enddo
   f(i) = c2 + c3
enddo

As you can see, the iterations of loop i are independent, but loop k carries a dependence. Almost all of the computation happens in loop k, which is small (up to 8 iterations), while loop i is big (up to 200,000 iterations). I only want to use OpenMP on loop i. Your hints would be greatly appreciated. Cheers, Gonski
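[Editor's note: a minimal sketch of parallelizing only the outer loop might look like the following, assuming c2, c3, c4, ks, and ke are per-iteration scalar temporaries, an assumption the replies below probe. Note that if the f(k) read in the inner loop and the f(i) written after it refer to the same array, the i iterations are not truly independent and this directive alone would be unsafe.]

```fortran
!$omp parallel do private(c2, c3, c4, ks, ke)
      do i = 1, ni
         a(i) = c1
         c2 = 0
         ks = lfirst(i)
         ke = last(i)
         do k = ks, ke        ! inner loop runs sequentially within each thread
            c2 = c2 + f(k)
         enddo
         f(i) = c2 + c3       ! unsafe if f is the same array read above
      enddo
!$omp end parallel do
```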

Re: How to parallelize this fortran code by openmp

PostPosted: Sun Jan 20, 2008 7:30 am
by ejd
I will start with a simple question about "f". Is it a scalar or an array? You use it in two lines:

    c2 = c2 + f(k)
    c4 = f + f3
and I am not sure if it is a mistake or if c4 is an array.

Re: How to parallelize this fortran code by openmp

PostPosted: Sun Jan 20, 2008 10:11 am
by lfm
Code:
!$omp parallel do
do i = 1,ni
...
enddo


Is that all you want?

Re: How to parallelize this fortran code by openmp

PostPosted: Sun Jan 20, 2008 12:03 pm
by ejd
I think he wants more than that. He wants to deal with all the dependencies. For example, variables ke and ks need to be private so he is going to have to add a private clause.

Code:
!$omp parallel do private(ke,ks)
do i = 1, ni
...
enddo

Then there are the others that need to be handled: c2, c3, and c4. That is why I started with a simple question about "f" -- so I could understand more about "f" and "c4".
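[Editor's note: if c2, c3, and c4 all turn out to be scalar temporaries recomputed each iteration, the private clause would simply grow to cover them, for example:]

```fortran
!$omp parallel do private(ke, ks, c2, c3, c4)
```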

Re: How to parallelize this fortran code by openmp

PostPosted: Sun Jan 20, 2008 2:37 pm
by gonski
Thank you very much.
Over the last two days I have read some documents on OpenMP. Based on what I learned, I have changed my code to the style of the code below: all the calculations in a big loop are now placed in a subroutine like "pr" below. I tested it on Linux and it works. Interestingly, the code gives the same results for sequential and parallel runs. Do you think that is OK or not? Another thing: in my code there are situations like k below, where I want k to accumulate values across threads whenever a preset condition is met. For that reason k is treated as DEFAULT(SHARED) here. Is that correct?
Cheers,
Gonski


      program test
      common /nn/ k
      open(1, file='test.txt')
      n = 10
      k = 1

c.....OpenMP : Start parallel loop section
c$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(i),
c$OMP* SCHEDULE(RUNTIME)
      do i = 1, n
         call pr(i)
      enddo

      close(1)
      end

      subroutine pr(i)
      common /nn/ k
      write(1,*) ''
      write(1,*) '1,i=,k=', i, k
      if (mod(k,2) == 0) k = k + 1
      write(1,*) '2,i=,k=', i, k
      return
      end

Re: How to parallelize this fortran code by openmp

PostPosted: Sun Jan 20, 2008 4:36 pm
by ejd
The code is not doing anything interesting. You will note that k is never being incremented: it starts at 1, so mod(k,2) is never 0. If it were being incremented, you would have a race condition where multiple threads could write to k at the same time. You could do something like:
Code:
      program test
      open(1, file='test.txt')
      n = 10
      k = 1

!.....OpenMP : Start parallel loop section
!$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(i),
!$OMP* SCHEDULE(runtime) reduction(+:k)
      do i = 1, n
         call pr(i, k)
      enddo

      close(1)
      end

      subroutine pr(i, k)
      write(1,*) ''
      write(1,*) '1,i=,k=', i, k
      if (mod(i,2) == 0) k = k + 1
      write(1,*) '2,i=,k=', i, k
      return
      end

Re: How to parallelize this fortran code by openmp

PostPosted: Sun Jan 20, 2008 4:57 pm
by gonski
Great. It works. Thank you very much, ejd!
Now I have the following situation. Could you tell me which method is correct?

Method 1

      program test
      real a(100), b(100)
      n = 100
!.....OpenMP : Start parallel loop section
!$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(i,a,b),
!$OMP* SCHEDULE(runtime)
      do i = 1, n
         a(i) = i + 1
         b(i) = i
      enddo

      end

-----------------------------------------------------------------------------------
Method 2


      program test
      real a(100), b(100)
      common /eva/ a, b
      n = 100
!.....OpenMP : Start parallel loop section
!$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(i),
!$OMP* SCHEDULE(runtime)
      do i = 1, n
         call evaluate(i)
      enddo

      end

      subroutine evaluate(i)
      real a(100), b(100)
      common /eva/ a, b
      a(i) = i + 1
      b(i) = i
      return
      end

Re: How to parallelize this fortran code by openmp

PostPosted: Mon Jan 21, 2008 7:49 am
by ejd
Since these are only partial programs, it is hard to say what is "correct". What I can say is that they do very different things.

In Method 1, you have privatized the arrays "a" and "b". That means each thread gets its own copy, and only the portion of the array that thread is working on will be changed. The private copies will not persist past the end of the parallel region.

In Method 2, the arrays "a" and "b" are shared, so there is only one copy of them shared by all of the threads and the values will persist after the parallel region.

These two Methods would do the same thing if you didn't put "a" and "b" in the private clause in Method 1. Hopefully that answers your question.

Re: How to parallelize this fortran code by openmp

PostPosted: Tue Jan 22, 2008 2:58 am
by gonski
That is greatly helpful. Thanks!

Re: How to parallelize this fortran code by openmp

PostPosted: Thu Jan 24, 2008 4:00 am
by gonski
I was told that any scalar in a parallelized loop should be in the private declaration. This really puzzles me after some experiments I did on Windows.

I tested the following code on a Windows system. The results are:
4 8 12
2 10 6

This is what I expected. I found that whether or not c is in the private declaration, I get the same results. Another thing: earlier in this thread, ejd kindly reminded me that the treatment of icnt and jcnt in the code below may let multiple threads compete to write the same variables. However, when I try to use a reduction clause, I cannot reproduce the results above.

Now my questions are

1) Is it necessary to privatize every scalar in a parallelized loop, and why?
2) Do the above issues depend on the operating system? I guess that on Linux I do need to consider ejd's suggestion.
3) icnt and jcnt act as counters in the code. It seems the REDUCTION clause is not applicable to them (at least under Windows). Is there any other clause available to handle this situation?

Regards,
Shibo




      program test
      use omp_lib
      integer num_threads

      real a(100), b(100)

      num_threads = 3
      call omp_set_num_threads(num_threads)

      n = 6
      icnt = 0
      jcnt = 0

!.....OpenMP : Start parallel loop section
!$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(i,c), SCHEDULE(runtime)
      do i = 1, n
         c = i*2
         if (mod(i,2) == 0) then
            icnt = icnt + 1
            a(icnt) = c
         endif

         if (mod(i,2) == 1) then
            jcnt = jcnt + 1
            b(jcnt) = c
         endif

      enddo

      print *, a(1:icnt)
      print *, b(1:jcnt)

      end program test
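[Editor's note: one common way to handle a shared counter that is also used as an array index -- a case REDUCTION cannot cover, since each increment must be visible to the other threads immediately -- is a named CRITICAL section. A sketch of the loop body above, under the assumption that the order of elements within a and b does not need to match the sequential run; islot and jslot are hypothetical private temporaries introduced here:]

```fortran
!$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(i, c, islot, jslot)
      do i = 1, n
         c = i*2
         if (mod(i,2) == 0) then
!$OMP CRITICAL (evencnt)
            icnt = icnt + 1
            islot = icnt       ! capture a unique slot while holding the lock
!$OMP END CRITICAL (evencnt)
            a(islot) = c       ! each thread writes a distinct element
         endif
         if (mod(i,2) == 1) then
!$OMP CRITICAL (oddcnt)
            jcnt = jcnt + 1
            jslot = jcnt
!$OMP END CRITICAL (oddcnt)
            b(jslot) = c
         endif
      enddo
```

The final icnt and jcnt will match the sequential run, but the elements of a and b may land in a different order depending on which thread reserves each slot first.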