arrays named in the threadprivate directive are prohibited from persisting
across distinct parallel regions under any circumstances.
I am currently constructing a parallel library for Domain Decomposition for
scientific FORTRAN applications. The library is currently in MPI and I would
like to port it to OpenMP. However, it uses local/private allocatable arrays.
The library performs domain decomposition using locally allocated arrays for
each processor (thread). Users can then compute on these arrays using
call-backs through the "call Do_All(external function name)" library routine.
The Do_All routine has a single OpenMP parallel region.
The Do_All works fine if the allocated array is a single large global shared array.
However computations are faster if we split up the array across processors,
resulting in smaller, private arrays for each processor (thread). These local
arrays are contained in a module and are declared both with the "threadprivate"
directive and the Fortran "save" declaration.
These local arrays are allocated in a parallel region within an initialization
routine, callable by the user. When the application wants to perform
computations on these arrays using the Do_All routine, the arrays arrive at
the parallel region undefined (!?), as the OpenMP specification mandates.
Why is this the case? Can this be changed so that allocatable arrays can
persist just like other local variables? See Section 2.9.2, p87.
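
For reference, the setup described above can be reduced to a small reproducer along the following lines. This is only a sketch: apart from Do_All, every name in it (local_data, local_arr, init_local, callbacks, report, n_missing) is made up for illustration and is not taken from the actual library. It allocates the threadprivate array in one parallel region and then counts, in a second region, how many threads no longer see their array as allocated.

```fortran
! Sketch only: all names except Do_All are hypothetical.
module local_data
  implicit none
  ! Each thread gets its own copy of this array.
  real, allocatable, save :: local_arr(:)
  !$omp threadprivate(local_arr)
contains

  ! Initialization routine: every thread allocates its own local piece.
  subroutine init_local(n)
    integer, intent(in) :: n
    !$omp parallel
    allocate(local_arr(n))
    local_arr = 0.0
    !$omp end parallel
  end subroutine init_local

  ! A single parallel region that runs the user call-back on every thread.
  subroutine do_all(func)
    interface
      subroutine func()
      end subroutine func
    end interface
    !$omp parallel
    call func()
    !$omp end parallel
  end subroutine do_all

end module local_data

module callbacks
  use local_data
  implicit none
  ! Shared counter of threads whose copy is not allocated on entry.
  integer :: n_missing = 0
contains

  ! Call-back: record whether this thread's array survived.
  subroutine report()
    if (.not. allocated(local_arr)) then
      !$omp atomic
      n_missing = n_missing + 1
    end if
  end subroutine report

end module callbacks

program reproducer
  use local_data
  use callbacks
  implicit none

  call init_local(100)   ! first parallel region: allocate
  call do_all(report)    ! second parallel region: inspect
  print *, 'threads without an allocated array: ', n_missing
end program reproducer
```

Built with OpenMP enabled (for example gfortran -fopenmp), a non-zero count would show the arrays being lost between the two regions; the result depends on the runtime settings in effect.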
We don't really see any wording on p. 87 or 88 that would prohibit this under all circumstances.
On p. 88 there are notes only on how an allocatable threadprivate array is initially defined or undefined in the first parallel region.
The words "that appears on the first parallel region in which it is referenced" mean that, if the code is correct, the user can assume the objects persist until they break the necessary conditions for persistence.
On page 84 it says:
The values of data in the threadprivate variables of non-initial threads are guaranteed to persist between two consecutive active parallel regions only if all the following conditions hold:
- Neither parallel region is nested inside another explicit parallel region.
- The number of threads used to execute both parallel regions is the same.
- The value of the dyn-var internal control variable in the enclosing task region is false at entry to both parallel regions.
Would those conditions apply to your code?
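
One way to find out would be to test the three conditions at run time before entering the parallel region. The sketch below is an assumption about how a library might do that, not something the specification prescribes: the module persistence_check and the function persistence_expected are hypothetical names, and comparing omp_get_max_threads() with the earlier team size is only an approximation of "the same number of threads", since a num_threads clause or a thread limit could still change the actual team size. The omp_lib routines used (omp_set_dynamic, omp_get_dynamic, omp_in_parallel, omp_get_max_threads, omp_get_num_threads) are standard.

```fortran
! Sketch only: persistence_check and persistence_expected are hypothetical.
module persistence_check
  use omp_lib
  implicit none
contains

  ! Call from outside the parallel region, just before it is entered.
  ! nthreads_prev is the team size observed in the earlier region.
  logical function persistence_expected(nthreads_prev)
    integer, intent(in) :: nthreads_prev
    persistence_expected = (.not. omp_in_parallel()) .and. &        ! not nested
                           (.not. omp_get_dynamic()) .and. &        ! dyn-var is false
                           (omp_get_max_threads() == nthreads_prev) ! same team size (approx.)
  end function persistence_expected

end module persistence_check

program check_conditions
  use omp_lib
  use persistence_check
  implicit none
  integer :: nthreads_init

  ! Turn off dynamic adjustment of the number of threads.
  call omp_set_dynamic(.false.)

  ! First region: record the team size that was actually used.
  !$omp parallel
  !$omp single
  nthreads_init = omp_get_num_threads()
  !$omp end single
  !$omp end parallel

  ! Before the next region: do the persistence conditions look satisfied?
  print *, 'persistence expected: ', persistence_expected(nthreads_init)
end program check_conditions
```

In a library like the one described, the team size could be recorded in the initialization routine and the check made at the top of Do_All, so that a mismatch is reported instead of silently computing on undefined arrays.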