CPU-GPU Asynchronous execution

maelso

Hi everyone,

The OpenMP 5.0 specification states that "there is an implicit barrier at the end of the parallel construct."

Is there any way to remove this barrier?

I am running a trivial example that adds 1 to every element of an array at each time step.
I am investigating how to split the work between the CPU and the GPU asynchronously with OpenMP 5.0.

In the example below, the CPU sits idle waiting for the first loop (the GPU target region) to finish before it starts its own loop.

    #pragma omp target enter data map(to: u [0:size_u[0] * (size_u[1] / 2)])
    for (int time = time_m; time < time_M; time += 1)
    {
        // GPU working: first half of the rows, offloaded with nowait
        #pragma omp target teams distribute parallel for collapse(2) nowait
        for (int x = x_m; x < x_M/2; x += 1)
        {
            for (int y = y_m; y < y_M; y += 1)
            {
                u[y + x * GRID_SIZE] = u[y + x * GRID_SIZE] + 1;
            }
        }

        // CPU working: second half of the rows
        #pragma omp parallel for collapse(2)
        for (int x = x_M / 2; x < x_M; x += 1)
        {
            for (int y = y_m; y < y_M; y += 1)
            {
                u[y + x * GRID_SIZE] = u[y + x * GRID_SIZE] + 1;
            }
        }
    }
I'm using:
CUDA release 10.1
clang version 10.0.0 (tags/RELEASE_900/final)

Can anyone help me, please?

MarkB
Location: EPCC, University of Edinburgh

You can use the nowait clause on the target construct to make the target region execute asynchronously (as a deferred target task), and then, if later code needs its results, wait for completion in the same way as you would for any other task (e.g. with taskwait).
