OMP parallel for causes segmentation fault.

MutantTurkey
Posts: 4
Joined: Mon Apr 29, 2013 1:00 pm

Post by MutantTurkey »

I'm having trouble using the #pragma omp parallel for directive.

Basically I have several hundred DNA sequences that I want to run through an algorithm called NNLS (non-negative least squares).

I figured that doing it in parallel would give me a pretty good speedup, so I added the #pragma directive.

When I run it sequentially there is no issue and the results are fine, but when I run it with #pragma omp parallel for I get a segfault within the algorithm (sometimes at different points).

The segfault is within the "nnls" function.

Code: Select all

#pragma omp parallel for
for(int i = 0; i < dir_count; i++ ) {

  int z = 0;
  int w = 0;
  struct dirent *directory_entry;
  char filename[256];

  directory_entry = readdir(input_directory_dh);

  if(strcmp(directory_entry->d_name, "..") == 0 || strcmp(directory_entry->d_name, ".") == 0) {
    continue;
  }

  sprintf(filename, "%s/%s", input_fasta_directory, directory_entry->d_name);

  double *count_matrix = load_count_matrix(filename, width, kmer);

  //normalize_matrix(count_matrix, 1, width)
  for(z = 0; z < width; z++) 
    count_matrix[z] = count_matrix[z] * lambda;

  // output our matrices if we are in debug mode
  printf("running NNLS on %s, %d, %d\n", filename, i, z);
  double *trained_matrix_copy = malloc(sizeof(double) * sequences * width);
  for(w = 0; w < sequences; w++) {
    for(z = 0; z < width; z++) {
      trained_matrix_copy[w*width + z] = trained_matrix[w*width + z];
    }
  } 

  double *solution = nnls(trained_matrix_copy, count_matrix, sequences, width, i);


  normalize_matrix(solution, 1, sequences);
  for(z = 0; z < sequences; z++ )  {
    solutions(i, z) = solution[z]; 
  }

  printf("finished NNLS on %s\n", filename);

  free(solution);
  free(trained_matrix_copy);
}

gdb always stops at a different point in my thread, so I can't figure out what is going wrong.

What I have tried:

-allocating a copy of each matrix, so that the threads would not be writing on top of each other
-using a mixture of private/shared clauses on the #pragma
-using different input sequences
-writing out my trained_matrix and count_matrix prior to calling NNLS, ensuring that they look OK. (they do!)

I'm sort of out of ideas. Does anyone have some advice?

MarkB
Posts: 803
Joined: Thu Jan 08, 2009 10:12 am
Location: EPCC, University of Edinburgh

Post by MarkB »

Hi there,

A couple of possibilities:

1) The function nnls isn't thread-safe, and some race condition is occurring inside it. This can happen if, for example, there are variables inside the function which are declared as static, or declared at file scope, which will be shared between threads inside the parallel region. Do you have access to the source code for nnls, or is it from a library?

2) The worker threads are running out of stack space inside nnls. You can use the OMP_STACKSIZE environment variable to try increasing the stack size for worker threads.

One useful test would be to enclose the call to nnls in a #pragma omp critical construct and see if it works (which suggests a race condition) or not (which suggests the stack space issue).

Hope that helps,
Mark.
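The critical-section test suggested above can be sketched as follows. This is only an illustration: scaled_sum_static is a hypothetical stand-in for nnls (its static accumulator plays the role of the suspect static variables), and run_rows_serialised is an assumed name for the surrounding loop. If the crash disappears once the call is wrapped like this, a race inside the routine is the likely culprit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a non-thread-safe routine: the static
   accumulator is a single object shared by every calling thread. */
static double scaled_sum_static(const double *v, size_t n, double scale)
{
    static double acc;
    acc = 0.0;
    for (size_t k = 0; k < n; k++)
        acc += v[k] * scale;
    return acc;
}

/* Diagnostic variant of the loop: the parallel for stays in place, but
   only one thread at a time may be inside the suspect call. */
void run_rows_serialised(const double *rows, size_t n_rows, size_t width,
                         double *out)
{
    assert(rows != NULL && out != NULL);

    #pragma omp parallel for
    for (long i = 0; i < (long)n_rows; i++) {
        double result;
        #pragma omp critical
        {
            result = scaled_sum_static(rows + (size_t)i * width, width, 2.0);
        }
        out[i] = result;
    }
}
```

The critical construct removes the parallelism around the call, so it is a diagnostic step rather than a fix. If the crash persists even with the call serialised, the stack-size explanation becomes more plausible.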

Post by MutantTurkey »

I figured this out last night. It turns out that a few erroneous static declarations of variables inside the NNLS function were causing the mishap!

Note to self: never use any code that was translated from Fortran automatically ;)
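A minimal sketch of that kind of fix, assuming the statics were only scratch variables whose values did not need to survive between calls (automatic translators can emit static locals to mimic Fortran's saved-variable semantics). The names scaled_sum_reentrant and run_rows_parallel are hypothetical and not taken from nnls.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical routine after the fix: `acc` is an automatic variable,
   so every call (and therefore every thread) works on its own copy. */
static double scaled_sum_reentrant(const double *v, size_t n, double scale)
{
    double acc = 0.0;   /* was: static double acc; */
    for (size_t k = 0; k < n; k++)
        acc += v[k] * scale;
    return acc;
}

/* With no shared state inside the routine, the calls can run
   concurrently without a critical section. */
void run_rows_parallel(const double *rows, size_t n_rows, size_t width,
                       double *out)
{
    assert(rows != NULL && out != NULL);

    #pragma omp parallel for
    for (long i = 0; i < (long)n_rows; i++)
        out[i] = scaled_sum_reentrant(rows + (size_t)i * width, width, 2.0);
}
```

Dropping static is only safe when the routine does not rely on the value persisting from one call to the next. Large arrays that become automatic also move onto the stack, which is where OMP_STACKSIZE may come into play.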

Post by MarkB »

Great, glad it's now working!

Post by MutantTurkey »

Mark, I'm still having issues.

Removing all the static declarations solved the nnls problems as far as I can see, but now I get errors at this section:

This is part of the same loop, where we want to add our solution from each NNLS run to an array containing all the solutions:

Code: Select all

#pragma omp critical
for(z = 0; z < sequences; z++ )  {
    solutions[i*sequences + z] = solution[z]; 
}

gdb isn't being very helpful either:

Code: Select all

Starting program: /home/calvin/quikr-c/multifasta_to_otu -i input/ -f /home/calvin/quikr/gg94_training_input.fasta -t gg94_trained.txt -o output -k 6 -l 10000 -j 1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffda557700 (LWP 27774)]
[New Thread 0x7fffd9d56700 (LWP 27775)]
[New Thread 0x7fffd9555700 (LWP 27776)]
executed "count-kmers -r 6 -1 -u input//sample=700015250.fa"
executed "count-kmers -r 6 -1 -u input//sample=700015268.fa"
executed "count-kmers -r 6 -1 -u input//sample=700015289.fa"
executed "count-kmers -r 6 -1 -u input//sample=700015009.fa"
running NNLS on input//sample=700015289.fa, 14, 4097
running NNLS on input//sample=700015268.fa, 21, 4097
running NNLS on input//sample=700015250.fa, 7, 4097
running NNLS on input//sample=700015009.fa, 0, 4097

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd9d56700 (LWP 27775)]
0x00000000004023c9 in main._omp_fn.0 () at multifasta_to_otu.c:169
169	      solutions[i*sequences + z] = solution[z]; 
(gdb) print solutions
No symbol "solutions" in current context.
(gdb) print solution
No symbol "solution" in current context.
(gdb) print z
No symbol "z" in current context.
(gdb) print i
No symbol "i" in current context.
(gdb) print sequences
No symbol "sequences" in current context.
(gdb)

Apparently none of my variables exist to debug! :shock:

Post by MutantTurkey »

Well, it turns out I should malloc something larger than zero!

I'm still having problems with nnls though; here is the source:

https://github.com/mutantturkey/quikr-c ... ter/nnls.c
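The zero-size allocation mentioned above is the kind of mistake a small checked helper can catch early. The sketch below is hypothetical (alloc_solutions and store_solution are not names from the project) and assumes the solutions array is laid out row by row, as in solutions[i*sequences + z].

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates a zero-filled n_samples x sequences
   array of doubles. Returns NULL if either dimension is zero, if the
   element count would overflow, or if the allocation fails. */
double *alloc_solutions(size_t n_samples, size_t sequences)
{
    if (n_samples == 0 || sequences == 0)
        return NULL;
    if (n_samples > SIZE_MAX / sequences)
        return NULL;
    return calloc(n_samples * sequences, sizeof(double));
}

/* Copies one solution vector into row i; the assert trips in debug
   builds if the array is missing or the row index is out of range. */
void store_solution(double *solutions, size_t n_samples, size_t sequences,
                    size_t i, const double *solution)
{
    assert(solutions != NULL && solution != NULL && i < n_samples);
    for (size_t z = 0; z < sequences; z++)
        solutions[i * sequences + z] = solution[z];
}
```

Checking the result of alloc_solutions before entering the parallel loop turns a segfault deep inside a worker thread into an error that shows up at start-up.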

Post by MarkB »

MutantTurkey wrote: still having problems with nnls though, here is the source:

If you've taken out all the statics, I can't immediately see any other problems, I'm afraid.

ftinetti
Posts: 603
Joined: Wed Feb 10, 2010 2:44 pm

Post by ftinetti »

Hi,

MutantTurkey wrote:
Well turns out I should malloc something larger than zero!

still having problems with nnls though, here is the source:

https://github.com/mutantturkey/quikr-c ... ter/nnls.c

Maybe that's not the latest version? I would like to play around a little with the code that uses OpenMP; would you post it and/or update the links? I've seen

Code: Select all

char *usage = "Usage: quikr-train -i <fasta-file> -f <trained-database-fasta> -t <trained-database> -o <output-file> -k <kmer-size> -l <lambda>"

and maybe the input files are too large...?

Fernando.

jakub
Posts: 74
Joined: Fri Oct 26, 2007 3:19 am

Post by jakub »

Note that readdir, for example, is neither reentrant nor thread-safe, so you can't safely call it inside a parallel region.
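One way around this, sketched here under assumptions, is to read the whole directory serially first and only then hand the collected names to the parallel loop, so that no two threads ever call readdir on the same directory handle. The helpers list_entries, free_entries and process_entries are hypothetical, and the strlen call is only a placeholder for the per-file work.

```c
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Releases an array of names built by list_entries. */
void free_entries(char **names, size_t count)
{
    for (size_t k = 0; k < count; k++)
        free(names[k]);
    free(names);
}

/* Hypothetical helper: serially collects the entry names in `path`,
   skipping "." and "..". Returns 0 on success and -1 on failure. */
int list_entries(const char *path, char ***names_out, size_t *count_out)
{
    assert(path != NULL && names_out != NULL && count_out != NULL);

    DIR *dh = opendir(path);
    if (dh == NULL)
        return -1;

    char **names = NULL;
    size_t count = 0, capacity = 0;
    struct dirent *entry;

    while ((entry = readdir(dh)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        if (count == capacity) {
            size_t new_capacity = capacity ? capacity * 2 : 16;
            char **grown = realloc(names, new_capacity * sizeof *grown);
            if (grown == NULL)
                goto fail;
            names = grown;
            capacity = new_capacity;
        }
        size_t len = strlen(entry->d_name) + 1;
        names[count] = malloc(len);
        if (names[count] == NULL)
            goto fail;
        memcpy(names[count], entry->d_name, len);
        count++;
    }
    closedir(dh);
    *names_out = names;
    *count_out = count;
    return 0;

fail:
    closedir(dh);
    free_entries(names, count);
    return -1;
}

/* The parallel loop only reads the prepared array; strlen stands in
   for the real per-file work. */
size_t process_entries(char *const *names, size_t count)
{
    size_t total = 0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)count; i++)
        total += strlen(names[i]);
    return total;
}
```

A side effect of this layout is that iteration i always maps to the same file, and the loop bound is the number of real entries rather than a count that still includes "." and "..".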
