As Gabriele said, we need more information to help.
Several vendors supply an optimized BLAS library, and in many cases it has already been parallelized for you, especially the BLAS-2 and BLAS-3 routines. Most likely OpenMP is used for that, so you only need to set OMP_NUM_THREADS to use multiple threads.
You could still call such routines from within a parallel region as well, but that implies nested parallelism. You need to check whether this is supported and may have to set OMP_NESTED to TRUE to enable it.
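To check at run time what the OpenMP runtime will allow, something like the sketch below may help. The helper name max_active_parallel_levels is my own invention; omp_get_max_active_levels() is the standard OpenMP (3.0 and later) query. Note that OMP_NESTED was deprecated in OpenMP 5.0 in favour of OMP_MAX_ACTIVE_LEVELS, so on a recent compiler you may need to set that variable instead.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper (not part of any BLAS library): returns how many
 * nested levels of active parallel regions the OpenMP runtime currently
 * allows, or 0 if the code was compiled without OpenMP support. */
int max_active_parallel_levels(void)
{
#ifdef _OPENMP
    return omp_get_max_active_levels();
#else
    return 0;
#endif
}
```

A result of 1 means an inner parallel region runs with a single thread. With 2 or more, a threaded BLAS call made from inside your own parallel region can start its own threads, assuming the library honours the OpenMP settings.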
If you want to parallelize BLAS routines yourself, you may want to leverage what others have done. Some routines are trivial to parallelize, but others can be pretty complicated.
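As an illustration of a trivial case, here is a minimal sketch of a BLAS-1 style update y := alpha*x + y parallelized with OpenMP. The name my_daxpy and its simplified unit-stride interface are assumptions for this example, not the real DAXPY signature, and a vendor version will be tuned far more carefully.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified AXPY: y[i] = alpha * x[i] + y[i] for unit stride.
 * Iterations are independent, so one worksharing loop is enough.
 * Without OpenMP enabled the pragma is ignored and the loop runs serially. */
void my_daxpy(long n, double alpha, const double *x, double *y)
{
    long i;

    assert(n <= 0 || (x != NULL && y != NULL));

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        y[i] += alpha * x[i];
    }
}
```

Build with your compiler's OpenMP flag (for example -fopenmp with GCC) and control the thread count through OMP_NUM_THREADS. For short vectors the threading overhead can easily outweigh the gain.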
Kind regards, Ruud