Fix OpenMP ignoring the specified thread count

Currently, no matter how many threads are specified, every available
CPU is used when invoking OpenMP. This leads to very poor performance
when the workload is small and the number of available CPUs is large,
because much of the time is wasted on inter-thread synchronization.
Fix this by actually using the thread count passed in by the calling
API through the variable 'num'.
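
As a minimal sketch of the effect of the num_threads clause (not the
OpenBLAS code itself), the hypothetical helper run_items() below caps
the OpenMP team at 'num' threads and reports the team size it actually
observed. The helper name, the 'done' array, and the static schedule are
assumptions made for illustration; the real code uses
schedule(OMP_SCHED) inside exec_blas(). The _OPENMP guards let the
sketch compile without OpenMP support too, in which case the loop simply
runs serially.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper, for illustration only: processes `num` work items
 * with at most `num` OpenMP threads and returns the observed team size. */
static int run_items(int num, int *done) {
    int team = 1; /* stays 1 when compiled without OpenMP */

    /* num_threads() requires a positive value. */
    assert(num > 0);

    /* Without num_threads(num), the runtime would typically start one
     * thread per available CPU, even for a tiny `num`. */
#pragma omp parallel for num_threads(num) schedule(static)
    for (int i = 0; i < num; i++) {
#ifdef _OPENMP
        /* Only the iteration with i == 0 writes `team`, so there is no race. */
        if (i == 0) {
            team = omp_get_num_threads();
        }
#endif
        done[i] = 1; /* stand-in for the real per-queue work */
    }
    return team;
}
```

Calling run_items(4, done) on a machine with many cores should return a
team size of at most 4, since the clause is an upper bound and the
runtime may still choose fewer threads.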

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
Chen, Guobing 2020-08-12 03:28:25 +08:00
parent a073fa870e
commit 0c1c903f1e
1 changed file with 1 addition and 1 deletion

@@ -335,7 +335,7 @@ int exec_blas(BLASLONG num, blas_queue_t *queue){
 break;
 }
-#pragma omp parallel for schedule(OMP_SCHED)
+#pragma omp parallel for num_threads(num) schedule(OMP_SCHED)
 for (i = 0; i < num; i ++) {
 #ifndef USE_SIMPLE_THREADED_LEVEL3