Fix OMP num specify issue
In current code, no matter what number of threads specified, all available CPU count is used when invoking OMP, which leads to very bad performance if the workload is small while all available CPUs are big. Lots of time are wasted on inter-thread sync. Fix this issue by really using the number specified by the variable 'num' from calling API. Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
This commit is contained in:
@@ -335,7 +335,7 @@ int exec_blas(BLASLONG num, blas_queue_t *queue){
|
||||
break;
|
||||
}
|
||||
|
||||
#pragma omp parallel for schedule(OMP_SCHED)
|
||||
#pragma omp parallel for num_threads(num) schedule(OMP_SCHED)
|
||||
for (i = 0; i < num; i ++) {
|
||||
|
||||
#ifndef USE_SIMPLE_THREADED_LEVEL3
|
||||
|
||||
Reference in New Issue
Block a user