OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	9ff84dc3f2	remove unused status variable	2023-07-26 10:02:44 +02:00
Martin Kroeker	3326b924b3	remove status variable blas_num_threads_set; initialize openmp thread maximum on startup	2023-07-26 00:31:24 +02:00
Martin Kroeker	579bc86671	remove call to omp_set_num_threads	2023-03-21 20:58:56 +01:00
Martin Kroeker	05aa88268f	add status variable for openblas_set_num_threads	2023-03-08 23:41:57 +01:00
Kai T. Ohlhus	84453b924f	Support CONSISTENT_FPCSR on AARCH64	2022-09-22 00:20:40 +09:00
Martin Kroeker	9402df5604	Fix missing external declaration	2022-09-14 21:44:34 +02:00
Martin Kroeker	80cdfed7b2	Use OMP_ADAPTIVE setting to choose between static and dynamic OMP threadpool size	2022-07-27 23:43:20 +02:00
Alexander Grund	60005eb47b	Don't overwrite blas_thread_buffer if already set. After a fork it is possible that blas_thread_buffer has already allocated memory buffers: goto_set_num_threads does allocate those already and it may be called by num_cpu_avail in case the OpenBLAS NUM_THREADS differ from the OMP num threads. This leads to a memory leak which can cause subsequent execution of BLAS kernels to fail. Fixes #2993	2020-11-19 14:51:51 +01:00
Chen, Guobing	a7b1f9b1bb	Implementation of BF16 based gemv 1. Add a new API -- sbgemv to support bfloat16 based gemv 2. Implement a generic kernel for sbgemv 3. Implement an avx512-bf16 based kernel for sbgemv Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-10-29 02:08:23 +08:00
Martin Kroeker	85154c2e18	Change "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-12 00:05:05 +02:00
Martin Kroeker	ac653c94f3	Merge branch 'develop' into issue2588-cmake	2020-10-11 13:57:07 +02:00
Alexander Grund	3094fc6c83	Lazily reinit threads after a fork in OMP mode. This initializes the per-thread memory buffers which get cleared/released on a fork via pthread_at_fork. Not doing so leads to each thread calling blas_memory_alloc on almost every execution which slows down the code significantly as the threads race for the memory allocation using locks to serialize that.	2020-10-01 15:41:42 +02:00
Martin Kroeker	896bbd55e1	Add support for building only selected variable types	2020-09-26 23:25:55 +02:00
Chen, Guobing	deaeb6c5b8	Add bfloat16 based dot and conversion with single/double 1. Added bfloat16 based dot as new API: shdot 2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot 3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod shstobf16 -- convert single float array to bfloat16 array shdtobf16 -- convert double float array to bfloat16 array sbf16tos -- convert bfloat16 array to single float array dbf16tod -- convert bfloat16 array to double float array 4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16 5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs 6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building 7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-09-04 02:31:25 +08:00
Chen, Guobing	0c1c903f1e	Fix OMP num specify issue. In current code, no matter what number of threads specified, all available CPU count is used when invoking OMP, which leads to very bad performance if the workload is small while all available CPUs are big. Lots of time are wasted on inter-thread sync. Fix this issue by really using the number specified by the variable 'num' from calling API. Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-08-24 02:45:54 +08:00
Martin Kroeker	791e046744	Update conditional for atomics to use HAVE_C11	2020-07-18 17:05:59 +00:00
Martin Kroeker	47bf0dba8f	Add build-time option for OMP scheduler; document MULTITHREAD_THRESHOLD range (#1620) * Allow choosing the OpenMP scheduler and add range hint for GEMM_MULTITHREAD_THRESHOLD * Amended description of GEMM_MULTITHREAD_THRESHOLD to reflect #742 making it track floating point operations rather than matrix size	2018-06-15 11:25:05 +02:00
zhiyong.dang	53457f222f	move _Atomic define to common.h	2018-05-11 00:13:16 -07:00
Zhiyong Dang	3716267124	Change _STDC_VERSION__ to __STDC_VERSION__ Change-Id: Id3fa4e8d9eedd4ef7230df69b611e7f397301a42	2018-05-11 12:15:08 +08:00
Zhiyong Dang	1b83341d19	Fix race condition in blas_server_omp.c Change-Id: Ic896276cd073d6b41930c7c5a29d66348cd1725d	2018-04-27 17:00:42 +08:00
Timothy Gu	6c2ead30f0	Remove all trailing whitespace except lapack-netlib Signed-off-by: Timothy Gu <timothygu99@gmail.com>	2014-06-27 12:05:18 -07:00
Olivier Grisel	046e4013cb	Revert "Refs #294. Used pthread_atfork to avoid hang after a Unix fork." This reverts commit `3617c22a56`.	2014-02-19 18:32:54 +01:00
Zhang Xianyi	3617c22a56	Refs #294. Used pthread_atfork to avoid hang after a Unix fork. The problem is the mutex we used in blas_server. Thus, we must clear the mutex before the fork and re-init them at parent and child process. If you used OpenMP, GOMP has the same problem by now. Please try other OpenMP implementation.	2014-02-18 15:36:04 +08:00
Zhang Xianyi	2a7503e563	Refs #225. Fixed a bug in GEMM OpenMP threading.	2013-07-15 09:56:19 +08:00
Zhang Xianyi	d744c9590a	In OpenMP threading, preallocate the thread buffer instead of allocating the buffer every time. This patch improved the performance slightly.	2013-03-01 14:36:47 +08:00
Zhang Xianyi	3cc6ae793e	Refs #174. Return sb pointer when OpenMP or Windows.	2013-02-26 00:48:21 +08:00
Xianyi Zhang	4727fe8abf	Refs #47. On Loongson 3A, set DGEMM_R parameter depending on different number of threads. It would improve double precision BLAS3 on multi-threads.	2011-09-05 15:13:52 +00:00
Xianyi Zhang	82f5274828	Refs #39. It's unnecessary to include sys/mman.h file in blas_server_omp.c.	2011-06-22 01:52:20 +08:00
Xianyi Zhang	989c6f8b06	Fixed #14 the SEGFAULT bug on 64 cores. On SMP server, the number of CPUs or cores should be less than or equal to 64.	2011-04-07 14:48:10 +08:00
Xianyi Zhang	342bbc3871	Import GotoBLAS2 1.13 BSD version codes.	2011-01-24 14:54:24 +00:00

30 Commits