OpenBLAS

Author	SHA1	Message	Date
shivammonaka	9e22d70957	Dynamic locking in Pthread Backend to allow multiple BLAS calls to be executed in parallel	2024-06-07 08:40:17 +05:30
Martin Kroeker	db070a9223	add gemm_batch drivers	2024-05-31 18:29:27 +02:00
Martin Kroeker	d0794f88dc	add gemm_batch driver	2024-05-29 15:49:20 +02:00
yamazaki-mitsufumi	51ab1903e7	Expanding the scope of 2D thread distribution	2024-04-18 18:20:25 +09:00
shivammonaka	d49ebc54e1	Merge branch 'shivam-develop' into shivam-Locks	2024-02-29 11:58:14 +05:30
shivammonaka	bc191015e3	Using OpenMP locks with NUM_PARALLEL	2024-02-29 11:47:05 +05:30
Martin Kroeker	c4bd4a2e5d	fix improper function prototypes (empty parentheses)	2023-09-30 12:49:24 +02:00
Chris Sidebottom	32f2fafde7	Propagate SWITCH_RATIO to DYNAMIC_ARCH builds. Previously, dynamic builds used either the default SWITCH_RATIO or the one from the higher-level architecture; this patch ensures the dynamic builds can use this parameter as well.	2023-04-17 15:34:12 +01:00
Honglin Zhu	4989e039a5	Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build	2022-10-27 14:10:26 +08:00
Honglin Zhu	b00d5b9746	New sbgemm implementation for Neoverse N2: 1. Use UZP instructions instead of gather-load and scatter-store instructions to get lower latency. 2. Pad k to a power of 4.	2022-10-26 15:09:41 +08:00
Wangyang Guo	3dc6052c7e	initial support for Sapphire Rapids platform	2021-10-12 01:30:40 -07:00
Martin Kroeker	2f8220d757	Add sbgemm	2021-09-14 16:14:43 +02:00
Martin Kroeker	307c4c0786	Fix typo	2021-06-16 13:41:16 +02:00
Martin Kroeker	e83df93975	Work around another recent macro name collision with winnt.h	2021-06-16 12:32:34 +02:00
Martin Kroeker	a554712439	remove extra/intermediate size step for min_jj introduced in PR747	2020-12-08 21:01:36 +01:00
Martin Kroeker	5d26223f4a	remove extra/intermediate size step of min_jj from PR747	2020-12-08 20:59:56 +01:00
Martin Kroeker	d3ff1f889f	Convert ifndefs to ifneq	2020-11-22 16:27:17 +01:00
Rajalakshmi Srinivasaraghavan	b5d30b390d	Fix build issues with bfloat16. This patch fixes compilation errors with BUILD_BFLOAT16 caused by the recent renaming from SH to SB.	2020-10-13 11:00:22 -05:00
Martin Kroeker	006c7f6671	Change "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-12 00:06:06 +02:00
Martin Kroeker	886a8e3190	Adapt for supporting only a subset of variable types	2020-10-11 14:57:32 +02:00
Martin Kroeker	ac653c94f3	Merge branch 'develop' into issue2588-cmake	2020-10-11 13:57:07 +02:00
Martin Kroeker	988a6f429e	Add BUILD_vartype defines	2020-09-22 23:23:33 +02:00
Martin Kroeker	e5e2fbd593	Support building only selected types	2020-09-22 23:21:30 +02:00
y00512012	06cf73a239	fix a bug of trmm	2020-09-22 16:47:10 +08:00
Martin Kroeker	ddec244a5a	Merge pull request #2838 from austinpagan/gordon_trmm. Adding performance patch for trmm, just like trsm (#2836)	2020-09-15 21:17:48 +02:00
fossum	dfeca46098	Adding performance patch for trmm, just like #2836	2020-09-15 08:59:50 -05:00
fossum	274d6e015b	Fixing a performance bug in trsm_[LR].c.	2020-09-14 13:10:48 -05:00
Martin Kroeker	330044d821	Fix potential domain error in sqrt	2020-09-05 09:44:33 +02:00
Chen, Guobing	e740c4873d	Enable COOPERLAKE build target. Enable the new build target platform COOPERLAKE. This target platform supports all the SKYLAKEX-supported ISAs plus avx512bf16, so all the SKYLAKEX-specific kernels/drivers and related code are now also active on COOPERLAKE. In addition, new BF16-related kernels are active under this target.	2020-08-13 06:18:00 +08:00
Martin Kroeker	ce45af8151	Update conditional for atomics to use HAVE_C11	2020-07-18 17:09:56 +00:00
Martin Kroeker	6f38de06d2	Update conditional for atomics to use HAVE_C11	2020-07-18 17:09:01 +00:00
Martin Kroeker	5dd14e3d48	Make building the bfloat16 functions conditional on option BUILD_HALF (#2590): make building the bfloat16 BLAS functions conditional on BUILD_HALF; pass the BUILD_HALF option to gensymbol; pass BUILD_HALF as a compiler define for dynamic_arch builds	2020-05-01 09:58:30 +02:00
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC: Add half-precision gemm for bfloat16 in OpenBLAS. This patch adds support for a bfloat16 matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed per architecture, as done for SGEMM; for now, 8 and 4 are used for M and N. The size of ncopy/tcopy can likewise be changed per architecture; for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested on powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover the OpenBLAS tests, benchmarks, or LAPACK tests for shgemm. A complex-type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
Ali Saidi	97ce6bbce2	Fix barriers in level3_thread	2020-02-29 17:45:17 +00:00
wjc404	2f96a2c55b	Update trmm_R.c	2020-02-05 10:15:02 +08:00
wjc404	833bd0f8ff	Update trmm_L.c	2020-02-05 10:09:41 +08:00
wjc404	77b8f49556	Update level3_thread.c	2020-02-04 20:33:08 +08:00
wjc404	1c3e20ce48	Update level3.c	2020-02-04 20:30:23 +08:00
wjc404	e9fb8f62b1	Update level3_gemm3m_thread.c	2020-01-22 17:40:03 +00:00
wjc404	4c35b8dbaa	Update gemm3m_level3.c	2019-12-27 18:03:01 +08:00
Martin Kroeker	f3065a0eed	Fix race conditions in multithreaded GEMM3M by adding barriers (and a mutex lock for the non-OpenMP case) like it was already done for GEMM in level3_thread.c some time ago	2019-11-23 19:54:56 +01:00
Martin Kroeker	f343ed65b5	Avoid taking the root of a negative number. Fixes #1924, where numpy 1.17+ would report the (transient) FE_INVALID exception raised for the domain error.	2018-12-22 22:30:29 +01:00
Martin Kroeker	f72fdf525c	Merge pull request #1875 from martin-frbg/issue1851. Serialize accesses to parallelized level3 functions from multiple cal…	2018-11-25 20:53:46 +01:00
Martin Kroeker	113cb00b95	fix missing parenthesis	2018-11-19 21:01:36 +01:00
Martin Kroeker	5192651706	Add CriticalSection handling instead of mutexes for Windows	2018-11-19 17:58:22 +01:00
Martin Kroeker	2e6fae2aad	Serialize accesses to parallelized level3 functions from multiple callers for #1851	2018-11-19 14:02:50 +01:00
Arjan van de Ven	5b708e5eb1	sgemm/dgemm: add a way for an arch kernel to specify preferred sizes. The current gemm threading code can make very unfortunate choices; for example, on my 10-core system a 1024x1024x1024 matrix multiply ends up chunking into blocks of 102... which is not a vector-friendly size, and performance ends up horrible. This patch adds a helper define where an architecture can specify a preference for size multiples. This is different from existing defines that specify minimum sizes and such. The performance increase with this patch for the 1024x1024x1024 sgemm is 2.3x (!!)	2018-11-01 01:43:20 +00:00
Martin Kroeker	5f2a3c05cd	Revert "Rewrite &= -> = and simplify the initial blocking phase."	2018-07-03 21:42:28 +02:00
Craig Donner	0144068537	Rewrite &= -> = and simplify the initial blocking phase.	2018-06-25 15:08:55 +01:00
Arjan van de Ven	73de17664d	Add missing barriers in gemm scheduler. A few places in the gemm scheduler code were missing barriers; the code likely worked OK due to heavy use of volatile / _Atomic, but there's no reason to get this incorrect.	2018-06-17 17:50:43 +00:00

117 Commits