OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Zhang Xianyi	88e6806e3f	Init cblas_?gemm_batch implementation.	2020-12-14 09:50:32 +08:00
Xianyi Zhang	741d6c5cb8	Refs #2587 Add small matrix optimization reference kernel for c/zgemm.	2020-08-28 21:00:54 +08:00
Xianyi Zhang	712ca43069	Change a1b0 gemm to b0 gemm.	2020-08-28 07:55:27 +08:00
Xianyi Zhang	43bef4aaac	Add alpha=1.0 beta=0.0 for small gemm.	2020-04-28 22:35:36 +08:00
Xianyi Zhang	aae6af94bb	Add small matrix optimization kernel interface. make SMALL_MATRIX_OPT=1	2020-04-28 19:02:41 +08:00
Martin Kroeker	7dbb59b256	Update common_macro.h	2020-04-18 21:34:14 +02:00
Martin Kroeker	c7d668c248	Update common_macro.h	2020-04-18 16:04:38 +02:00
Martin Kroeker	e7afe8a969	Define AXPBY_K fallback for float16	2020-04-18 11:10:15 +02:00
Rajalakshmi Srinivasaraghavan	22bb50fb81	cmake fixes	2020-04-17 13:35:17 -05:00
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC: Add half precision gemm for bfloat16 in OpenBLAS. This patch adds support for a bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed per architecture, as is done for SGEMM; for now, 8 and 4 are used for M and N. The size of ncopy/tcopy can be changed per architecture requirement; for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested on powerpc64le and powerpc64. For reference, added a small test, compare_sgemm_shgemm.c, to compare sgemm and shgemm output. This patch does not cover the OpenBLAS test, benchmark, and lapack tests for shgemm. A complex type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
Guillaume Horel	c7b5a459b6	add missing defines and headers	2019-09-08 11:14:49 -04:00
Guillaume Horel	ea747cf933	start working on ?trtrs	2019-09-08 11:14:49 -04:00
Martin Kroeker	5c42287c4f	Add declarations for ?sum and cblas_?sum	2019-03-30 21:58:03 +01:00
Ashwin Sekhar T K	4713e7c47f	ARM64: Add the VULCAN Target	2017-01-10 15:01:17 +05:30
Martin Koehler	711ca33bc6	Improved Ximatcopy when lda==ldb. The Ximatcopy functions create a copy of the input matrix although they seem to work inplace. The new routines XIMATCOPY_K_YY perform the operations inplace if the leading dimension does not change.	2015-09-07 14:36:16 +02:00
Martin Koehler	39cc6b21d3	Add ATLAS-style ?geadd function	2015-02-16 13:46:20 +01:00
wernsaar	cee257f384	Ref #51 : added blas extensions zomatcopy and comatcopy	2014-06-10 10:34:54 +02:00
wernsaar	7bfb3011e8	Ref #51 : added blas extension somatcopy	2014-06-09 20:21:13 +02:00
wernsaar	8c8f596238	Ref #51 : added blas extension domatcopy as not optimized reference	2014-06-09 17:11:07 +02:00
wernsaar	faf3ac0aad	Ref #285 : added axpby kernels	2014-06-08 11:54:24 +02:00
Xianyi Zhang	4727fe8abf	Refs #47. On Loongson 3A, set the DGEMM_R parameter depending on the number of threads. This should improve double precision BLAS3 performance with multiple threads.	2011-09-05 15:13:52 +00:00
Xianyi Zhang	342bbc3871	Import GotoBLAS2 1.13 BSD version codes.	2011-01-24 14:54:24 +00:00

22 Commits