OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Jiaxun Yang	fa14bdb26d	Entitle missing declaration for alpha Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>	2022-08-11 15:02:58 +01:00
Egbert Eich	5e6d160020	Do not include symbols defined in driver/others/parameter.c in DYNAMIC_ARCH driver/others/parameter.c does not get built during DYNAMIC_ARCH, thus, do not declare its symbols. This will make the build fail early and in an obvious way if functions are trying to use these symbols. Signed-off-by: Egbert Eich <eich@suse.com>	2022-03-29 10:01:28 +02:00
Martin Kroeker	bc93f468ef	Add Elbrus E2000 architecture as generic x86_64 compatible	2022-01-22 18:53:38 +01:00
Wangyang Guo	1d83ca4bca	Small Matrix: support BFLOAT16 data type	2021-08-30 17:40:20 +08:00
Wangyang Guo	5dc7c3c8e5	Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrix case	2021-08-02 07:06:54 +00:00
Xianyi Zhang	57ed58cefe	Refs #2587 Add small matrix optimization reference kernel for c/zgemm.	2021-08-02 07:06:54 +00:00
Xianyi Zhang	17d32a4a82	Change a1b0 gemm to b0 gemm.	2021-08-02 07:06:54 +00:00
Xianyi Zhang	be3349405d	Add alpha=1.0 beta=0.0 for small gemm.	2021-08-02 07:01:47 +00:00
Xianyi Zhang	0a2077901c	Add small matrix optimization kernel interface. make SMALL_MATRIX_OPT=1	2021-08-02 07:01:47 +00:00
gxw	af0a69f355	Add support for LOONGARCH64	2021-07-27 15:29:12 +08:00
Chen, Guobing	a7b1f9b1bb	Implementation of BF16 based gemv 1. Add a new API -- sbgemv to support bfloat16 based gemv 2. Implement a generic kernel for sbgemv 3. Implement an avx512-bf16 based kernel for sbgemv Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-10-29 02:08:23 +08:00
Rajalakshmi Srinivasaraghavan	b5d30b390d	Fix build issues with bfloat16 This patch fixes compilation errors due to recent renaming from SH to SB with BUILD_BFLOAT16.	2020-10-13 11:00:22 -05:00
Martin Kroeker	629c497b6c	common_sh.h renamed to common_sb.h	2020-10-12 00:27:11 +02:00
Martin Kroeker	ca31c32693	Rename "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-11 23:49:22 +02:00
Chen, Guobing	deaeb6c5b8	Add bfloat16 based dot and conversion with single/double 1. Added bfloat16 based dot as new API: shdot 2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot 3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod shstobf16 -- convert single float array to bfloat16 array shdtobf16 -- convert double float array to bfloat16 array sbf16tos -- convert bfloat16 array to single float array dbf16tod -- convert bfloat16 array to double float array 4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16 5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs 6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building 7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-09-04 02:31:25 +08:00
Martin Kroeker	7dbb59b256	Update common_macro.h	2020-04-18 21:34:14 +02:00
Martin Kroeker	c7d668c248	Update common_macro.h	2020-04-18 16:04:38 +02:00
Martin Kroeker	e7afe8a969	Define AXPBY_K fallback for float16	2020-04-18 11:10:15 +02:00
Rajalakshmi Srinivasaraghavan	22bb50fb81	cmake fixes	2020-04-17 13:35:17 -05:00
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC : Add half precision gemm for bfloat16 in OpenBLAS This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
Guillaume Horel	c7b5a459b6	add missing defines and headers	2019-09-08 11:14:49 -04:00
Guillaume Horel	ea747cf933	start working on ?trtrs	2019-09-08 11:14:49 -04:00
Martin Kroeker	5c42287c4f	Add declarations for ?sum and cblas_?sum	2019-03-30 21:58:03 +01:00
Ashwin Sekhar T K	4713e7c47f	ARM64: Add the VULCAN Target	2017-01-10 15:01:17 +05:30
Martin Koehler	711ca33bc6	Improved Ximatcopy when lda==ldb. The Ximatcopy functions create a copy of the input matrix although they seem to work inplace. The new routines XIMATCOPY_K_YY perform the operations inplace if the leading dimension does not change.	2015-09-07 14:36:16 +02:00
Martin Koehler	39cc6b21d3	Add ATLAS-style ?geadd function	2015-02-16 13:46:20 +01:00
wernsaar	cee257f384	Ref #51 : added blas extensions zomatcopy and comatcopy	2014-06-10 10:34:54 +02:00
wernsaar	7bfb3011e8	Ref #51 : added blas extension somatcopy	2014-06-09 20:21:13 +02:00
wernsaar	8c8f596238	Ref #51 : added blas extension domatcopy as not optimized reference	2014-06-09 17:11:07 +02:00
wernsaar	faf3ac0aad	Ref #285 : added axpby kernels	2014-06-08 11:54:24 +02:00
Xianyi Zhang	4727fe8abf	Refs #47 . On Loongson 3A, set DGEMM_R parameter depending on different number of threads. It would improve double precision BLAS3 on multi-threads.	2011-09-05 15:13:52 +00:00
Xianyi Zhang	342bbc3871	Import GotoBLAS2 1.13 BSD version codes.	2011-01-24 14:54:24 +00:00

32 Commits
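
Commit deaeb6c5b8 describes converting single-precision arrays to and from bfloat16, with bfloat16 typedef'd as uint16_t. The following is a minimal sketch of what such a conversion can look like, not the OpenBLAS kernel itself: the function names float_to_bf16_sketch, bf16_to_float_sketch and float_array_to_bf16_sketch are hypothetical, and the choice of round-to-nearest-even is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* bfloat16 stored as a 16-bit integer, as named in commit deaeb6c5b8. */
typedef uint16_t bfloat16;

/* Hypothetical helper: float -> bfloat16 with round-to-nearest-even.
 * A bfloat16 value is the upper 16 bits of an IEEE 754 binary32 value. */
static bfloat16 float_to_bf16_sketch(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);          /* safe type punning */

    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: set a mantissa bit that survives truncation, so the
         * result cannot collapse into infinity. */
        return (bfloat16)((bits >> 16) | 0x0040u);
    }

    /* Add 0x7fff plus the lowest kept bit, so exact ties round to even. */
    uint32_t rounding = 0x7fffu + ((bits >> 16) & 1u);
    return (bfloat16)((bits + rounding) >> 16);
}

/* Hypothetical helper: bfloat16 -> float. Exact, since it only widens. */
static float bf16_to_float_sketch(bfloat16 h)
{
    uint32_t bits = (uint32_t)h << 16;
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical array form: convert n floats from in[] into out[]. */
static void float_array_to_bf16_sketch(size_t n, const float *in, bfloat16 *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = float_to_bf16_sketch(in[i]);
    }
}
```

Under these assumptions, 1.0f maps to 0x3f80 and converts back exactly, while a value such as 3.14159265f loses its low mantissa bits and comes back as 3.140625f.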