OpenBLAS

Author	SHA1	Message	Date
Wangyang Guo	1d83ca4bca	Small Matrix: support BFLOAT16 data type	2021-08-30 17:40:20 +08:00
Wangyang Guo	478d1086c1	Small Matrix: support DYNAMIC_ARCH build	2021-08-04 03:12:41 +00:00
Chen, Guobing	a7b1f9b1bb	Implementation of BF16 based gemv: 1. Add a new API -- sbgemv to support bfloat16 based gemv; 2. Implement a generic kernel for sbgemv; 3. Implement an avx512-bf16 based kernel for sbgemv. Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-10-29 02:08:23 +08:00
Martin Kroeker	cb839575ed	Convert the prototypes of the unimplemented BFLOAT16 functions to the new naming scheme	2020-10-12 14:44:33 +02:00
Martin Kroeker	ca31c32693	Rename "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-11 23:49:22 +02:00
Martin Kroeker	1c0b03efb4	Merge branch 'develop' into develop	2020-10-11 23:34:14 +02:00
Martin Kroeker	e396ec8b56	Allow building support for only a subset of variable types	2020-10-11 15:11:15 +02:00
Martin Kroeker	c5a32288c6	Work around sgemm_r/dgemm_r not being properly defined with BUILD_COMPLEX/BUILD_COMPLEX16	2020-09-26 23:24:37 +02:00
Martin Kroeker	b886bd672b	add defines for building a subset of types	2020-09-22 23:18:55 +02:00
Chen, Guobing	deaeb6c5b8	Add bfloat16 based dot and conversion with single/double: 1. Added bfloat16 based dot as new API: shdot; 2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot; 3. Added 4 conversion APIs for bfloat16 data type <=> single/double (shstobf16, shdtobf16, sbf16tos, dbf16tod): shstobf16 -- convert single float array to bfloat16 array, shdtobf16 -- convert double float array to bfloat16 array, sbf16tos -- convert bfloat16 array to single float array, dbf16tod -- convert bfloat16 array to double float array; 4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16; 5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs; 6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building; 7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t. Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-09-04 02:31:25 +08:00
Martin Kroeker	75eeb265d7	[WIP] Refactor the driver code for direct SGEMM (#2782). Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available (on x86_64 targets only for now) in DYNAMIC_ARCH builds * Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt * Add direct_sgemm functions to the gotoblas struct in common_param.h * Move sgemm_direct_performant helper to separate file * Update gemm.c to use macros for sgemm_direct to support dynamic_arch naming via common_s.h * (Conditionally) add sgemm_direct functions in setparam-ref.c	2020-08-19 14:51:09 +02:00
Martin Kroeker	5dd14e3d48	Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) * make building the bfloat16 BLAS functions conditional on BUILD_HALF * pass the BUILD_HALF option to gensymbol * Pass BUILD_HALF as a compiler define for dynamic_arch builds	2020-05-01 09:58:30 +02:00
Rajalakshmi Srinivasaraghavan	67cc4b9e16	Fix warnings in clang and export symbol	2020-04-15 19:15:23 -05:00
Rajalakshmi Srinivasaraghavan	a87793e03c	Fix DYNAMIC_ARCH compilation errors	2020-04-15 09:09:50 -05:00
Rajalakshmi Srinivasaraghavan	ac6a22ae78	Update header	2020-04-14 22:58:39 -05:00
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC: Add half precision gemm for bfloat16 in OpenBLAS. This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
Martin Kroeker	5c42287c4f	Add declarations for ?sum and cblas_?sum	2019-03-30 21:58:03 +01:00
Martin Kroeker	7e860acd38	Correct zgeadd_k prototype	2017-11-29 19:57:35 +01:00
Isuru Fernando	ca17b4b75c	Fix complex support for MSVC headers	2017-07-28 11:50:29 +05:30
Zhang Xianyi	69363622a8	Fix DYNAMIC_ARCH=1 bug.	2015-10-27 05:10:40 +08:00
Martin Koehler	711ca33bc6	Improved Ximatcopy when lda==ldb. The Ximatcopy functions create a copy of the input matrix although they seem to work inplace. The new routines XIMATCOPY_K_YY perform the operations inplace if the leading dimension does not change.	2015-09-07 14:36:16 +02:00
Martin Koehler	39cc6b21d3	Add ATLAS-style ?geadd function	2015-02-16 13:46:20 +01:00
wernsaar	f1b9a4a1ca	Ref #454: fixed bug in common_param.h	2014-09-23 11:34:29 +02:00
wernsaar	7aae4a62e7	enabled use of GEMM3M functions	2014-09-20 14:27:10 +02:00
wernsaar	125610d23b	allow to set custom value for ?GEMM_DEFAULT_UNROLL_MN, optimizations for syrk	2014-07-24 18:43:31 +02:00
Timothy Gu	6c2ead30f0	Remove all trailing whitespace except lapack-netlib Signed-off-by: Timothy Gu <timothygu99@gmail.com>	2014-06-27 12:05:18 -07:00
wernsaar	cee257f384	Ref #51: added blas extensions zomatcopy and comatcopy	2014-06-10 10:34:54 +02:00
wernsaar	7bfb3011e8	Ref #51: added blas extension somatcopy	2014-06-09 20:21:13 +02:00
wernsaar	8c8f596238	Ref #51: added blas extension domatcopy as not optimized reference	2014-06-09 17:11:07 +02:00
wernsaar	faf3ac0aad	Ref #285: added axpby kernels	2014-06-08 11:54:24 +02:00
traits	9fc6764fa7	refs #55. Added DTB_ENTRIES into dynamic arch setting parameters. Now, it can read DTB_ENTRIES at runtime.	2011-09-05 17:37:07 +08:00
Xianyi Zhang	342bbc3871	Import GotoBLAS2 1.13 BSD version codes.	2011-01-24 14:54:24 +00:00

32 Commits