OpenBLAS

Author	SHA1	Message	Date
Honglin Zhu	71e4125795	Fix syscall error on non-x86 platform	2023-05-22 21:59:59 +08:00
Honglin Zhu	90f041e348	Invoke the syscall to allow the use of amx tiles	2023-05-19 10:48:18 +08:00
Wangyang Guo	4289cf048d	sbgemm: avoid falling into SGEMM_KERNEL_DIRECT	2021-09-07 21:30:46 +08:00
Wangyang Guo	2e44ca0136	sbgemm: add missing cblas_sbgemm definition	2021-08-30 17:40:30 +08:00
Wangyang Guo	1d83ca4bca	Small Matrix: support BFLOAT16 data type	2021-08-30 17:40:20 +08:00
Wangyang Guo	c17d6dacb2	Small Matrix: skip compile in unimplemented data type	2021-08-05 05:46:13 +00:00
Wangyang Guo	aa50185647	Small Matrix: better handle with GEMM3M macro	2021-08-05 02:45:53 +00:00
Wangyang Guo	478d1086c1	Small Matrix: support DYNAMIC_ARCH build	2021-08-04 03:12:41 +00:00
Wangyang Guo	5dc7c3c8e5	Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrices case	2021-08-02 07:06:54 +00:00
Xianyi Zhang	6022e5629c	Refs #2587 fix small matrix c/zgemm bug.	2021-08-02 07:06:54 +00:00
Xianyi Zhang	57ed58cefe	Refs #2587 Add small matrix optimization reference kernel for c/zgemm.	2021-08-02 07:06:54 +00:00
Xianyi Zhang	17d32a4a82	Change a1b0 gemm to b0 gemm.	2021-08-02 07:06:54 +00:00
Xianyi Zhang	4271cfcc6f	Fix gemm interface bug for small matrix.	2021-08-02 07:06:51 +00:00
Xianyi Zhang	be3349405d	Add alpha=1.0 beta=0.0 for small gemm.	2021-08-02 07:01:47 +00:00
Xianyi Zhang	0a2077901c	Add small matrix optimization kernel interface. make SMALL_MATRIX_OPT=1	2021-08-02 07:01:47 +00:00
Martin Kroeker	7bb59fceb7	Clean up some warnings	2021-07-11 16:00:29 +02:00
Gordon Fossum	8b599836db	Add error message token for SBGEMM in gemm.c	2021-05-04 13:55:02 -05:00
Alex Henrie	6f32991eae	Don't define the mode variable when not needed in gemm functions	2021-01-14 19:40:31 -07:00
Martin Kroeker	75eeb265d7	[WIP] Refactor the driver code for direct SGEMM (#2782). Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available (on x86_64 targets only for now) in DYNAMIC_ARCH builds: add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt; add direct_sgemm functions to the gotoblas struct in common_param.h; move sgemm_direct_performant helper to separate file; update gemm.c to macros for sgemm_direct to support dynamic_arch naming via common_s.h; (conditionally) add sgemm_direct functions in setparam-ref.c	2020-08-19 14:51:09 +02:00
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC: Add half precision gemm for bfloat16 in OpenBLAS. This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
Martin Kroeker	8229c163b7	Use runtime check for AVX512 (sgemm_direct) capability when using DYNAMIC_ARCH	2020-03-26 21:12:56 +01:00
Martin Kroeker	6a14b34c20	Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX	2020-03-22 14:33:16 +01:00
Arjan van de Ven	cdc668d82b	Add a "sgemm direct" mode for small matrixes. OpenBLAS has a fancy algorithm for copying the input data while laying it out in a more CPU friendly memory layout. This is great for large matrixes; the cost of the copy is easily amortized by the gains from the better memory layout. But for small matrixes (on CPUs that can do efficient unaligned loads) this copy can be a net loss. This patch adds (for SKYLAKEX initially) a "sgemm direct" mode, that bypasses the whole copy machinery for ALPHA=1/BETA=0/... standard arguments, for small matrixes only. What is small? For the non-threaded case this has been measured to be in the MNK = 28 * 512 * 512 range, while in the threaded case it's less, around MNK = 1 * 512 * 512	2018-12-13 13:47:31 +00:00
Craig Donner	66316b9f4c	Improve performance of GEMM for small matrices when SMP is defined. Always checking num_cpu_avail() regardless of whether threading will actually be used adds noticeable overhead for small matrices. Most other uses of num_cpu_avail() do so only if threading will be used, so do the same here.	2018-06-07 15:29:13 +01:00
Martin Kroeker	2c222f1faa	Modify complex CBLAS functions to take void pointers. Modify complex CBLAS functions to take void pointers instead of float or double arguments (to bring the prototypes in line with netlib and other implementations' cblas.h)	2017-11-05 15:53:14 +01:00
Hank Anderson	e74462a3f5	Moved declarations to start of functions to satisfy MSVC C89 implementation.	2015-02-11 11:16:57 -06:00
wernsaar	3300f5ebff	optimized multithreading lower limits	2014-09-15 11:38:25 +02:00
wernsaar	d286daa2ba	adjusted number of threads for small size	2014-07-15 14:41:35 +02:00
Timothy Gu	6c2ead30f0	Remove all trailing whitespace except lapack-netlib. Signed-off-by: Timothy Gu <timothygu99@gmail.com>	2014-06-27 12:05:18 -07:00
wernsaar	a19d209005	Ref #103 : enhancement for small matrix dimensions	2014-06-18 15:04:11 +02:00
Lars Buitinck	3f7b0cd994	Merge pull request #290 from larsmans/missing-threshold. Check if GEMM_MULTITHREAD_THRESHOLD defined in gemm.c. Set a fallback value.	2013-08-29 00:33:55 +08:00
Xianyi Zhang	31c836ac25	Ref #79 Added GEMM_MULTITHREAD_THRESHOLD flag to use single thread in gemm function with small matrices.	2012-03-23 01:17:41 +08:00
Xianyi Zhang	342bbc3871	Import GotoBLAS2 1.13 BSD version codes.	2011-01-24 14:54:24 +00:00

33 Commits
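
The size heuristic described in commit cdc668d82b ("sgemm direct" mode) can be sketched roughly as follows. This is an illustrative sketch only, not the code from gemm.c: the function name `use_direct_path`, the macro names, and the choice to treat the quoted MNK figures as inclusive upper bounds are all assumptions made for the example. The commit message also elides further conditions ("ALPHA=1/BETA=0/..."), which are not reconstructed here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names. The numeric values are the MNK figures quoted in the
 * commit message; treating them as hard inclusive limits is an assumption. */
#define DIRECT_MNK_SINGLE   (28LL * 512 * 512)  /* non-threaded case */
#define DIRECT_MNK_THREADED (1LL * 512 * 512)   /* threaded case */

/* Decide whether a GEMM call is "small" enough that skipping the packing
 * copy might pay off. Only the alpha/beta and size conditions named in the
 * commit message are modeled. */
static bool use_direct_path(long long m, long long n, long long k,
                            double alpha, double beta, bool threaded)
{
    assert(m >= 0 && n >= 0 && k >= 0);

    /* The direct mode is described for ALPHA=1 / BETA=0 only. */
    if (alpha != 1.0 || beta != 0.0)
        return false;

    long long limit = threaded ? DIRECT_MNK_THREADED : DIRECT_MNK_SINGLE;

    /* Check the product in steps so that m*n*k cannot overflow. */
    if (m > limit || n > limit || k > limit)
        return false;
    long long mn = m * n;
    if (mn > limit)
        return false;
    return mn * k <= limit;
}
```

Under these assumptions, a 16 x 16 x 16 product with alpha = 1 and beta = 0 would take the direct path, while a 512 x 512 x 512 product would fall back to the regular copy-based path.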