OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Sergei Lewis	cb0a70e0e2	dot.c early bail fix	2023-03-02 09:51:10 +00:00
Ivan Pribec	802e71bf05	Add const attribute to lsame	2022-08-08 15:15:52 +02:00
Martin Kroeker	ef24712030	Move a conditionally used variable	2021-09-11 14:37:44 +02:00
Wangyang Guo	619588fbab	sbgemm: remove unnecessary b0 files	2021-08-30 17:55:01 +08:00
Wangyang Guo	1d83ca4bca	Small Matrix: support BFLOAT16 data type	2021-08-30 17:40:20 +08:00
Wangyang Guo	989e6bbdd3	Small Matrix: reduce generic kernel source files	2021-08-13 03:17:38 +00:00
Wangyang Guo	6b58bca18b	Small Matrix: disable low performance default kernel	2021-08-03 06:49:03 +00:00
Wangyang Guo	5dc7c3c8e5	Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrics case	2021-08-02 07:06:54 +00:00
Xianyi Zhang	6022e5629c	Refs #2587 fix small matrix c/zgemm bug.	2021-08-02 07:06:54 +00:00
Xianyi Zhang	57ed58cefe	Refs #2587 Add small matrix optimization reference kernel for c/zgemm.	2021-08-02 07:06:54 +00:00
Xianyi Zhang	17d32a4a82	Change a1b0 gemm to b0 gemm.	2021-08-02 07:06:54 +00:00
Xianyi Zhang	be3349405d	Add alpha=1.0 beta=0.0 for small gemm.	2021-08-02 07:01:47 +00:00
Xianyi Zhang	0a2077901c	Add small marix optimization kernel interface. make SMALL_MATRIX_OPT=1	2021-08-02 07:01:47 +00:00
damonyu	ef8e7d0279	Add the support for RISC-V Vector. Change-Id: Iae7800a32f5af3903c330882cdf6f292d885f266	2020-10-15 16:09:02 +08:00
Martin Kroeker	756062afa5	Rename "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-11 23:56:17 +02:00
Qiyu8	60e6c68e38	Adapt ARM architect	2020-09-29 16:36:14 +08:00
Qiyu8	1b1a757f5f	Optimize the performance of dot by using universal intrinsics in X86/ARM	2020-09-28 20:36:53 +08:00
Rajalakshmi Srinivasaraghavan	d23419accc	powerpc: Optimized SHGEMM kernel for POWER10 This patch introduces new optimized version of SHGEMM kernel using power10 Matrix-Multiply Assist (MMA) feature introduced in POWER ISA v3.1. This patch makes use of new POWER10 compute instructions for matrix multiplication operation. Tested on simulator and there are no new test failures.	2020-06-25 22:19:08 -05:00
Rajalakshmi Srinivasaraghavan	a87793e03c	Fix DYNAMIC_ARCH compilation errors	2020-04-15 09:09:50 -05:00
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC : Add half precision gemm for bfloat16 in OpenBLAS This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
Qiyu8	ff42e68652	Optimize genenal Gemm Beta	2020-01-20 11:49:42 +08:00
Andrew	1e531701b7	fix small typo	2018-09-09 16:52:25 +02:00
Martin Kroeker	7a7619af6d	Revert changes from PR#1419 at least one of these changes apparently is an oversimplification, leading to TRMM breakage on some platforms as observed in #1563	2018-05-17 11:40:08 +02:00
Andrew	e5cc3d72c0	core.IdenticalExpr clang501 checker	2018-01-19 23:17:43 +01:00
Andrew	9fa986337d	add missing brackets to silence indentation warnings gcc721	2018-01-19 23:11:12 +01:00
Andrew	3eed97f6b9	Initialize values to silence cppcheck	2018-01-12 22:35:00 +01:00
Andrew	d602b99386	LAPACK helpers in C that need care too	2018-01-02 14:38:50 +01:00
Andrew	4d0b005e5b	Eliminate remaining unused results in kernels (clang5 analyzer)	2018-01-01 20:54:39 +01:00
Andrew	03e5ff0687	initialize potentially unitialized variables (clang5)	2017-12-26 09:24:24 +01:00
Andrew	47deec2c1a	fix couple of dead assignment warnings	2017-12-22 00:56:35 +01:00
Andrew	281a2b952f	warning cleanup (#1380 ) * dead increments in driver/level2 * dead increments in kernel/generic * part dead increments in kernel/x86_64	2017-12-05 19:54:10 +01:00
Martin Kroeker	8213385ab8	Work around compiler warnings for unused variables in the generic zgemm3m_Xcopy kernels	2017-12-02 22:51:58 +01:00
Andrew	441a9c8385	more dead increments clang4 scan-build deadcode.deadstores	2017-11-26 17:24:08 +01:00
Andrew	1236dbe5a6	Eliminate 2-8 dead increments code	2017-11-26 13:26:11 +01:00
Martin Kroeker	65bf0a343c	Remove unused variable btpr	2017-11-14 23:25:50 +01:00
Martin Kroeker	9d92f526dd	Comment out a code block that performs out-of-bounds memory accesses ...and does not appear to be needed even when it stays within the bounds of the array	2017-10-06 23:51:32 +02:00
Martin Kroeker	f96afd94b0	Fix out-of-bounds accesses where the data should be zero anyway	2017-10-01 01:06:39 +02:00
Andrew	becf8bc7a0	remove dead code	2016-10-31 12:46:56 +01:00
Yichao Yu	594b9f4c73	Do not use vsub to clear the register values since it doesn't work with non-normal numbers.	2016-01-05 16:54:05 +00:00
Ashwin Sekhar T K	45f78963ac	Optimized cgemm kernel for CORTEXA57 Also, add a generic ztrmm 4x4 kernel	2015-11-09 14:15:53 +05:30
Martin Koehler	711ca33bc6	Improved Ximatcopy when lda==ldb. The Ximatcopy functions create a copy of the input matrix although they seem to work inplace. The new routines XIMATCOPY_K_YY perform the operations inplace if the leading dimension does not change.	2015-09-07 14:36:16 +02:00
Zhang Xianyi	1cf2b10224	Use pure C generic target on x86 and x86_64. make TARGET=GENERIC ?gemm3m is unimplemented on generic target.	2015-08-03 23:55:56 -05:00
Werner Saar	9bd962f655	modified haswell parameter dgemm_unroll_n	2015-06-13 10:28:27 +02:00
Zhang Xianyi	ea7f9dacf4	Refs #509 . Fixed geadd building bug with DYNAMIC_ARCH=1.	2015-02-26 01:47:11 +08:00
Zhang Xianyi	2fb02626da	Update organization info.	2014-11-25 15:28:58 +08:00
Benedikt Huber	58c90d5937	# The first commit's message is: Optimizations for APM's xgene-1 (aarch64). 1) general system updates to support armv8 better. Make all did not work, one needed to supply TARGET=ARMV8. 2) sgem 4x4 kernel in assembler using SIMD, and configuration changes to use it. 3) strmm 4x4 kernel in C. Since the sgem kernel does 4x4, the trmm kernel must also do 4xN. Added Dave Nuechterlein to the contributors list.	2014-11-11 22:19:23 +08:00
wernsaar	b079df9ef4	added optimized sdot- and dsdot-kernel, written in C	2014-06-30 14:46:38 +02:00
Timothy Gu	6c2ead30f0	Remove all trailing whitespace except lapack-netlib Signed-off-by: Timothy Gu <timothygu99@gmail.com>	2014-06-27 12:05:18 -07:00
Zhang Xianyi	6c4a7d0828	Import AMD Piledriver DGEMM kernel generated by AUGEM. So far, this kernel doesn't deal with edge. AUGEM: Automatically Generate High Performance Dense Linear Algebra Kernels on x86 CPUs. Qian Wang, Xianyi Zhang, Yunquan Zhang, and Qing Yi. In the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'13). Denver, CO. Nov, 2013.	2013-08-25 10:16:01 -03:00
wernsaar	cff70a666d	added generic trmm kernels and modified Makefile.L3	2013-07-30 20:18:57 +02:00

53 Commits