OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	772b0cc715	Fix early bailout	2023-06-27 16:12:27 +02:00
Martin Kroeker	d6be5036d7	Fix IDAMAX	2023-06-26 21:19:33 +02:00
Martin Kroeker	1fe96f8da7	Fix failures to handle increments of zero	2023-06-25 22:36:57 +02:00
Martin Kroeker	73b30b1dec	Fix VLEV_FLOAT/VSEV_FLOAT macros to compile with t-head 2.6.1	2023-06-18 17:46:29 +02:00
Martin Kroeker	c3a2d407a0	Merge pull request #4048 from imzhuhl/spr_sbgemm_fix: Sapphire Rapids sbgemm fix	2023-06-17 20:47:09 +02:00
Manjul Mohan	58b88aa5f0	POWER10: Fix compiler warnings. This patch removes the warning messages related to unused variables in sbgemm_kernel_power10.c. Signed-off-by: Manjul Mohan <manjul@linux.vnet.ibm.com>	2023-06-12 01:08:59 -04:00
Honglin Zhu	9e80a194d6	Fix dynamic_list build and gcc version check error	2023-05-21 19:52:58 +08:00
Honglin Zhu	a76afdc047	Compatible with older version of GNU make	2023-05-20 13:58:23 +08:00
Honglin Zhu	90f041e348	Invoke the syscall to allow the use of amx tiles	2023-05-19 10:48:18 +08:00
Honglin Zhu	0b83088887	spr dynamic arch support	2023-05-19 10:48:18 +08:00
Honglin Zhu	f249ccb741	Fix spr sbgemm error	2023-05-19 10:48:18 +08:00
Martin Kroeker	e9a8d5b45f	Merge pull request #4015 from martin-frbg/issue4013-2: [WIP] Disable gcc's tree-vectorizer for x86_64 CGEMV	2023-04-23 18:51:12 +02:00
Martin Kroeker	72caceb324	Merge pull request #4009 from Mousius/sve-gemm: Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1	2023-04-22 13:56:45 +02:00
Martin Kroeker	84bcf6639f	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-20 23:24:52 +02:00
Martin Kroeker	c9174ae8d7	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-19 23:45:44 +02:00
Martin Kroeker	c2fe9cb91f	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-19 23:45:14 +02:00
Martin Kroeker	66b39b835c	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-19 23:44:45 +02:00
Martin Kroeker	bb6d6735bf	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-19 23:44:15 +02:00
Martin Kroeker	d18efaed20	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-19 23:43:43 +02:00
Martin Kroeker	99f6d31ed5	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-19 23:42:55 +02:00
Martin Kroeker	7de9335c56	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-19 23:42:09 +02:00
Martin Kroeker	437c0bf2b4	Merge pull request #3843 from Mousius/switch-ratio: Propagate SWITCH_RATIO to DYNAMIC_ARCH builds	2023-04-19 11:51:54 +02:00
Chris Sidebottom	ec334e69dc	Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1. This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance. After #3868, the SVE kernels represent a pretty good boost. This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).	2023-04-17 17:38:42 +01:00
Chris Sidebottom	32f2fafde7	Propagate SWITCH_RATIO to DYNAMIC_ARCH builds. Previously dynamic builds were either using the default SWITCH_RATIO or one from the higher level architecture; this patch ensures the dynamic builds can use this parameter as well.	2023-04-17 15:34:12 +01:00
Martin Kroeker	44164e3a3d	revert "move alpha out of register 18" (out of PR scope, no SVE on Apple hw)	2023-04-17 14:23:13 +02:00
Martin Kroeker	8be68fa7f4	move declaration of sca to really keep the compiler from throwing it out (for now)	2023-04-15 12:02:39 +02:00
Martin Kroeker	3727672a74	Improve workaround and keep compilers from optimizing it out	2023-04-13 18:07:52 +02:00
Martin Kroeker	108a21e47a	Move ALPHA out of register 18 (reserved on OSX)	2023-04-13 18:05:14 +02:00
Martin Kroeker	0b1acb0ba3	Move ALPHA_I out of register 18 (reserved on OSX)	2023-04-13 18:03:35 +02:00
Martin Kroeker	c7bbad09ad	Move ALPHA_I out of register 18 (reserved on OSX)	2023-04-13 18:00:47 +02:00
Martin Kroeker	cda29633a3	move ALPHA_I out of register 18 (reserved on OSX)	2023-04-13 17:59:48 +02:00
Martin Kroeker	09ace3cf23	Merge pull request #3846 from lilh9598/sbgemm_opt: Improve the performance of sbgemm_tcopy on neoversen2	2023-03-26 19:04:57 +02:00
Sergei Lewis	cb0a70e0e2	dot.c early bail fix	2023-03-02 09:51:10 +00:00
Martin Kroeker	38d6fb4225	Fix dependencies in builds with specified subsets of precision types	2023-02-23 23:12:06 +01:00
Martin Kroeker	e412bee313	fix GEMM kernel dependencies in builds that use only a subset of precisions	2023-02-22 00:37:14 +01:00
Martin Kroeker	d80adf253e	make SSYMV available to BUILD_DOUBLE-only builds	2023-02-22 00:30:20 +01:00
Martin Kroeker	5481c328e8	fix DYNAMIC_ARCH builds that use only a subset of precisions	2023-02-22 00:28:25 +01:00
Martin Kroeker	5a9cd87794	Merge pull request #3868 from Mousius/sve-prefetch: Remove prefetches from SVE kernels	2022-12-24 10:52:29 +01:00
Chris Sidebottom	1361229291	Remove prefetches from SVE kernels. This is a precursor to enabling the SVE kernels for Arm(R) Neoverse(TM) V1, which has 256-bit SVE. Testing revealed that the SVE kernel was actually worse in some cases than the existing kernel, which seemed odd; with these prefetches removed, the underlying architecture seems to do a better job 😸	2022-12-16 14:43:09 +00:00
Bart Oldeman	60e49b851c	Fix typo in clobber list, should be xmm14 instead of ymm14.	2022-12-06 16:30:46 -05:00
Bart Oldeman	4afe1439a1	Fix skylake fallback kernel name for old compilers.	2022-12-06 16:09:54 -05:00
Bart Oldeman	5ceca1a4d8	Add sscal.c + microkernels for Haswell, Zen, Skylake and newer. Unlike [dcz]scal, sscal still used the original GotoBLAS SSE code from scal_sse.S. This code follows dscal as closely as possible, except for the inc_x > 1 code for which a plain C loop is used much like the one in cscal.c, instead of an adaptation of the SSE2 asm code of dscal.c (I tried but the performance wasn't better than the plain C loop).	2022-12-06 14:05:49 -05:00
lilianhuang	729af6406f	bugfix for sbgemm_ncopy_8_neoversen2	2022-12-05 05:10:18 -05:00
Martin Kroeker	042e3c0e7c	Merge pull request #3848 from bartoldeman/dscal-haswell-ymm: dscal: use ymm registers in Haswell microkernel	2022-12-05 08:56:08 +01:00
Bart Oldeman	5c3169ecd8	dscal: use ymm registers in Haswell microkernel. Using 256-bit registers in dscal makes this microkernel consistent with cscal and zscal, and generally doubles performance if the vector fits in L1 cache.	2022-12-01 07:48:05 -05:00
Chris Sidebottom	eea006a688	Wrap SVE header with __has_include check	2022-12-01 12:07:55 +00:00
Chris Sidebottom	fd4f52c797	Add SVE implementation for sdot/ddot. This adds an SVE implementation to sdot/ddot when available, falling back to the previous Advanced SIMD kernel where there's no SVE implementation for the kernel. All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation so I've renamed it to better fit with the feature detection.	2022-12-01 12:07:50 +00:00
lilianhuang	fdac8a97c1	Add sbgemm_ncopy_8 and sbgemm_tcopy_4	2022-11-29 04:46:14 -05:00
lilianhuang	135718eafc	Improve the performance of sbgemm_tcopy on neoversen2	2022-11-28 04:17:54 -05:00
Chris Sidebottom	4f7b77e08a	Remove unnecessary instructions from Advanced SIMD dot. The existing kernel was issuing extra instructions to organise the arguments into the same registers they would usually be in and similarly to put the result into the appropriate register. This has an impact on smaller sized dots and seemed like a quick fix.	2022-11-25 16:19:03 +00:00

1968 Commits