OpenBLAS

Author	SHA1	Message	Date
Martin Kroeker	84bcf6639f	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-20 23:24:52 +02:00
Martin Kroeker	c9174ae8d7	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-19 23:45:44 +02:00
Martin Kroeker	c2fe9cb91f	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-19 23:45:14 +02:00
Martin Kroeker	66b39b835c	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-19 23:44:45 +02:00
Martin Kroeker	bb6d6735bf	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-19 23:44:15 +02:00
Martin Kroeker	d18efaed20	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-19 23:43:43 +02:00
Martin Kroeker	99f6d31ed5	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-19 23:42:55 +02:00
Martin Kroeker	7de9335c56	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-19 23:42:09 +02:00
Martin Kroeker	437c0bf2b4	Merge pull request #3843 from Mousius/switch-ratio Propagate SWITCH_RATIO to DYNAMIC_ARCH builds	2023-04-19 11:51:54 +02:00
Chris Sidebottom	ec334e69dc	Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1. This re-spins #3869 with some additional copy unrolling, which helps maintain SYRK performance. After #3868, the SVE kernels represent a pretty good boost. This re-uses ARMV8SVE as a base, and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).	2023-04-17 17:38:42 +01:00
Chris Sidebottom	32f2fafde7	Propagate SWITCH_RATIO to DYNAMIC_ARCH builds. Previously, dynamic builds were either using the default SWITCH_RATIO or one from the higher level architecture; this patch ensures the dynamic builds can use this parameter as well.	2023-04-17 15:34:12 +01:00
Martin Kroeker	44164e3a3d	revert "move alpha out of register 18" (out of PR scope, no SVE on Apple hw)	2023-04-17 14:23:13 +02:00
Martin Kroeker	8be68fa7f4	move declaration of sca to really keep the compiler from throwing it out (for now)	2023-04-15 12:02:39 +02:00
Martin Kroeker	3727672a74	Improve workaround and keep compilers from optimizing it out	2023-04-13 18:07:52 +02:00
Martin Kroeker	108a21e47a	Move ALPHA out of register 18 (reserved on OSX)	2023-04-13 18:05:14 +02:00
Martin Kroeker	0b1acb0ba3	Move ALPHA_I out of register 18 (reserved on OSX)	2023-04-13 18:03:35 +02:00
Martin Kroeker	c7bbad09ad	Move ALPHA_I out of register 18 (reserved on OSX)	2023-04-13 18:00:47 +02:00
Martin Kroeker	cda29633a3	move ALPHA_I out of register 18 (reserved on OSX)	2023-04-13 17:59:48 +02:00
Martin Kroeker	09ace3cf23	Merge pull request #3846 from lilh9598/sbgemm_opt Improve the performance of sbgemm_tcopy on neoversen2	2023-03-26 19:04:57 +02:00
Heller Zheng	1374a2d08b	This PR adapts to the latest spec changes: add prefix (_riscv) for all riscv intrinsics; update some intrinsics' parameters, like vfredxxxx, vmerge.	2023-03-19 23:59:03 -07:00
Zhang Xianyi	19f17c8bc6	Merge pull request #3893 from HellerZheng/develop add riscv level3 C,Z kernel functions.	2023-03-15 10:17:13 +08:00
Sergei Lewis	cb0a70e0e2	dot.c early bail fix	2023-03-02 09:51:10 +00:00
Sergei Lewis	9b61be4545	factoring riscv64/dot.c fix into separate PR as requested	2023-03-01 17:40:42 +00:00
Sergei Lewis	2406958629	Update intrinsics to match latest spec at https://github.com/riscv-non-isa/rvv-intrinsic-doc (in particular, __riscv_ prefixes for rvv intrinsics); fix multiple numerical stability and corner case issues; add a script to generate arbitrary gemm kernel shapes; add a generic zvl256b target to demonstrate large gemm kernel unrolls	2023-02-24 10:45:03 +00:00
Martin Kroeker	38d6fb4225	Fix dependencies in builds with specified subsets of precision types	2023-02-23 23:12:06 +01:00
Martin Kroeker	e412bee313	fix GEMM kernel dependencies in builds that use only a subset of precisions	2023-02-22 00:37:14 +01:00
Martin Kroeker	d80adf253e	make SSYMV available to BUILD_DOUBLE-only builds	2023-02-22 00:30:20 +01:00
Martin Kroeker	5481c328e8	fix DYNAMIC_ARCH builds that use only a subset of precisions	2023-02-22 00:28:25 +01:00
Heller Zheng	63cf4d0166	add riscv level3 C,Z kernel functions.	2023-02-01 19:13:44 -08:00
Xianyi Zhang	c19dff0a31	Fix T-Head RVV intrinsic API changes.	2023-01-25 19:33:32 +08:00
Martin Kroeker	5a9cd87794	Merge pull request #3868 from Mousius/sve-prefetch Remove prefetches from SVE kernels	2022-12-24 10:52:29 +01:00
Chris Sidebottom	1361229291	Remove prefetches from SVE kernels. This is a precursor to enabling the SVE kernels for Arm(R) Neoverse(TM) V1, which has 256-bit SVE. Testing revealed that the SVE kernel was actually worse in some cases than the existing kernel, which seemed odd; with these prefetches removed, the underlying architecture seems to do a better job 😸	2022-12-16 14:43:09 +00:00
Bart Oldeman	60e49b851c	Fix typo in clobber list, should be xmm14 instead of ymm14.	2022-12-06 16:30:46 -05:00
Bart Oldeman	4afe1439a1	Fix skylake fallback kernel name for old compilers.	2022-12-06 16:09:54 -05:00
Bart Oldeman	5ceca1a4d8	Add sscal.c + microkernels for Haswell, Zen, Skylake and newer. Unlike [dcz]scal, sscal still used the original GotoBLAS SSE code from scal_sse.S. This code follows dscal as closely as possible, except for the inc_x > 1 code for which a plain C loop is used much like the one in cscal.c, instead of an adaptation of the SSE2 asm code of dscal.c (I tried but the performance wasn't better than the plain C loop).	2022-12-06 14:05:49 -05:00
lilianhuang	729af6406f	bugfix for sbgemm_ncopy_8_neoversen2	2022-12-05 05:10:18 -05:00
Martin Kroeker	042e3c0e7c	Merge pull request #3848 from bartoldeman/dscal-haswell-ymm dscal: use ymm registers in Haswell microkernel	2022-12-05 08:56:08 +01:00
Xianyi Zhang	e5313f53d5	Merge branch 'develop' of https://github.com/HellerZheng/OpenBLAS_riscv_x280 into HellerZheng-develop	2022-12-03 12:00:52 +08:00
Bart Oldeman	5c3169ecd8	dscal: use ymm registers in Haswell microkernel. Using 256-bit registers in dscal makes this microkernel consistent with cscal and zscal, and generally doubles performance if the vector fits in L1 cache.	2022-12-01 07:48:05 -05:00
Chris Sidebottom	eea006a688	Wrap SVE header with __has_include check	2022-12-01 12:07:55 +00:00
Chris Sidebottom	fd4f52c797	Add SVE implementation for sdot/ddot. This adds an SVE implementation to sdot/ddot when available, falling back to the previous Advanced SIMD kernel where there's no SVE implementation for the kernel. All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation, so I've renamed it to better fit with the feature detection.	2022-12-01 12:07:50 +00:00
lilianhuang	fdac8a97c1	Add sbgemm_ncopy_8 and sbgemm_tcopy_4	2022-11-29 04:46:14 -05:00
lilianhuang	135718eafc	Improve the performance of sbgemm_tcopy on neoversen2	2022-11-28 04:17:54 -05:00
Chris Sidebottom	4f7b77e08a	Remove unnecessary instructions from Advanced SIMD dot. The existing kernel was issuing extra instructions to organise the arguments into the same registers they would usually be in, and similarly to put the result into the appropriate register. This has an impact on smaller sized dots and seemed like a quick fix.	2022-11-25 16:19:03 +00:00
Heller Zheng	3918d8504e	nrm2 simple optimization	2022-11-21 19:06:07 -08:00
HellerZheng	943372bdf5	Merge branch 'develop' into develop	2022-11-18 10:12:46 +08:00
Martin Kroeker	f73cfb7e2c	change line endings from CRLF to LF	2022-11-17 09:39:56 +01:00
Martin Kroeker	1688c7da43	change line endings from CRLF to LF	2022-11-16 22:24:01 +01:00
Heller Zheng	5d0d1c5551	Remove redundant files	2022-11-15 18:22:21 -08:00
Heller Zheng	bef47917bd	Initial version for riscv sifive x280	2022-11-15 00:06:25 -08:00

2216 Commits