OpenBLAS

Author	SHA1	Message	Date
Mark Ryan	67bf4b6998	Fix axpby_rvv kernels for cases where inc_y = 0 The following openblas_utest tests fail when the RISCV64_ZVL128B is enabled. TEST 89/103 axpby:zaxpby_inc_0 [FAIL] TEST 92/103 axpby:caxpby_inc_0 [FAIL] TEST 95/103 axpby:daxpby_inc_0 [FAIL] TEST 98/103 axpby:saxpby_inc_0 [FAIL] The issue is that the vectorized kernels do not work when inc_y == 0. This patch updates the kernels to fall back to the scalar algorithms when inc_y == 0, fixing the failing tests. Signed-off-by: Mark Ryan <markdryan@rivosinc.com>	2024-07-15 14:24:47 +00:00
Martin Kroeker	5d08ec7ff3	Merge pull request #4782 from martin-frbg/azurewincl Fix NAN handling in ARM/generic SCAL; have AzureCI Windows show errors on failure	2024-07-11 23:55:15 +02:00
Chip Kerchner	cb154832f8	Vectorize SBGEMM incopy - 4x faster.	2024-07-09 13:10:03 -05:00
Martin Kroeker	a5c04e326a	Update scal.c	2024-07-04 22:28:01 +02:00
Martin Kroeker	536200bc9e	fix handling of INF or NAN	2024-07-04 17:47:19 +02:00
Martin Kroeker	3677b3886c	Merge pull request #4702 from bashimao/detect-nv-grace Correctly detect ARM Neoverse V2 CPUs.	2024-06-30 22:48:48 +02:00
Martin Kroeker	f3c364c2cc	temporarily(?) disable the alpha=0 branch as it fails to handle INF,NAN	2024-06-27 22:18:27 +02:00
Martin Kroeker	2a5fe97e3b	temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN	2024-06-27 16:21:57 +02:00
Martin Kroeker	c1019d5832	Handle INF and NAN in inputs	2024-06-27 10:58:59 +02:00
Martin Kroeker	9e24121e7e	temporarily(?) disable da=0 shortcut to handle x=Inf or NAN	2024-06-23 17:48:18 +02:00
Martin Kroeker	a11f086c17	Update sscal_msa.c	2024-06-23 12:55:19 +02:00
Martin Kroeker	541e1b6959	disable the fast path for inc=1, alpha=0 as it does not handle x=NaN or Inf	2024-06-23 10:37:55 +02:00
Martin Kroeker	c08113c279	fix special cases of x= NAN or INF	2024-06-23 01:12:33 +02:00
Martin Kroeker	bd47630bcf	exclude the alpha=0 branch as it does not handle NaN or Inf in x	2024-06-23 00:54:39 +02:00
Martin Kroeker	68f2501958	temporarily(?) disable the alpha=0 branch to handle Inf/NaN in x	2024-06-22 21:08:57 +02:00
Martin Kroeker	0a744a939a	temporarily(?) disable the alpha=0 branch to handle NaN/Inf in x	2024-06-22 21:07:43 +02:00
Martin Kroeker	7f8f037a36	handle INF and NAN in input	2024-06-22 16:03:30 +02:00
Martin Kroeker	f1248b849d	handle INF and NAN in input	2024-06-22 15:55:29 +02:00
Martin Kroeker	a2ee4b1966	Merge branch 'OpenMathLib:develop' into issue4728	2024-06-21 09:35:56 +02:00
Martin Kroeker	3ec59922b6	Add a clobber list to fix utest errors seen with gcc13 on Apple M	2024-06-20 16:19:32 +02:00
Martin Kroeker	3d8054fb16	add clobber list	2024-06-14 22:07:44 +02:00
Martin Kroeker	dd7efcf9ef	Avoid exceeding the configured thread count in x86_64 TOBF16 (#4748 ) * avoid setting nthreads higher than available	2024-06-14 14:21:13 +02:00
Martin Kroeker	6ffaf99817	disable da=0 shortcut to handle NAN and INF correctly	2024-06-07 14:46:58 +02:00
Martin Kroeker	c7cacd9b38	disable the shortcut for da=0 to ensure proper handling of INF and NAN	2024-06-07 13:48:56 +02:00
Martin Kroeker	5ed4f24d6e	Handle corner cases with INF and NAN arguments	2024-06-07 09:39:08 +02:00
Martin Kroeker	2bd43ad0eb	Merge branch 'OpenMathLib:develop' into issue4728	2024-06-07 00:37:25 +02:00
Martin Kroeker	1abafcd9b2	handle corner cases involving NAN and/or INF	2024-06-06 23:59:43 +02:00
Martin Kroeker	442dec28df	Merge pull request #4738 from martin-frbg/issue4737 Disable GEMM3M for generic targets (not implemented)	2024-06-06 17:22:38 +02:00
Martin Kroeker	2787c9f8e4	Disable GEMM3M for generic targets (not implemented)	2024-06-06 14:39:50 +02:00
gxw	af73ae6208	LoongArch: Fixed issue 4728	2024-06-06 16:43:09 +08:00
gxw	8ab2e9ec65	LoongArch: DGEMM small matrix opt	2024-06-04 16:52:45 +08:00
Martin Kroeker	83bc8d5dd8	Merge pull request #4712 from RajalakshmiSR/zscalp10 POWER: Fix issues in zscal to address lapack failures	2024-06-01 11:22:08 +02:00
Martin Kroeker	020b3e1682	fix handling of INF arguments	2024-06-01 00:51:18 +02:00
Martin Kroeker	8c05765a5a	fix other corner cases where x=INF	2024-05-31 18:06:36 +02:00
Martin Kroeker	516743f7dc	fix other instances of mishandling INF	2024-05-31 16:02:12 +02:00
Martin Kroeker	9ff4e9714e	additional fixes for handling INF arguments	2024-05-31 15:44:07 +02:00
Martin Kroeker	ce130f11d2	Update zscal.c	2024-05-31 15:09:03 +02:00
Martin Kroeker	ab13cfef93	more fixes for infinite x	2024-05-31 14:34:49 +02:00
Martin Kroeker	ad2b5c67c8	fix another corner case involving infinity	2024-05-31 01:06:58 +02:00
Bart Oldeman	62f7b244ff	Replace use of FLT_MAX in x86_64 zscal.c by isinf() Commit `def4996` fixed issues with inf and nan values in zscal, but used FLT_MAX, where DBL_MAX or isinf() is more appropriate, as FLT_MAX is for single precision only. Using FLT_MAX caused test case failures in the LAPACK tests. isinf() is consistent with the later fix `969601a1`	2024-05-24 17:20:27 +00:00
Rajalakshmi Srinivasaraghavan	e112191b54	POWER: Fix issues in zscal to address lapack failures This patch fixes following lapack failures with clang compiler on POWER. zed.out: ZVX: 18 out of 5190 tests failed to pass the threshold zgd.out: ZGV drivers: 25 out of 1092 tests failed to pass the threshold zgd.out: ZGV drivers: 6 out of 1092 tests failed to pass the threshold	2024-05-22 08:00:06 -05:00
Martin Kroeker	aa259b141d	Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix Fix regression SAXPY when compiler with OpenXL compiler.	2024-05-18 19:11:25 +02:00
Matthias Langer	0050a9660b	Correctly detect ARM Neoverse V2 CPUs.	2024-05-16 09:59:52 +00:00
Chip Kerchner	3a1417671a	POWER: Fixing endianness issue in cswap/zswap kernel for AIX	2024-05-15 19:36:46 -05:00
Amrita H S	87b3d9054f	Fix regression SAXPY when compiler with OpenXL compiler. SAXPY built with OpenXL regresses when compared to SAXPY built with gcc. OpenXL compiler doesn't know that the SAXPY inner kernel assembly is a 64 element loop and to it the remainder loop is the main loop. It vectorizes and interleaves the remainder to be a 48 elements per iteration loop. With a max of 63 iterations, a 48 element loop is mostly not going to get executed, so the 1 element scalar loop that is the remainder after that is probably mostly what gets executed. This can be fixed by adding a pragma, loop interleave_count(2) which will result in 8 element loop. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>	2024-05-12 23:27:55 -05:00
Martin Kroeker	8da6f7e5f2	Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6 Loongarch64: Improving the Performance and Stability of dgemm	2024-05-10 11:29:12 +02:00
gxw	f9a26240a7	loongarch64: Fixed icamax_lsx	2024-05-10 14:16:40 +08:00
gxw	cb0f707409	loongarch64: Fixed utest fork:safety	2024-05-10 14:16:36 +08:00
Martin Kroeker	b45d8e1ab2	remove stray comma	2024-05-09 12:33:19 +02:00
gxw	6017ad7146	loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6	2024-05-08 10:10:26 +08:00

2274 Commits