OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	a815594fd1	Merge pull request #4801 from markdryan/markdryan/riscv-dynamic-arch Add autodetection for riscv64	2024-07-19 17:12:07 +02:00
Hong Bo Peng	db98f8753f	Try to fix LAPACK testing failures on P7. 1. Remove the FADD insn from the GEMV Transpose code. 2. Remove the FADD insn from GEMM and ZGEMM code. 3. Reorder the computation of the Imaginary part in ZGEMM code.	2024-07-19 02:08:19 -04:00
Martin Kroeker	ee87cb90d0	Merge pull request #4803 from iha-taisei/SVESupportSDGEMV A64FX: Add support for SVE to SGEMV/DGEMV kernels.	2024-07-17 23:14:21 +02:00
iha fujitsu	0985fdc82b	A64FX: Add support for SVE to SGEMV/DGEMV kernels.	2024-07-16 17:31:33 +09:00
Mark Ryan	67bf4b6998	Fix axpby_rvv kernels for cases where inc_y = 0 The following openblas_utest tests fail when the RISCV64_ZVL128B is enabled. TEST 89/103 axpby:zaxpby_inc_0 [FAIL] TEST 92/103 axpby:caxpby_inc_0 [FAIL] TEST 95/103 axpby:daxpby_inc_0 [FAIL] TEST 98/103 axpby:saxpby_inc_0 [FAIL] The issue is that the vectorized kernels do not work when inc_y == 0. This patch updates the kernels to fall back to the scalar algorithms when inc_y == 0, fixing the failing tests. Signed-off-by: Mark Ryan <markdryan@rivosinc.com>	2024-07-15 14:24:47 +00:00
Mark Ryan	3b715e6162	Add autodetection for riscv64 Implement DYNAMIC_ARCH support for riscv64. Three cpu types are supported, riscv64_generic, riscv64_zvl256b, riscv64_zvl128b. The two non-generic kernels require CPU support for RVV 1.0 to function correctly. Detecting that a riscv64 device supports RVV 1.0 is a little complicated as there are some boards on the market that advertise support for V via hwcap but only support RVV 0.7.1, which is not binary compatible with RVV 1.0. The approach taken is to first try hwprobe. If hwprobe is not available, we fall back to hwcap + an additional check to distinguish between RVV 1.0 and RVV 0.7.1. Tested on a VM with VLEN=256, a CanMV K230 with VLEN=128 (with only the big core enabled), a Lichee Pi with RVV 0.7.1 and a VF2 with no vector. A compiler with RVV 1.0 support must be used to build OpenBLAS for riscv64 when DYNAMIC_ARCH=1. Signed-off-by: Mark Ryan <markdryan@rivosinc.com>	2024-07-15 14:24:22 +00:00
Martin Kroeker	5d08ec7ff3	Merge pull request #4782 from martin-frbg/azurewincl Fix NAN handling in ARM/generic SCAL; have AzureCI Windows show errors on failure	2024-07-11 23:55:15 +02:00
Chip Kerchner	cb154832f8	Vectorize SBGEMM incopy - 4x faster.	2024-07-09 13:10:03 -05:00
Martin Kroeker	a5c04e326a	Update scal.c	2024-07-04 22:28:01 +02:00
Martin Kroeker	536200bc9e	fix handling of INF or NAN	2024-07-04 17:47:19 +02:00
Martin Kroeker	3677b3886c	Merge pull request #4702 from bashimao/detect-nv-grace Correctly detect ARM Neoverse V2 CPUs.	2024-06-30 22:48:48 +02:00
Martin Kroeker	f3c364c2cc	temporarily(?) disable the alpha=0 branch as it fails to handle INF,NAN	2024-06-27 22:18:27 +02:00
Martin Kroeker	2a5fe97e3b	temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN	2024-06-27 16:21:57 +02:00
Martin Kroeker	c1019d5832	Handle INF and NAN in inputs	2024-06-27 10:58:59 +02:00
Martin Kroeker	9e24121e7e	temporarily(?) disable da=0 shortcut to handle x=Inf or NAN	2024-06-23 17:48:18 +02:00
Martin Kroeker	a11f086c17	Update sscal_msa.c	2024-06-23 12:55:19 +02:00
Martin Kroeker	541e1b6959	disable the fast path for inc=1, alpha=0 as it does not handle x=NaN or Inf	2024-06-23 10:37:55 +02:00
Martin Kroeker	c08113c279	fix special cases of x= NAN or INF	2024-06-23 01:12:33 +02:00
Martin Kroeker	bd47630bcf	exclude the alpha=0 branch as it does not handle NaN or Inf in x	2024-06-23 00:54:39 +02:00
Martin Kroeker	68f2501958	temporarily(?) disable the alpha=0 branch to handle Inf/NaN in x	2024-06-22 21:08:57 +02:00
Martin Kroeker	0a744a939a	temporarily(?) disable the alpha=0 branch to handle NaN/Inf in x	2024-06-22 21:07:43 +02:00
Martin Kroeker	7f8f037a36	handle INF and NAN in input	2024-06-22 16:03:30 +02:00
Martin Kroeker	f1248b849d	handle INF and NAN in input	2024-06-22 15:55:29 +02:00
Martin Kroeker	a2ee4b1966	Merge branch 'OpenMathLib:develop' into issue4728	2024-06-21 09:35:56 +02:00
Martin Kroeker	3ec59922b6	Add a clobber list to fix utest errors seen with gcc13 on Apple M	2024-06-20 16:19:32 +02:00
Martin Kroeker	3d8054fb16	add clobber list	2024-06-14 22:07:44 +02:00
Martin Kroeker	dd7efcf9ef	Avoid exceeding the configured thread count in x86_64 TOBF16 (#4748) * avoid setting nthreads higher than available	2024-06-14 14:21:13 +02:00
Martin Kroeker	6ffaf99817	disable da=0 shortcut to handle NAN and INF correctly	2024-06-07 14:46:58 +02:00
Martin Kroeker	c7cacd9b38	disable the shortcut for da=0 to ensure proper handling of INF and NAN	2024-06-07 13:48:56 +02:00
Martin Kroeker	5ed4f24d6e	Handle corner cases with INF and NAN arguments	2024-06-07 09:39:08 +02:00
Martin Kroeker	2bd43ad0eb	Merge branch 'OpenMathLib:develop' into issue4728	2024-06-07 00:37:25 +02:00
Martin Kroeker	1abafcd9b2	handle corner cases involving NAN and/or INF	2024-06-06 23:59:43 +02:00
Martin Kroeker	442dec28df	Merge pull request #4738 from martin-frbg/issue4737 Disable GEMM3M for generic targets (not implemented)	2024-06-06 17:22:38 +02:00
Martin Kroeker	2787c9f8e4	Disable GEMM3M for generic targets (not implemented)	2024-06-06 14:39:50 +02:00
gxw	af73ae6208	LoongArch: Fixed issue 4728	2024-06-06 16:43:09 +08:00
gxw	8ab2e9ec65	LoongArch: DGEMM small matrix opt	2024-06-04 16:52:45 +08:00
Martin Kroeker	83bc8d5dd8	Merge pull request #4712 from RajalakshmiSR/zscalp10 POWER: Fix issues in zscal to address lapack failures	2024-06-01 11:22:08 +02:00
Martin Kroeker	020b3e1682	fix handling of INF arguments	2024-06-01 00:51:18 +02:00
Martin Kroeker	8c05765a5a	fix other corner cases where x=INF	2024-05-31 18:06:36 +02:00
Martin Kroeker	516743f7dc	fix other instances of mishandling INF	2024-05-31 16:02:12 +02:00
Martin Kroeker	9ff4e9714e	additional fixes for handling INF arguments	2024-05-31 15:44:07 +02:00
Martin Kroeker	ce130f11d2	Update zscal.c	2024-05-31 15:09:03 +02:00
Martin Kroeker	ab13cfef93	more fixes for infinite x	2024-05-31 14:34:49 +02:00
Martin Kroeker	ad2b5c67c8	fix another corner case involving infinity	2024-05-31 01:06:58 +02:00
Bart Oldeman	62f7b244ff	Replace use of FLT_MAX in x86_64 zscal.c by isinf() Commit `def4996` fixed issues with inf and nan values in zscal, but used FLT_MAX, where DBL_MAX or isinf() is more appropriate, as FLT_MAX is for single precision only. Using FLT_MAX caused test case failures in the LAPACK tests. isinf() is consistent with the later fix `969601a1`	2024-05-24 17:20:27 +00:00
Rajalakshmi Srinivasaraghavan	e112191b54	POWER: Fix issues in zscal to address lapack failures This patch fixes following lapack failures with clang compiler on POWER. zed.out: ZVX: 18 out of 5190 tests failed to pass the threshold zgd.out: ZGV drivers: 25 out of 1092 tests failed to pass the threshold zgd.out: ZGV drivers: 6 out of 1092 tests failed to pass the threshold	2024-05-22 08:00:06 -05:00
Martin Kroeker	aa259b141d	Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix Fix regression SAXPY when compiler with OpenXL compiler.	2024-05-18 19:11:25 +02:00
Matthias Langer	0050a9660b	Correctly detect ARM Neoverse V2 CPUs.	2024-05-16 09:59:52 +00:00
Chip Kerchner	3a1417671a	POWER: Fixing endianness issue in cswap/zswap kernel for AIX	2024-05-15 19:36:46 -05:00
Amrita H S	87b3d9054f	Fix regression SAXPY when compiler with OpenXL compiler. SAXPY built with OpenXL regresses when compared to SAXPY built with gcc. OpenXL compiler doesn't know that the SAXPY inner kernel assembly is a 64 element loop and to it the remainder loop is the main loop. It vectorizes and interleaves the remainder to be a 48 elements per iteration loop. With a max of 63 iterations, a 48 element loop is mostly not going to get executed, so the 1 element scalar loop that is the remainder after that is probably mostly what gets executed. This can be fixed by adding a pragma, loop interleave_count(2) which will result in 8 element loop. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>	2024-05-12 23:27:55 -05:00
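
Many of the SCAL commits listed above disable an alpha=0 (da=0) shortcut because it does not handle NaN or Inf in x. The following is a minimal sketch of that issue in C, assuming IEEE 754 arithmetic. The names `scal_shortcut` and `scal_plain` are hypothetical and chosen for illustration; they are not the OpenBLAS kernels themselves.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical kernel WITH the fast path: writes zeros when alpha == 0. */
static void scal_shortcut(size_t n, double alpha, double *x)
{
    if (alpha == 0.0) {
        for (size_t i = 0; i < n; i++)
            x[i] = 0.0;      /* NaN and Inf inputs are silently replaced by 0 */
        return;
    }
    for (size_t i = 0; i < n; i++)
        x[i] *= alpha;
}

/* Hypothetical kernel WITHOUT the fast path: always multiplies. */
static void scal_plain(size_t n, double alpha, double *x)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= alpha;       /* 0 * NaN and 0 * Inf both yield NaN under IEEE 754 */
}
```

Calling both functions with alpha = 0 on a vector such as {1.0, NAN, INFINITY} shows the difference: the shortcut version returns all zeros, while the plain multiply keeps NaN in the second and third elements.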
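
Commit 62f7b244ff above replaces a FLT_MAX comparison with isinf() in the double-precision zscal code. The sketch below illustrates why the two checks differ for double data. The helper names are hypothetical, and the exact shape of the original comparison is an assumption; this is not the actual zscal.c code.

```c
#include <float.h>
#include <math.h>

/* Hypothetical check in the style the commit message criticises:
   treats any magnitude above FLT_MAX as infinite. */
static int looks_infinite_flt_max(double v)
{
    return fabs(v) > FLT_MAX;
}

/* Check based on isinf(), which works for any floating-point type. */
static int looks_infinite_isinf(double v)
{
    return isinf(v) != 0;
}
```

A finite double such as 1e300 exceeds FLT_MAX (about 3.4e38), so the first helper misclassifies it as infinite, while the isinf() version reports it as finite.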


2279 Commits