OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Chris Sidebottom	a9edddb695	Unroll TN further	2024-07-18 20:04:15 +01:00
Chris Sidebottom	9984c5ce9d	Clean up k2 removal more and unroll SGEMM more	2024-07-18 18:35:43 +01:00
Chris Sidebottom	b1c9fafabb	Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM	2024-07-18 17:37:18 +01:00
Martin Kroeker	2020569705	fix NAN handling and make it depend on dummy2 parameter	2024-07-17 23:55:54 +02:00
Martin Kroeker	3870995f01	make NAN handling depend on dummy2 parameter	2024-07-17 23:54:24 +02:00
Martin Kroeker	7284c533b5	make NAN handling depend on dummy2 parameter	2024-07-17 23:50:40 +02:00
Martin Kroeker	73751218a4	make NAN handling depend on dummy2 parameter	2024-07-17 23:41:26 +02:00
Martin Kroeker	b9bfc8ce09	make NAN handling depend on dummy2 parameter	2024-07-17 23:29:50 +02:00
Martin Kroeker	eb4879e04c	make NAN handling depend on the dummy2 parameter	2024-07-17 23:24:19 +02:00
Martin Kroeker	ee87cb90d0	Merge pull request #4803 from iha-taisei/SVESupportSDGEMV A64FX: Add support for SVE to SGEMV/DGEMV kernels.	2024-07-17 23:14:21 +02:00
gxw	34b80ce03f	mips64: Fixed numpy CI failure	2024-07-17 10:32:22 +08:00
gxw	f6d6c14a96	mips: Fixed numpy CI failure	2024-07-17 10:31:49 +08:00
Chip Kerchner	ba47c7f4f3	Vectorize reduction stage of sgemv_t.	2024-07-16 15:57:24 -05:00
iha fujitsu	0985fdc82b	A64FX: Add support for SVE to SGEMV/DGEMV kernels.	2024-07-16 17:31:33 +09:00
Mark Ryan	67bf4b6998	Fix axpby_rvv kernels for cases where inc_y = 0 The following openblas_utest tests fail when the RISCV64_ZVL128B is enabled. TEST 89/103 axpby:zaxpby_inc_0 [FAIL] TEST 92/103 axpby:caxpby_inc_0 [FAIL] TEST 95/103 axpby:daxpby_inc_0 [FAIL] TEST 98/103 axpby:saxpby_inc_0 [FAIL] The issue is that the vectorized kernels do not work when inc_y == 0. This patch updates the kernels to fall back to the scalar algorithms when inc_y == 0, fixing the failing tests. Signed-off-by: Mark Ryan <markdryan@rivosinc.com>	2024-07-15 14:24:47 +00:00
Mark Ryan	3b715e6162	Add autodetection for riscv64 Implement DYNAMIC_ARCH support for riscv64. Three cpu types are supported, riscv64_generic, riscv64_zvl256b, riscv64_zvl128b. The two non-generic kernels require CPU support for RVV 1.0 to function correctly. Detecting that a riscv64 device supports RVV 1.0 is a little complicated as there are some boards on the market that advertise support for V via hwcap but only support RVV 0.7.1, which is not binary compatible with RVV 1.0. The approach taken is to first try hwprobe. If hwprobe is not available, we fall back to hwcap + an additional check to distinguish between RVV 1.0 and RVV 0.7.1. Tested on a VM with VLEN=256, a CanMV K230 with VLEN=128 (with only the big core enabled), a Lichee Pi with RVV 0.7.1 and a VF2 with no vector. A compiler with RVV 1.0 support must be used to build OpenBLAS for riscv64 when DYNAMIC_ARCH=1. Signed-off-by: Mark Ryan <markdryan@rivosinc.com>	2024-07-15 14:24:22 +00:00
gxw	3f39c8f94f	LoongArch: Fixed numpy CI failure	2024-07-15 11:43:08 +08:00
gxw	f3cebb3ca3	x86: Fixed numpy CI failure when the target is ZEN.	2024-07-12 16:09:30 +08:00
Martin Kroeker	5d08ec7ff3	Merge pull request #4782 from martin-frbg/azurewincl Fix NAN handling in ARM/generic SCAL; have AzureCI Windows show errors on failure	2024-07-11 23:55:15 +02:00
Chip Kerchner	cb154832f8	Vectorize SBGEMM incopy - 4x faster.	2024-07-09 13:10:03 -05:00
Martin Kroeker	a5c04e326a	Update scal.c	2024-07-04 22:28:01 +02:00
Martin Kroeker	536200bc9e	fix handling of INF or NAN	2024-07-04 17:47:19 +02:00
Martin Kroeker	3677b3886c	Merge pull request #4702 from bashimao/detect-nv-grace Correctly detect ARM Neoverse V2 CPUs.	2024-06-30 22:48:48 +02:00
Martin Kroeker	f3c364c2cc	temporarily(?) disable the alpha=0 branch as it fails to handle INF,NAN	2024-06-27 22:18:27 +02:00
Martin Kroeker	2a5fe97e3b	temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN	2024-06-27 16:21:57 +02:00
Martin Kroeker	c1019d5832	Handle INF and NAN in inputs	2024-06-27 10:58:59 +02:00
Chris Sidebottom	8c472ef7e3	Further tweak small GEMM for AArch64	2024-06-24 10:47:47 +01:00
Martin Kroeker	9e24121e7e	temporarily(?) disable da=0 shortcut to handle x=Inf or NAN	2024-06-23 17:48:18 +02:00
Martin Kroeker	a11f086c17	Update sscal_msa.c	2024-06-23 12:55:19 +02:00
Martin Kroeker	541e1b6959	disable the fast path for inc=1, alpha=0 as it does not handle x=NaN or Inf	2024-06-23 10:37:55 +02:00
Martin Kroeker	c08113c279	fix special cases of x= NAN or INF	2024-06-23 01:12:33 +02:00
Martin Kroeker	bd47630bcf	exclude the alpha=0 branch as it does not handle NaN or Inf in x	2024-06-23 00:54:39 +02:00
Martin Kroeker	68f2501958	temporarily(?) disable the alpha=0 branch to handle Inf/NaN in x	2024-06-22 21:08:57 +02:00
Martin Kroeker	0a744a939a	temporarily(?) disable the alpha=0 branch to handle NaN/Inf in x	2024-06-22 21:07:43 +02:00
Martin Kroeker	7f8f037a36	handle INF and NAN in input	2024-06-22 16:03:30 +02:00
Martin Kroeker	f1248b849d	handle INF and NAN in input	2024-06-22 15:55:29 +02:00
Martin Kroeker	a2ee4b1966	Merge branch 'OpenMathLib:develop' into issue4728	2024-06-21 09:35:56 +02:00
Martin Kroeker	3ec59922b6	Add a clobber list to fix utest errors seen with gcc13 on Apple M	2024-06-20 16:19:32 +02:00
Martin Kroeker	3d8054fb16	add clobber list	2024-06-14 22:07:44 +02:00
Martin Kroeker	dd7efcf9ef	Avoid exceeding the configured thread count in x86_64 TOBF16 (#4748) * avoid setting nthreads higher than available	2024-06-14 14:21:13 +02:00
Martin Kroeker	6ffaf99817	disable da=0 shortcut to handle NAN and INF correctly	2024-06-07 14:46:58 +02:00
Martin Kroeker	c7cacd9b38	disable the shortcut for da=0 to ensure proper handling of INF and NAN	2024-06-07 13:48:56 +02:00
Martin Kroeker	5ed4f24d6e	Handle corner cases with INF and NAN arguments	2024-06-07 09:39:08 +02:00
Martin Kroeker	2bd43ad0eb	Merge branch 'OpenMathLib:develop' into issue4728	2024-06-07 00:37:25 +02:00
Martin Kroeker	1abafcd9b2	handle corner cases involving NAN and/or INF	2024-06-06 23:59:43 +02:00
Martin Kroeker	442dec28df	Merge pull request #4738 from martin-frbg/issue4737 Disable GEMM3M for generic targets (not implemented)	2024-06-06 17:22:38 +02:00
Martin Kroeker	2787c9f8e4	Disable GEMM3M for generic targets (not implemented)	2024-06-06 14:39:50 +02:00
gxw	af73ae6208	LoongArch: Fixed issue 4728	2024-06-06 16:43:09 +08:00
gxw	8ab2e9ec65	LoongArch: DGEMM small matrix opt	2024-06-04 16:52:45 +08:00
Martin Kroeker	83bc8d5dd8	Merge pull request #4712 from RajalakshmiSR/zscalp10 POWER: Fix issues in zscal to address lapack failures	2024-06-01 11:22:08 +02:00
Martin Kroeker	020b3e1682	fix handling of INF arguments	2024-06-01 00:51:18 +02:00
Martin Kroeker	8c05765a5a	fix other corner cases where x=INF	2024-05-31 18:06:36 +02:00
Martin Kroeker	516743f7dc	fix other instances of mishandling INF	2024-05-31 16:02:12 +02:00
Martin Kroeker	9ff4e9714e	additional fixes for handling INF arguments	2024-05-31 15:44:07 +02:00
Martin Kroeker	ce130f11d2	Update zscal.c	2024-05-31 15:09:03 +02:00
Martin Kroeker	ab13cfef93	more fixes for infinite x	2024-05-31 14:34:49 +02:00
Martin Kroeker	ad2b5c67c8	fix another corner case involving infinity	2024-05-31 01:06:58 +02:00
Bart Oldeman	62f7b244ff	Replace use of FLT_MAX in x86_64 zscal.c by isinf() Commit `def4996` fixed issues with inf and nan values in zscal, but used FLT_MAX, where DBL_MAX or isinf() is more appropriate, as FLT_MAX is for single precision only. Using FLT_MAX caused test case failures in the LAPACK tests. isinf() is consistent with the later fix `969601a1`	2024-05-24 17:20:27 +00:00
Rajalakshmi Srinivasaraghavan	e112191b54	POWER: Fix issues in zscal to address lapack failures This patch fixes following lapack failures with clang compiler on POWER. zed.out: ZVX: 18 out of 5190 tests failed to pass the threshold zgd.out: ZGV drivers: 25 out of 1092 tests failed to pass the threshold zgd.out: ZGV drivers: 6 out of 1092 tests failed to pass the threshold	2024-05-22 08:00:06 -05:00
Martin Kroeker	aa259b141d	Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix Fix regression SAXPY when compiler with OpenXL compiler.	2024-05-18 19:11:25 +02:00
Matthias Langer	0050a9660b	Correctly detect ARM Neoverse V2 CPUs.	2024-05-16 09:59:52 +00:00
Chip Kerchner	3a1417671a	POWER: Fixing endianness issue in cswap/zswap kernel for AIX	2024-05-15 19:36:46 -05:00
Amrita H S	87b3d9054f	Fix regression SAXPY when compiler with OpenXL compiler. SAXPY built with OpenXL regresses when compared to SAXPY built with gcc. OpenXL compiler doesn't know that the SAXPY inner kernel assembly is a 64 element loop and to it the remainder loop is the main loop. It vectorizes and interleaves the remainder to be a 48 elements per iteration loop. With a max of 63 iterations, a 48 element loop is mostly not going to get executed, so the 1 element scalar loop that is the remainder after that is probably mostly what gets executed. This can be fixed by adding a pragma, loop interleave_count(2) which will result in 8 element loop. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>	2024-05-12 23:27:55 -05:00
Martin Kroeker	8da6f7e5f2	Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6 Loongarch64: Improving the Performance and Stability of dgemm	2024-05-10 11:29:12 +02:00
gxw	f9a26240a7	loongarch64: Fixed icamax_lsx	2024-05-10 14:16:40 +08:00
gxw	cb0f707409	loongarch64: Fixed utest fork:safety	2024-05-10 14:16:36 +08:00
Martin Kroeker	b45d8e1ab2	remove stray comma	2024-05-09 12:33:19 +02:00
gxw	6017ad7146	loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6	2024-05-08 10:10:26 +08:00
Martin Kroeker	992b71fea2	remove stray comma	2024-04-23 21:52:26 +02:00
Martin Kroeker	d421dec278	Merge pull request #4656 from zboszor/fix-x86-64-build-v2 Add forgotten conditional uses of PREFETCH	2024-04-23 21:05:08 +02:00
Martin Kroeker	ae695d4ca0	Merge pull request #4642 from XiWeiGu/loongarch64_clang CI: Add clang test for loongarch64	2024-04-23 18:25:49 +02:00
gxw	7cd438a5ac	loongarch64: Fixed clang compilation issues	2024-04-23 19:19:11 +08:00
Zoltán Böszörményi	ca64861ce8	Add forgotten conditional uses of PREFETCH This fixes a (cross-)compilation/linker error for PRESCOTT on Yocto. Signed-off-by: Zoltán Böszörményi <zoltan.boszormenyi@xenial.com>	2024-04-19 10:52:28 +02:00
gxw	9c39e969f5	mips64: Fixed MSA optimization bugs for zgemv and cgemv	2024-04-15 15:17:29 +08:00
Martin Kroeker	4c03ed437f	Fix SICORTEX ASUM/ZASUM and SUM/ZSUM for INCX <=0 (#4640) * Exit early if INCX <= 0	2024-04-14 15:39:11 +02:00
Martin Kroeker	7cfd433d0c	revert the C/Z NRM2 kernels to the base NEON kernel as well	2024-04-12 15:34:04 +02:00
Martin Kroeker	93d975d8fd	Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset loongarch: Optimizing the performance of the GEMM on servers	2024-04-10 14:23:31 +02:00
gxw	d8c4ea8793	loongarch: Optimizing the performance of the GEMM on servers	2024-04-09 09:03:34 -04:00
Chen Yu	8e39c05efd	Get the l2 cache size via environment variable on confidential VM The CPUID(leaf:2 or leaf:0x80000006) is not supported on some confidential VMs. As a result the get_l2_size() returns the default 512M which brings performance issues. Introduce the environment variable OPENBLAS_L2_SIZE provided by the user to get the l2 cache size. Suggested-by: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com>	2024-04-05 11:39:01 +08:00
Martin Kroeker	441c81026e	Add support for Cortex-A76	2024-04-02 19:41:44 +02:00
Martin Kroeker	9ead81bd39	Revert S/DNRM2 to the base NEON kernel to fix precision loss	2024-04-02 15:59:20 +02:00
gxw	96607cbb98	loongarch: Fixed dzamax Initialize the registers to prevent sporadic errors.	2024-03-25 23:17:53 -04:00
gxw	50869f6ca8	loongarch: Fixed zrot LSX opt	2024-03-19 10:08:11 +08:00
gxw	b5eb9d6bac	loongarch: Fixed {sc/dz}amax LSX opt	2024-03-19 09:56:11 +08:00
gxw	ad13e04669	loongarch: Fixed {s/d/sc/dz}amin LSX opt	2024-03-19 09:18:44 +08:00
gxw	bbf82cb624	loongarch: Fixed {s/d}axpby LSX opt	2024-03-18 17:51:42 +08:00
gxw	ac460eb42a	loongarch: Fixed i{c/z}amin LSX opt	2024-03-18 17:15:58 +08:00
gxw	60e251a1f8	loongarch: Fixed {sc/dz}amax LASX opt	2024-03-16 14:52:17 +08:00
gxw	a10dde5554	loongarch: Fixed {s/d/sc/dz}amin LASX opt	2024-03-16 14:52:14 +08:00
gxw	6534d378b7	loongarch: Fixed {s/d/c/z}sum LASX opt	2024-03-16 14:52:10 +08:00
gxw	6159cffc58	loongarch: Fixed i{s/c/z}amin LASX opt	2024-03-16 14:52:06 +08:00
gxw	7d755912b9	loongarch: Fixed {s/d/c/z}axpby LASX opt	2024-03-16 14:51:56 +08:00
Martin Kroeker	cf80bd8500	Update nrm2_rvv.c	2024-03-13 13:07:26 +01:00
Martin Kroeker	9baa757905	Update nrm2_vector.c	2024-03-13 11:40:14 +01:00
Martin Kroeker	18a6db6862	Update nrm2_vector.c	2024-03-13 11:10:26 +01:00
Martin Kroeker	3752e73919	handle incx < 0	2024-03-12 20:44:01 +01:00
Martin Kroeker	db70c7f7fb	handle incx < 0	2024-03-12 20:42:11 +01:00
Martin Kroeker	dee8557d58	handle incx < 0	2024-03-12 20:40:29 +01:00
Martin Kroeker	d9dff17aec	handle incx < 0	2024-03-12 20:38:23 +01:00
Martin Kroeker	552c521353	remove another early exit for incx < 0	2024-03-12 18:49:27 +01:00
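
Many of the commits listed above disable or rework an alpha=0 (da=0) shortcut in the SCAL kernels so that NaN and Inf inputs are handled correctly. The sketch below illustrates only the IEEE 754 arithmetic behind those messages. It is not OpenBLAS code, and the function names `scal_with_shortcut` and `scal_multiply` are hypothetical.

```python
import math


def scal_with_shortcut(alpha, x):
    """Hypothetical SCAL variant: writes zeros when alpha == 0 without reading x."""
    if alpha == 0.0:
        return [0.0] * len(x)
    return [alpha * v for v in x]


def scal_multiply(alpha, x):
    """Hypothetical SCAL variant: always multiplies, so NaN/Inf in x propagate."""
    return [alpha * v for v in x]


x = [1.0, math.inf, math.nan]

# The shortcut silently replaces Inf and NaN with 0.0.
shortcut = scal_with_shortcut(0.0, x)

# Plain multiplication follows IEEE 754: 0 * Inf and 0 * NaN are both NaN.
multiplied = scal_multiply(0.0, x)

print(shortcut)
print(multiplied)
```

The two variants agree for finite inputs and differ only when x contains Inf or NaN, which is the case the commit messages describe.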
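
Commit 62f7b244ff above replaces the use of FLT_MAX with isinf() in the double-precision zscal kernel, noting that FLT_MAX applies to single precision only. The sketch below shows why a single-precision threshold misclassifies double-precision values. It assumes the original check compared a magnitude against FLT_MAX, which the commit message does not spell out, and the helper names are hypothetical.

```python
import math
import struct

# Largest finite IEEE 754 single-precision value (C's FLT_MAX), decoded
# from its bit pattern 0x7f7fffff and widened to a Python float (a double).
FLT_MAX = struct.unpack("<f", b"\xff\xff\x7f\x7f")[0]


def is_infinite_by_threshold(v):
    """Hypothetical check: treats anything above FLT_MAX as infinite."""
    return abs(v) > FLT_MAX


def is_infinite_by_isinf(v):
    """Check based on isinf(), valid for any floating-point width."""
    return math.isinf(v)


big_but_finite = 1.0e300  # representable as a double, far above FLT_MAX

print(is_infinite_by_threshold(big_but_finite))
print(is_infinite_by_isinf(big_but_finite))
```

Both checks agree on true infinities. They diverge for finite doubles larger than FLT_MAX.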

2343 Commits