2343 Commits

Author SHA1 Message Date
Chris Sidebottom
a9edddb695 Unroll TN further 2024-07-18 20:04:15 +01:00
Chris Sidebottom
9984c5ce9d Clean up k2 removal more and unroll SGEMM more 2024-07-18 18:35:43 +01:00
Chris Sidebottom
b1c9fafabb Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM 2024-07-18 17:37:18 +01:00
Martin Kroeker
2020569705 fix NAN handling and make it depend on dummy2 parameter 2024-07-17 23:55:54 +02:00
Martin Kroeker
3870995f01 make NAN handling depend on dummy2 parameter 2024-07-17 23:54:24 +02:00
Martin Kroeker
7284c533b5 make NAN handling depend on dummy2 parameter 2024-07-17 23:50:40 +02:00
Martin Kroeker
73751218a4 make NAN handling depend on dummy2 parameter 2024-07-17 23:41:26 +02:00
Martin Kroeker
b9bfc8ce09 make NAN handling depend on dummy2 parameter 2024-07-17 23:29:50 +02:00
Martin Kroeker
eb4879e04c make NAN handling depend on the dummy2 parameter 2024-07-17 23:24:19 +02:00
Martin Kroeker
ee87cb90d0 Merge pull request #4803 from iha-taisei/SVESupportSDGEMV
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
2024-07-17 23:14:21 +02:00
gxw
34b80ce03f mips64: Fixed numpy CI failure 2024-07-17 10:32:22 +08:00
gxw
f6d6c14a96 mips: Fixed numpy CI failure 2024-07-17 10:31:49 +08:00
Chip Kerchner
ba47c7f4f3 Vectorize reduction stage of sgemv_t. 2024-07-16 15:57:24 -05:00
iha fujitsu
0985fdc82b A64FX: Add support for SVE to SGEMV/DGEMV kernels. 2024-07-16 17:31:33 +09:00
Mark Ryan
67bf4b6998 Fix axpby_rvv kernels for cases where inc_y = 0
The following openblas_utest tests fail when the RISCV64_ZVL128B is
enabled.

TEST 89/103 axpby:zaxpby_inc_0 [FAIL]
TEST 92/103 axpby:caxpby_inc_0 [FAIL]
TEST 95/103 axpby:daxpby_inc_0 [FAIL]
TEST 98/103 axpby:saxpby_inc_0 [FAIL]

The issue is that the vectorized kernels do not work when inc_y == 0.
This patch updates the kernels to fall back to the scalar algorithms
when inc_y == 0, fixing the failing tests.

Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:47 +00:00
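
The inc_y == 0 failure described in 67bf4b6998 can be illustrated with a minimal sketch. This is not the RVV kernel code: axpby_blocked is a hypothetical stand-in for a vectorized kernel (it loads a block, computes, then stores), all function names are made up for illustration, and increments are assumed to be non-negative. With inc_y == 0 every iteration updates the same element, so each step depends on the previous one and a block-wise kernel gives a different answer than the scalar loop.

```c
#include <stddef.h>

/* Scalar reference loop: y[i*inc_y] = alpha * x[i*inc_x] + beta * y[i*inc_y].
 * With inc_y == 0 every iteration reads and writes y[0] in sequence. */
static void axpby_scalar(ptrdiff_t n, double alpha, const double *x, ptrdiff_t inc_x,
                         double beta, double *y, ptrdiff_t inc_y)
{
    for (ptrdiff_t i = 0; i < n; i++)
        y[i * inc_y] = alpha * x[i * inc_x] + beta * y[i * inc_y];
}

/* Hypothetical stand-in for a vectorized kernel: loads a block of y,
 * computes all lanes from the loaded values, then stores the block.
 * With inc_y == 0 all lanes see the same stale y[0]. */
static void axpby_blocked(ptrdiff_t n, double alpha, const double *x, ptrdiff_t inc_x,
                          double beta, double *y, ptrdiff_t inc_y)
{
    enum { VL = 4 };
    double tmp[VL];
    for (ptrdiff_t i = 0; i < n; i += VL) {
        ptrdiff_t len = (n - i < VL) ? (n - i) : VL;
        for (ptrdiff_t j = 0; j < len; j++)
            tmp[j] = alpha * x[(i + j) * inc_x] + beta * y[(i + j) * inc_y];
        for (ptrdiff_t j = 0; j < len; j++)
            y[(i + j) * inc_y] = tmp[j];
    }
}

/* Dispatch as described in the commit message: fall back to the scalar
 * algorithm when inc_y == 0, otherwise use the block-wise path. */
static void axpby_sketch(ptrdiff_t n, double alpha, const double *x, ptrdiff_t inc_x,
                         double beta, double *y, ptrdiff_t inc_y)
{
    if (inc_y == 0)
        axpby_scalar(n, alpha, x, inc_x, beta, y, inc_y);
    else
        axpby_blocked(n, alpha, x, inc_x, beta, y, inc_y);
}
```

For x = {1, 2, 3}, alpha = 1, beta = 2 and y[0] = 1, the scalar loop accumulates 3, then 8, then 19 in y[0], while the block-wise stand-in stores 5.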
Mark Ryan
3b715e6162 Add autodetection for riscv64
Implement DYNAMIC_ARCH support for riscv64.  Three cpu types are
supported, riscv64_generic, riscv64_zvl256b, riscv64_zvl128b.
The two non-generic kernels require CPU support for RVV 1.0 to
function correctly.  Detecting that a riscv64 device supports
RVV 1.0 is a little complicated as there are some boards on the
market that advertise support for V via hwcap but only support
RVV 0.7.1, which is not binary compatible with RVV 1.0.  The
approach taken is to first try hwprobe.  If hwprobe is not
available, we fall back to hwcap + an additional check to distinguish
between RVV 1.0 and RVV 0.7.1.

Tested on a VM with VLEN=256, a CanMV K230 with VLEN=128 (with only
the big core enabled), a Lichee Pi with RVV 0.7.1 and a VF2 with no
vector.

A compiler with RVV 1.0 support must be used to build OpenBLAS for
riscv64 when DYNAMIC_ARCH=1.

Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:22 +00:00
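
The detection order described in 3b715e6162 (hwprobe first, then hwcap plus an extra check) can be sketched as a pure decision function. This is an illustration, not the OpenBLAS implementation: the struct, its field names, and the function names are hypothetical, the actual probing calls are replaced by precomputed booleans, and the mapping from VLEN to kernel (at least 256 bits for zvl256b, at least 128 bits for zvl128b) is an assumption based on the kernel names.

```c
#include <stdbool.h>

/* The three cpu types named in the commit message. */
typedef enum { CORE_GENERIC, CORE_ZVL128B, CORE_ZVL256B } core_kind;

/* Hypothetical container for probe results gathered elsewhere. */
struct probe_inputs {
    bool hwprobe_available;  /* hwprobe could be queried at all */
    bool hwprobe_has_v;      /* hwprobe reports the V extension */
    bool hwcap_has_v;        /* hwcap advertises V (may mean RVV 0.7.1) */
    bool extra_check_rvv10;  /* additional check distinguishing 1.0 from 0.7.1 */
    unsigned vlen_bits;      /* vector register length in bits */
};

/* Prefer hwprobe; if it is unavailable, trust hwcap only when the
 * additional check confirms RVV 1.0 rather than RVV 0.7.1. */
static bool has_rvv10(const struct probe_inputs *p)
{
    if (p->hwprobe_available)
        return p->hwprobe_has_v;
    return p->hwcap_has_v && p->extra_check_rvv10;
}

/* Assumed mapping from VLEN to kernel; anything without RVV 1.0 is generic. */
static core_kind select_core(const struct probe_inputs *p)
{
    if (!has_rvv10(p))
        return CORE_GENERIC;
    if (p->vlen_bits >= 256)
        return CORE_ZVL256B;
    if (p->vlen_bits >= 128)
        return CORE_ZVL128B;
    return CORE_GENERIC;
}
```

Under these assumptions, a device that advertises V via hwcap but fails the extra check is treated as generic, which is the case the commit message raises for RVV 0.7.1 boards.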
gxw
3f39c8f94f LoongArch: Fixed numpy CI failure 2024-07-15 11:43:08 +08:00
gxw
f3cebb3ca3 x86: Fixed numpy CI failure when the target is ZEN. 2024-07-12 16:09:30 +08:00
Martin Kroeker
5d08ec7ff3 Merge pull request #4782 from martin-frbg/azurewincl
Fix NAN handling in ARM/generic SCAL; have AzureCI Windows show errors on failure
2024-07-11 23:55:15 +02:00
Chip Kerchner
cb154832f8 Vectorize SBGEMM incopy - 4x faster. 2024-07-09 13:10:03 -05:00
Martin Kroeker
a5c04e326a Update scal.c 2024-07-04 22:28:01 +02:00
Martin Kroeker
536200bc9e fix handling of INF or NAN 2024-07-04 17:47:19 +02:00
Martin Kroeker
3677b3886c Merge pull request #4702 from bashimao/detect-nv-grace
Correctly detect ARM Neoverse V2 CPUs.
2024-06-30 22:48:48 +02:00
Martin Kroeker
f3c364c2cc temporarily(?) disable the alpha=0 branch as it fails to handle INF,NAN 2024-06-27 22:18:27 +02:00
Martin Kroeker
2a5fe97e3b temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN 2024-06-27 16:21:57 +02:00
Martin Kroeker
c1019d5832 Handle INF and NAN in inputs 2024-06-27 10:58:59 +02:00
Chris Sidebottom
8c472ef7e3 Further tweak small GEMM for AArch64 2024-06-24 10:47:47 +01:00
Martin Kroeker
9e24121e7e temporarily(?) disable da=0 shortcut to handle x=Inf or NAN 2024-06-23 17:48:18 +02:00
Martin Kroeker
a11f086c17 Update sscal_msa.c 2024-06-23 12:55:19 +02:00
Martin Kroeker
541e1b6959 disable the fast path for inc=1, alpha=0 as it does not handle x=NaN or Inf 2024-06-23 10:37:55 +02:00
Martin Kroeker
c08113c279 fix special cases of x= NAN or INF 2024-06-23 01:12:33 +02:00
Martin Kroeker
bd47630bcf exclude the alpha=0 branch as it does not handle NaN or Inf in x 2024-06-23 00:54:39 +02:00
Martin Kroeker
68f2501958 temporarily(?) disable the alpha=0 branch to handle Inf/NaN in x 2024-06-22 21:08:57 +02:00
Martin Kroeker
0a744a939a temporarily(?) disable the alpha=0 branch to handle NaN/Inf in x 2024-06-22 21:07:43 +02:00
Martin Kroeker
7f8f037a36 handle INF and NAN in input 2024-06-22 16:03:30 +02:00
Martin Kroeker
f1248b849d handle INF and NAN in input 2024-06-22 15:55:29 +02:00
Martin Kroeker
a2ee4b1966 Merge branch 'OpenMathLib:develop' into issue4728 2024-06-21 09:35:56 +02:00
Martin Kroeker
3ec59922b6 Add a clobber list to fix utest errors seen with gcc13 on Apple M 2024-06-20 16:19:32 +02:00
Martin Kroeker
3d8054fb16 add clobber list 2024-06-14 22:07:44 +02:00
Martin Kroeker
dd7efcf9ef Avoid exceeding the configured thread count in x86_64 TOBF16 (#4748)
* avoid setting nthreads higher than available
2024-06-14 14:21:13 +02:00
Martin Kroeker
6ffaf99817 disable da=0 shortcut to handle NAN and INF correctly 2024-06-07 14:46:58 +02:00
Martin Kroeker
c7cacd9b38 disable the shortcut for da=0 to ensure proper handling of INF and NAN 2024-06-07 13:48:56 +02:00
Martin Kroeker
5ed4f24d6e Handle corner cases with INF and NAN arguments 2024-06-07 09:39:08 +02:00
Martin Kroeker
2bd43ad0eb Merge branch 'OpenMathLib:develop' into issue4728 2024-06-07 00:37:25 +02:00
Martin Kroeker
1abafcd9b2 handle corner cases involving NAN and/or INF 2024-06-06 23:59:43 +02:00
Martin Kroeker
442dec28df Merge pull request #4738 from martin-frbg/issue4737
Disable GEMM3M for generic targets (not implemented)
2024-06-06 17:22:38 +02:00
Martin Kroeker
2787c9f8e4 Disable GEMM3M for generic targets (not implemented) 2024-06-06 14:39:50 +02:00
gxw
af73ae6208 LoongArch: Fixed issue 4728 2024-06-06 16:43:09 +08:00
gxw
8ab2e9ec65 LoongArch: DGEMM small matrix opt 2024-06-04 16:52:45 +08:00
Martin Kroeker
83bc8d5dd8 Merge pull request #4712 from RajalakshmiSR/zscalp10
POWER: Fix issues in zscal to address lapack failures
2024-06-01 11:22:08 +02:00