Commit Graph

2316 Commits

Author SHA1 Message Date
Martin Kroeker 46e331a917
remove the unworkable GEMM3M restriction from GENERIC again 2024-08-07 19:41:10 +02:00
Martin Kroeker ccc23338d7
have the dummy GEMM3M kernel at least forward to regular GEMM 2024-08-07 19:39:02 +02:00
Martin Kroeker f1c9803f9a
add proper return statement 2024-08-04 00:14:31 +02:00
Martin Kroeker 60abcc3991
add proper return statement 2024-08-04 00:13:31 +02:00
Martin Kroeker 9afd0c8afd
Merge pull request #4814 from Mousius/gemv-proxy
Forward GEMM to GEMV when one argument is actually a vector
2024-07-31 23:18:01 +02:00
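The forwarding described in this pull request can be pictured with a small reference sketch. It is not the OpenBLAS interface code: `gemv_ref`, `gemm_ref` and `gemm_dispatch` are hypothetical names, only the untransposed column-major case is covered, and only the "B has a single column" case is forwarded.

```c
#include <stddef.h>

/* y = alpha * A * x + beta * y, with A m-by-k, column-major, leading dimension lda. */
static void gemv_ref(size_t m, size_t k, double alpha, const double *a, size_t lda,
                     const double *x, double beta, double *y)
{
    for (size_t i = 0; i < m; i++) {
        double sum = 0.0;
        for (size_t l = 0; l < k; l++)
            sum += a[i + l * lda] * x[l];
        y[i] = alpha * sum + beta * y[i];
    }
}

/* C = alpha * A * B + beta * C, no transposes, column-major storage. */
static void gemm_ref(size_t m, size_t n, size_t k, double alpha,
                     const double *a, size_t lda, const double *b, size_t ldb,
                     double beta, double *c, size_t ldc)
{
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < m; i++) {
            double sum = 0.0;
            for (size_t l = 0; l < k; l++)
                sum += a[i + l * lda] * b[l + j * ldb];
            c[i + j * ldc] = alpha * sum + beta * c[i + j * ldc];
        }
    }
}

/* Hypothetical dispatcher: when B is really a vector (n == 1), the product
 * is a matrix-vector product, so the cheaper GEMV routine can do the work.
 * Returns 1 if the call was forwarded to GEMV, 0 otherwise. */
static int gemm_dispatch(size_t m, size_t n, size_t k, double alpha,
                         const double *a, size_t lda, const double *b, size_t ldb,
                         double beta, double *c, size_t ldc)
{
    if (n == 1) {
        gemv_ref(m, k, alpha, a, lda, b, beta, c);
        return 1;
    }
    gemm_ref(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
    return 0;
}
```

A real implementation would also have to consider transposed operands and the case where A is the vector; those are left out here.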
Martin Kroeker edbf093c98
Update zarch SCAL kernels to handle INF and NAN arguments (#4829)
* handle INF and NAN in input (for S/D only if DUMMY2 argument is set)
2024-07-31 19:45:15 +02:00
Chris Sidebottom ba2e989c67 Add accumulators to AArch64 GEMV Kernels
This helps to reduce values going missing as we accumulate.
2024-07-31 13:09:14 +01:00
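The general idea of accumulating into several partial sums can be sketched in plain C as follows. This is only an illustration: the AArch64 kernels themselves use vector instructions, and the function name `dot_multi_acc` and the choice of four accumulators are assumptions made for this example.

```c
#include <stddef.h>

/* Dot product that spreads the running sum over four independent
 * accumulators and combines them once at the end. Each accumulator
 * sees a quarter of the terms, so its magnitude grows more slowly
 * than a single running total would. */
static float dot_multi_acc(size_t n, const float *x, const float *y)
{
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;

    /* Main loop: four products per iteration, one per accumulator. */
    for (; i + 4 <= n; i += 4) {
        for (size_t lane = 0; lane < 4; lane++)
            acc[lane] += x[i + lane] * y[i + lane];
    }

    /* Remaining elements when n is not a multiple of four. */
    float tail = 0.0f;
    for (; i < n; i++)
        tail += x[i] * y[i];

    /* Pairwise combination of the partial sums. */
    return (acc[0] + acc[1]) + (acc[2] + acc[3]) + tail;
}
```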
Martin Kroeker a875304eb0
fix inverted conditional for NAN handling 2024-07-26 09:50:20 +02:00
Martin Kroeker 24acdd6bbb
correct offset 2024-07-26 09:49:24 +02:00
Martin Kroeker fb7c53c5e5
Merge pull request #4807 from martin-frbg/scalfixes
[WIP]Make NAN handling in the SCAL kernels depend on the dummy2 parameter
2024-07-25 23:42:50 +02:00
Martin Kroeker 15c53dd2e0
Merge pull request #4794 from XiWeiGu/Fixed_Numpy_CI_Test
Try to fix numpy CI test failures
2024-07-25 23:42:13 +02:00
Martin Kroeker a4e56e0452
Merge pull request #4806 from Mousius/small-gemm
Small GEMM for AArch64 with SVE
2024-07-25 21:50:04 +02:00
yamazaki-mitsufumi 88caf02f62 Fix ambiguous error on Mac OS 2024-07-25 22:43:13 +09:00
Martin Kroeker b613754143
Update scal..c 2024-07-24 14:31:29 +02:00
Martin Kroeker f5d04318e3
Merge branch 'OpenMathLib:develop' into scalfixes 2024-07-21 13:43:43 +02:00
Martin Kroeker 73f8866ffb
make NAN handling depend on DUMMY2 parameter 2024-07-21 13:42:47 +02:00
Martin Kroeker dfbc2348a8
fix NAN handling 2024-07-20 18:27:15 +02:00
Martin Kroeker c064319ecb
fix alpha=NAN case 2024-07-20 17:42:31 +02:00
Martin Kroeker c2ffd90e8c
make NAN handling depend on dummy2 parameter 2024-07-20 17:31:00 +02:00
Chris Sidebottom ea4ab3b310 Better header guard around bridge 2024-07-20 14:39:57 +01:00
Chris Sidebottom 7311d93016 Unroll TT further 2024-07-19 17:51:20 +01:00
Martin Kroeker a815594fd1
Merge pull request #4801 from markdryan/markdryan/riscv-dynamic-arch
Add autodetection for riscv64
2024-07-19 17:12:07 +02:00
Martin Kroeker dd6c33d34d
make NAN handling depend on dummy2 parameter 2024-07-19 16:14:55 +02:00
Hong Bo Peng db98f8753f Try to fix LAPACK testing failures on P7.
1. Remove the FADD insn from the GEMV Transpose code.
2. Remove the FADD insn from GEMM and ZGEMM code.
3. Reorder the computation of the Imaginary part in ZGEMM code.
2024-07-19 02:08:19 -04:00
Chris Sidebottom a9edddb695 Unroll TN further 2024-07-18 20:04:15 +01:00
Chris Sidebottom 9984c5ce9d Clean up k2 removal more and unroll SGEMM more 2024-07-18 18:35:43 +01:00
Chris Sidebottom b1c9fafabb Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM 2024-07-18 17:37:18 +01:00
Martin Kroeker 2020569705
fix NAN handling and make it depend on dummy2 parameter 2024-07-17 23:55:54 +02:00
Martin Kroeker 3870995f01
make NAN handling depend on dummy2 parameter 2024-07-17 23:54:24 +02:00
Martin Kroeker 7284c533b5
make NAN handling depend on dummy2 parameter 2024-07-17 23:50:40 +02:00
Martin Kroeker 73751218a4
make NAN handling depend on dummy2 parameter 2024-07-17 23:41:26 +02:00
Martin Kroeker b9bfc8ce09
make NAN handling depend on dummy2 parameter 2024-07-17 23:29:50 +02:00
Martin Kroeker eb4879e04c
make NAN handling depend on the dummy2 parameter 2024-07-17 23:24:19 +02:00
Martin Kroeker ee87cb90d0
Merge pull request #4803 from iha-taisei/SVESupportSDGEMV
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
2024-07-17 23:14:21 +02:00
gxw 34b80ce03f mips64: Fixed numpy CI failure 2024-07-17 10:32:22 +08:00
gxw f6d6c14a96 mips: Fixed numpy CI failure 2024-07-17 10:31:49 +08:00
iha fujitsu 0985fdc82b A64FX: Add support for SVE to SGEMV/DGEMV kernels. 2024-07-16 17:31:33 +09:00
Mark Ryan 67bf4b6998 Fix axpby_rvv kernels for cases where inc_y = 0
The following openblas_utest tests fail when RISCV64_ZVL128B is
enabled.

TEST 89/103 axpby:zaxpby_inc_0 [FAIL]
TEST 92/103 axpby:caxpby_inc_0 [FAIL]
TEST 95/103 axpby:daxpby_inc_0 [FAIL]
TEST 98/103 axpby:saxpby_inc_0 [FAIL]

The issue is that the vectorized kernels do not work when inc_y == 0.
This patch updates the kernels to fall back to the scalar algorithms
when inc_y == 0, fixing the failing tests.

Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:47 +00:00
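The fallback described in this commit can be sketched as below. With `inc_y == 0` every iteration reads and writes the same element of `y`, so each step depends on the previous one; a kernel that loads several `y` values before storing any of them loses that dependency. The names `axpby_scalar`, `axpby_blocked` and `axpby_sketch` are hypothetical, non-negative strides are assumed, and the blocked routine is only a plain-C stand-in for the RVV code.

```c
#include <stddef.h>

/* Scalar reference: y[i*inc_y] = alpha * x[i*inc_x] + beta * y[i*inc_y],
 * evaluated strictly one element after another. */
static void axpby_scalar(size_t n, double alpha, const double *x, size_t inc_x,
                         double beta, double *y, size_t inc_y)
{
    for (size_t i = 0; i < n; i++)
        y[i * inc_y] = alpha * x[i * inc_x] + beta * y[i * inc_y];
}

/* Stand-in for a vectorized kernel: computes four results from the
 * values currently in memory, then stores all four. Only valid when the
 * four y locations are distinct, i.e. when inc_y != 0. */
static void axpby_blocked(size_t n, double alpha, const double *x, size_t inc_x,
                          double beta, double *y, size_t inc_y)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        double t[4];
        for (size_t k = 0; k < 4; k++)
            t[k] = alpha * x[(i + k) * inc_x] + beta * y[(i + k) * inc_y];
        for (size_t k = 0; k < 4; k++)
            y[(i + k) * inc_y] = t[k];
    }
    /* Leftover elements. */
    for (; i < n; i++)
        y[i * inc_y] = alpha * x[i * inc_x] + beta * y[i * inc_y];
}

/* Dispatcher: use the scalar algorithm when inc_y == 0. */
static void axpby_sketch(size_t n, double alpha, const double *x, size_t inc_x,
                         double beta, double *y, size_t inc_y)
{
    if (inc_y == 0) {
        axpby_scalar(n, alpha, x, inc_x, beta, y, inc_y);
        return;
    }
    axpby_blocked(n, alpha, x, inc_x, beta, y, inc_y);
}
```

For `x = {1, 2, 3, 4}`, `y[0] = 10`, `alpha = beta = 1` and `inc_y = 0`, the scalar path produces 11, 13, 16 and finally 20, whereas the blocked routine alone would compute all four results from the original 10 and leave 14 behind.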
Mark Ryan 3b715e6162 Add autodetection for riscv64
Implement DYNAMIC_ARCH support for riscv64.  Three cpu types are
supported, riscv64_generic, riscv64_zvl256b, riscv64_zvl128b.
The two non-generic kernels require CPU support for RVV 1.0 to
function correctly.  Detecting that a riscv64 device supports
RVV 1.0 is a little complicated as there are some boards on the
market that advertise support for V via hwcap but only support
RVV 0.7.1, which is not binary compatible with RVV 1.0.  The
approach taken is to first try hwprobe.  If hwprobe is not
available, we fall back to hwcap + an additional check to distinguish
between RVV 1.0 and RVV 0.7.1.

Tested on a VM with VLEN=256, a CanMV K230 with VLEN=128 (with only
the big core enabled), a Lichee Pi with RVV 0.7.1 and a VF2 with no
vector.

A compiler with RVV 1.0 support must be used to build OpenBLAS for
riscv64 when DYNAMIC_ARCH=1.

Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:22 +00:00
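The decision order described above (hwprobe first, then hwcap plus an additional version check) can be sketched as pure logic. Everything here is illustrative: the struct, its field names and `select_core` are hypothetical, the probe results are passed in as plain booleans instead of being read from the kernel, and the mapping from VLEN to a core type is an assumption based on the names of the three targets.

```c
#include <stdbool.h>

typedef enum { CORE_GENERIC, CORE_ZVL128B, CORE_ZVL256B } riscv_core_t;

/* Hypothetical summary of what the runtime probes reported. */
typedef struct {
    bool hwprobe_available;  /* the hwprobe interface could be queried   */
    bool hwprobe_reports_v;  /* hwprobe says the V extension is present  */
    bool hwcap_reports_v;    /* hwcap advertises V (may mean RVV 0.7.1)  */
    bool extra_check_rvv10;  /* additional check confirmed RVV 1.0       */
    unsigned vlen_bits;      /* vector register length in bits           */
} riscv_probe_t;

static riscv_core_t select_core(const riscv_probe_t *p)
{
    bool rvv10;

    if (p->hwprobe_available) {
        /* Preferred path: trust hwprobe. */
        rvv10 = p->hwprobe_reports_v;
    } else {
        /* Fallback: hwcap alone cannot tell RVV 1.0 from RVV 0.7.1,
         * so the additional check must also pass. */
        rvv10 = p->hwcap_reports_v && p->extra_check_rvv10;
    }

    if (!rvv10)
        return CORE_GENERIC;

    /* Assumed mapping from vector length to kernel set. */
    if (p->vlen_bits >= 256)
        return CORE_ZVL256B;
    if (p->vlen_bits >= 128)
        return CORE_ZVL128B;
    return CORE_GENERIC;
}
```

Under these assumptions, a board that advertises V through hwcap but fails the additional check is treated as generic, which matches the RVV 0.7.1 situation the commit message warns about.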
gxw 3f39c8f94f LoongArch: Fixed numpy CI failure 2024-07-15 11:43:08 +08:00
gxw f3cebb3ca3 x86: Fixed numpy CI failure when the target is ZEN. 2024-07-12 16:09:30 +08:00
Martin Kroeker 5d08ec7ff3
Merge pull request #4782 from martin-frbg/azurewincl
Fix NAN handling in ARM/generic SCAL; have AzureCI Windows show errors on failure
2024-07-11 23:55:15 +02:00
Chip Kerchner cb154832f8 Vectorize SBGEMM incopy - 4x faster. 2024-07-09 13:10:03 -05:00
Martin Kroeker a5c04e326a
Update scal.c 2024-07-04 22:28:01 +02:00
Martin Kroeker 536200bc9e
fix handling of INF or NAN 2024-07-04 17:47:19 +02:00
Martin Kroeker 3677b3886c
Merge pull request #4702 from bashimao/detect-nv-grace
Correctly detect ARM Neoverse V2 CPUs.
2024-06-30 22:48:48 +02:00
Martin Kroeker f3c364c2cc
temporarily(?) disable the alpha=0 branch as it fails to handle INF,NAN 2024-06-27 22:18:27 +02:00
Martin Kroeker 2a5fe97e3b
temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN 2024-06-27 16:21:57 +02:00
Martin Kroeker c1019d5832
Handle INF and NAN in inputs 2024-06-27 10:58:59 +02:00
Chris Sidebottom 8c472ef7e3 Further tweak small GEMM for AArch64 2024-06-24 10:47:47 +01:00