Chris Sidebottom
a9edddb695
Unroll TN further
2024-07-18 20:04:15 +01:00
Chris Sidebottom
9984c5ce9d
Clean up k2 removal more and unroll SGEMM more
2024-07-18 18:35:43 +01:00
Chris Sidebottom
b1c9fafabb
Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM
2024-07-18 17:37:18 +01:00
Martin Kroeker
2020569705
fix NAN handling and make it depend on dummy2 parameter
2024-07-17 23:55:54 +02:00
Martin Kroeker
3870995f01
make NAN handling depend on dummy2 parameter
2024-07-17 23:54:24 +02:00
Martin Kroeker
7284c533b5
make NAN handling depend on dummy2 parameter
2024-07-17 23:50:40 +02:00
Martin Kroeker
73751218a4
make NAN handling depend on dummy2 parameter
2024-07-17 23:41:26 +02:00
Martin Kroeker
b9bfc8ce09
make NAN handling depend on dummy2 parameter
2024-07-17 23:29:50 +02:00
Martin Kroeker
eb4879e04c
make NAN handling depend on the dummy2 parameter
2024-07-17 23:24:19 +02:00
Martin Kroeker
ee87cb90d0
Merge pull request #4803 from iha-taisei/SVESupportSDGEMV
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
2024-07-17 23:14:21 +02:00
gxw
34b80ce03f
mips64: Fixed numpy CI failure
2024-07-17 10:32:22 +08:00
gxw
f6d6c14a96
mips: Fixed numpy CI failure
2024-07-17 10:31:49 +08:00
Chip Kerchner
ba47c7f4f3
Vectorize reduction stage of sgemv_t.
2024-07-16 15:57:24 -05:00
iha fujitsu
0985fdc82b
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
2024-07-16 17:31:33 +09:00
Mark Ryan
67bf4b6998
Fix axpby_rvv kernels for cases where inc_y = 0
The following openblas_utest tests fail when the RISCV64_ZVL128B target
is enabled:
TEST 89/103 axpby:zaxpby_inc_0 [FAIL]
TEST 92/103 axpby:caxpby_inc_0 [FAIL]
TEST 95/103 axpby:daxpby_inc_0 [FAIL]
TEST 98/103 axpby:saxpby_inc_0 [FAIL]
The issue is that the vectorized kernels do not work when inc_y == 0.
This patch updates the kernels to fall back to the scalar algorithms
when inc_y == 0, fixing the failing tests.
Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:47 +00:00
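To illustrate why a vectorized axpby can go wrong when inc_y == 0, here is a minimal sketch, not the actual RVV kernel. The names axpby_scalar, axpby_blocked, axpby and LANES are hypothetical, and the sketch assumes non-negative increments. The "blocked" routine only imitates a vector kernel by loading a group of y values before storing any of them.

```c
#include <stddef.h>

enum { LANES = 4 }; /* hypothetical vector width used only for this sketch */

/* Scalar reference: each update sees the result of the previous one. */
static void axpby_scalar(size_t n, double alpha, const double *x, size_t inc_x,
                         double beta, double *y, size_t inc_y)
{
    for (size_t i = 0; i < n; i++)
        y[i * inc_y] = alpha * x[i * inc_x] + beta * y[i * inc_y];
}

/* Imitation of a vector kernel: load and compute a whole group, then store. */
static void axpby_blocked(size_t n, double alpha, const double *x, size_t inc_x,
                          double beta, double *y, size_t inc_y)
{
    for (size_t i = 0; i < n; i += LANES) {
        size_t len = (n - i < (size_t)LANES) ? n - i : (size_t)LANES;
        double tmp[LANES];
        for (size_t l = 0; l < len; l++) /* "vector load" and compute */
            tmp[l] = alpha * x[(i + l) * inc_x] + beta * y[(i + l) * inc_y];
        for (size_t l = 0; l < len; l++) /* "vector store" */
            y[(i + l) * inc_y] = tmp[l];
    }
}

/* Dispatcher: fall back to the scalar algorithm when inc_y == 0. */
static void axpby(size_t n, double alpha, const double *x, size_t inc_x,
                  double beta, double *y, size_t inc_y)
{
    if (inc_y == 0)
        axpby_scalar(n, alpha, x, inc_x, beta, y, inc_y);
    else
        axpby_blocked(n, alpha, x, inc_x, beta, y, inc_y);
}
```

With inc_y == 0 every element maps to y[0], so the scalar loop accumulates through y[0] step by step, while the blocked version reads the stale y[0] for every lane and keeps only the last store.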
Mark Ryan
3b715e6162
Add autodetection for riscv64
Implement DYNAMIC_ARCH support for riscv64. Three CPU types are
supported: riscv64_generic, riscv64_zvl256b, and riscv64_zvl128b.
The two non-generic kernels require CPU support for RVV 1.0 to
function correctly. Detecting that a riscv64 device supports
RVV 1.0 is a little complicated as there are some boards on the
market that advertise support for V via hwcap but only support
RVV 0.7.1, which is not binary compatible with RVV 1.0. The
approach taken is to first try hwprobe. If hwprobe is not
available, we fall back to hwcap + an additional check to distinguish
between RVV 1.0 and RVV 0.7.1.
Tested on a VM with VLEN=256, a CanMV K230 with VLEN=128 (with only
the big core enabled), a Lichee Pi with RVV 0.7.1 and a VF2 with no
vector.
A compiler with RVV 1.0 support must be used to build OpenBLAS for
riscv64 when DYNAMIC_ARCH=1.
Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:22 +00:00
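The detection order described above can be sketched as a pure decision function. This is only an illustration under stated assumptions: the struct riscv64_probe, its fields, and the functions has_rvv10 and select_core are hypothetical, the probe results are passed in as plain flags rather than queried from the kernel, and the mapping from VLEN to a kernel (256 bits or more selects zvl256b, 128 bits or more selects zvl128b) is an assumption rather than something the commit states.

```c
#include <stdbool.h>

typedef enum { RISCV64_GENERIC, RISCV64_ZVL128B, RISCV64_ZVL256B } riscv64_core;

/* Hypothetical container for the results of the runtime probes. */
typedef struct {
    bool hwprobe_available;  /* the hwprobe query could be made at all */
    bool hwprobe_reports_v;  /* hwprobe reports RVV 1.0 */
    bool hwcap_reports_v;    /* hwcap advertises V (may be RVV 0.7.1) */
    bool extra_check_rvv10;  /* additional check says 1.0 rather than 0.7.1 */
    unsigned vlen_bits;      /* vector register length in bits */
} riscv64_probe;

/* Try hwprobe first; otherwise require hwcap plus the additional check. */
static bool has_rvv10(const riscv64_probe *p)
{
    if (p->hwprobe_available)
        return p->hwprobe_reports_v;
    return p->hwcap_reports_v && p->extra_check_rvv10;
}

/* Pick one of the three CPU types (VLEN thresholds are an assumption). */
static riscv64_core select_core(const riscv64_probe *p)
{
    if (!has_rvv10(p))
        return RISCV64_GENERIC;
    if (p->vlen_bits >= 256)
        return RISCV64_ZVL256B;
    if (p->vlen_bits >= 128)
        return RISCV64_ZVL128B;
    return RISCV64_GENERIC;
}
```

A board that advertises V through hwcap but fails the additional check, as an RVV 0.7.1 device would, ends up on the generic kernels in this sketch.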
gxw
3f39c8f94f
LoongArch: Fixed numpy CI failure
2024-07-15 11:43:08 +08:00
gxw
f3cebb3ca3
x86: Fixed numpy CI failure when the target is ZEN.
2024-07-12 16:09:30 +08:00
Martin Kroeker
5d08ec7ff3
Merge pull request #4782 from martin-frbg/azurewincl
Fix NAN handling in ARM/generic SCAL; have AzureCI Windows show errors on failure
2024-07-11 23:55:15 +02:00
Chip Kerchner
cb154832f8
Vectorize SBGEMM incopy - 4x faster.
2024-07-09 13:10:03 -05:00
Martin Kroeker
a5c04e326a
Update scal.c
2024-07-04 22:28:01 +02:00
Martin Kroeker
536200bc9e
fix handling of INF or NAN
2024-07-04 17:47:19 +02:00
Martin Kroeker
3677b3886c
Merge pull request #4702 from bashimao/detect-nv-grace
Correctly detect ARM Neoverse V2 CPUs.
2024-06-30 22:48:48 +02:00
Martin Kroeker
f3c364c2cc
temporarily(?) disable the alpha=0 branch as it fails to handle INF,NAN
2024-06-27 22:18:27 +02:00
Martin Kroeker
2a5fe97e3b
temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN
2024-06-27 16:21:57 +02:00
Martin Kroeker
c1019d5832
Handle INF and NAN in inputs
2024-06-27 10:58:59 +02:00
Chris Sidebottom
8c472ef7e3
Further tweak small GEMM for AArch64
2024-06-24 10:47:47 +01:00
Martin Kroeker
9e24121e7e
temporarily(?) disable da=0 shortcut to handle x=Inf or NAN
2024-06-23 17:48:18 +02:00
Martin Kroeker
a11f086c17
Update sscal_msa.c
2024-06-23 12:55:19 +02:00
Martin Kroeker
541e1b6959
disable the fast path for inc=1, alpha=0 as it does not handle x=NaN or Inf
2024-06-23 10:37:55 +02:00
Martin Kroeker
c08113c279
fix special cases of x= NAN or INF
2024-06-23 01:12:33 +02:00
Martin Kroeker
bd47630bcf
exclude the alpha=0 branch as it does not handle NaN or Inf in x
2024-06-23 00:54:39 +02:00
Martin Kroeker
68f2501958
temporarily(?) disable the alpha=0 branch to handle Inf/NaN in x
2024-06-22 21:08:57 +02:00
Martin Kroeker
0a744a939a
temporarily(?) disable the alpha=0 branch to handle NaN/Inf in x
2024-06-22 21:07:43 +02:00
Martin Kroeker
7f8f037a36
handle INF and NAN in input
2024-06-22 16:03:30 +02:00
Martin Kroeker
f1248b849d
handle INF and NAN in input
2024-06-22 15:55:29 +02:00
Martin Kroeker
a2ee4b1966
Merge branch 'OpenMathLib:develop' into issue4728
2024-06-21 09:35:56 +02:00
Martin Kroeker
3ec59922b6
Add a clobber list to fix utest errors seen with gcc13 on Apple M
2024-06-20 16:19:32 +02:00
Martin Kroeker
3d8054fb16
add clobber list
2024-06-14 22:07:44 +02:00
Martin Kroeker
dd7efcf9ef
Avoid exceeding the configured thread count in x86_64 TOBF16 ( #4748 )
* avoid setting nthreads higher than available
2024-06-14 14:21:13 +02:00
Martin Kroeker
6ffaf99817
disable da=0 shortcut to handle NAN and INF correctly
2024-06-07 14:46:58 +02:00
Martin Kroeker
c7cacd9b38
disable the shortcut for da=0 to ensure proper handling of INF and NAN
2024-06-07 13:48:56 +02:00
Martin Kroeker
5ed4f24d6e
Handle corner cases with INF and NAN arguments
2024-06-07 09:39:08 +02:00
Martin Kroeker
2bd43ad0eb
Merge branch 'OpenMathLib:develop' into issue4728
2024-06-07 00:37:25 +02:00
Martin Kroeker
1abafcd9b2
handle corner cases involving NAN and/or INF
2024-06-06 23:59:43 +02:00
Martin Kroeker
442dec28df
Merge pull request #4738 from martin-frbg/issue4737
Disable GEMM3M for generic targets (not implemented)
2024-06-06 17:22:38 +02:00
Martin Kroeker
2787c9f8e4
Disable GEMM3M for generic targets (not implemented)
2024-06-06 14:39:50 +02:00
gxw
af73ae6208
LoongArch: Fixed issue 4728
2024-06-06 16:43:09 +08:00
gxw
8ab2e9ec65
LoongArch: DGEMM small matrix opt
2024-06-04 16:52:45 +08:00
Martin Kroeker
83bc8d5dd8
Merge pull request #4712 from RajalakshmiSR/zscalp10
POWER: Fix issues in zscal to address lapack failures
2024-06-01 11:22:08 +02:00
Martin Kroeker
020b3e1682
fix handling of INF arguments
2024-06-01 00:51:18 +02:00
Martin Kroeker
8c05765a5a
fix other corner cases where x=INF
2024-05-31 18:06:36 +02:00
Martin Kroeker
516743f7dc
fix other instances of mishandling INF
2024-05-31 16:02:12 +02:00
Martin Kroeker
9ff4e9714e
additional fixes for handling INF arguments
2024-05-31 15:44:07 +02:00
Martin Kroeker
ce130f11d2
Update zscal.c
2024-05-31 15:09:03 +02:00
Martin Kroeker
ab13cfef93
more fixes for infinite x
2024-05-31 14:34:49 +02:00
Martin Kroeker
ad2b5c67c8
fix another corner case involving infinity
2024-05-31 01:06:58 +02:00
Bart Oldeman
62f7b244ff
Replace use of FLT_MAX in x86_64 zscal.c by isinf()
Commit def4996 fixed issues with inf and nan values in zscal,
but used FLT_MAX where DBL_MAX or isinf() is more appropriate,
as FLT_MAX is for single precision only.
Using FLT_MAX caused test case failures in the LAPACK tests.
isinf() is consistent with the later fix 969601a1.
2024-05-24 17:20:27 +00:00
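The difference between the two checks can be shown with a small sketch. It assumes the earlier code compared the magnitude of a double against FLT_MAX, which is a guess at its shape rather than a quote; the helper names exceeds_flt_max and is_infinite are hypothetical.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Assumed shape of the faulty test: treats anything beyond FLT_MAX as infinite. */
static bool exceeds_flt_max(double v)
{
    return v > FLT_MAX || v < -FLT_MAX;
}

/* The isinf()-based test: true only for actual +/- infinity. */
static bool is_infinite(double v)
{
    return isinf(v) != 0;
}
```

A finite double such as 1e300 is far above FLT_MAX (roughly 3.4e38) but well below DBL_MAX, so the first test flags it while isinf() does not.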
Rajalakshmi Srinivasaraghavan
e112191b54
POWER: Fix issues in zscal to address lapack failures
This patch fixes the following LAPACK failures with the clang compiler on POWER.
zed.out: ZVX: 18 out of 5190 tests failed to pass the threshold
zgd.out: ZGV drivers: 25 out of 1092 tests failed to pass the threshold
zgd.out: ZGV drivers: 6 out of 1092 tests failed to pass the threshold
2024-05-22 08:00:06 -05:00
Martin Kroeker
aa259b141d
Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix
Fix SAXPY regression when compiled with the OpenXL compiler.
2024-05-18 19:11:25 +02:00
Matthias Langer
0050a9660b
Correctly detect ARM Neoverse V2 CPUs.
2024-05-16 09:59:52 +00:00
Chip Kerchner
3a1417671a
POWER: Fixing endianness issue in cswap/zswap kernel for AIX
2024-05-15 19:36:46 -05:00
Amrita H S
87b3d9054f
Fix SAXPY regression when compiled with the OpenXL compiler.
SAXPY built with OpenXL regresses compared to SAXPY built
with gcc. The OpenXL compiler doesn't know that the SAXPY
inner kernel assembly is a 64-element loop, so it treats the
remainder loop as the main loop. It vectorizes and interleaves
the remainder into a loop that handles 48 elements per
iteration. With a maximum of 63 iterations, the 48-element
loop is mostly not going to be executed, so the 1-element
scalar loop that forms the remainder after it is probably
what mostly gets executed.
This can be fixed by adding a pragma, loop interleave_count(2),
which results in an 8-element loop.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2024-05-12 23:27:55 -05:00
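The arithmetic behind this reasoning can be checked with a short sketch. It assumes the remainder left by the 64-element assembly kernel is equally likely to be any value from 0 to 63, which the commit does not state; the function names scalar_tail and total_scalar_tail are hypothetical.

```c
/* Elements left for the 1-element scalar loop after a vectorized
 * remainder loop that consumes `block` elements per iteration. */
static unsigned scalar_tail(unsigned remainder, unsigned block)
{
    return remainder % block;
}

/* Sum of scalar-loop iterations over every possible remainder 0..63. */
static unsigned total_scalar_tail(unsigned block)
{
    unsigned total = 0;
    for (unsigned r = 0; r < 64; r++)
        total += scalar_tail(r, block);
    return total;
}
```

Under that assumption, a 48-element remainder loop leaves 1248 scalar iterations across the 64 cases (19.5 on average), while an 8-element loop leaves 224 (3.5 on average).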
Martin Kroeker
8da6f7e5f2
Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
Loongarch64: Improving the Performance and Stability of dgemm
2024-05-10 11:29:12 +02:00
gxw
f9a26240a7
loongarch64: Fixed icamax_lsx
2024-05-10 14:16:40 +08:00
gxw
cb0f707409
loongarch64: Fixed utest fork:safety
2024-05-10 14:16:36 +08:00
Martin Kroeker
b45d8e1ab2
remove stray comma
2024-05-09 12:33:19 +02:00
gxw
6017ad7146
loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6
2024-05-08 10:10:26 +08:00
Martin Kroeker
992b71fea2
remove stray comma
2024-04-23 21:52:26 +02:00
Martin Kroeker
d421dec278
Merge pull request #4656 from zboszor/fix-x86-64-build-v2
Add forgotten conditional uses of PREFETCH
2024-04-23 21:05:08 +02:00
Martin Kroeker
ae695d4ca0
Merge pull request #4642 from XiWeiGu/loongarch64_clang
CI: Add clang test for loongarch64
2024-04-23 18:25:49 +02:00
gxw
7cd438a5ac
loongarch64: Fixed clang compilation issues
2024-04-23 19:19:11 +08:00
Zoltán Böszörményi
ca64861ce8
Add forgotten conditional uses of PREFETCH
This fixes a (cross-)compilation/linker error for PRESCOTT
on Yocto.
Signed-off-by: Zoltán Böszörményi <zoltan.boszormenyi@xenial.com>
2024-04-19 10:52:28 +02:00
gxw
9c39e969f5
mips64: Fixed MSA optimization bugs for zgemv and cgemv
2024-04-15 15:17:29 +08:00
Martin Kroeker
4c03ed437f
Fix SICORTEX ASUM/ZASUM and SUM/ZSUM for INCX <=0 ( #4640 )
* Exit early if INCX <= 0
2024-04-14 15:39:11 +02:00
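A minimal sketch of an early exit for INCX <= 0 in an ASUM-style routine. It is modelled on the reference BLAS convention of returning zero when n <= 0 or incx <= 0 and is not the SICORTEX assembly; the name asum_ref is hypothetical.

```c
#include <stddef.h>

/* Sum of absolute values with stride inc_x; returns 0 for n <= 0 or inc_x <= 0. */
static double asum_ref(ptrdiff_t n, const double *x, ptrdiff_t inc_x)
{
    if (n <= 0 || inc_x <= 0)
        return 0.0; /* exit early instead of walking memory with a bad stride */

    double sum = 0.0;
    for (ptrdiff_t i = 0; i < n; i++) {
        double v = x[i * inc_x];
        sum += (v < 0.0) ? -v : v;
    }
    return sum;
}
```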
Martin Kroeker
7cfd433d0c
revert the C/Z NRM2 kernels to the base NEON kernel as well
2024-04-12 15:34:04 +02:00
Martin Kroeker
93d975d8fd
Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
...
loongarch: Optimizing the performance of the GEMM on servers
2024-04-10 14:23:31 +02:00
gxw
d8c4ea8793
loongarch: Optimizing the performance of the GEMM on servers
2024-04-09 09:03:34 -04:00
Chen Yu
8e39c05efd
Get the l2 cache size via environment variable on confidential VM
CPUID (leaf 2 or leaf 0x80000006) is not supported on some confidential
VMs. As a result, get_l2_size() returns the default of 512M, which causes
performance issues.
Introduce the environment variable OPENBLAS_L2_SIZE so that the user can
provide the L2 cache size.
Suggested-by: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
2024-04-05 11:39:01 +08:00
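Reading such an override could look roughly like the sketch below. Only the variable name OPENBLAS_L2_SIZE comes from the commit; the helper names parse_l2_size and l2_size_from_env are hypothetical, and the assumption that the value is a plain positive decimal integer (with units left to the caller) is a guess, not the documented format.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a positive decimal integer; return `fallback` on any problem. */
static long parse_l2_size(const char *text, long fallback)
{
    if (text == NULL || *text == '\0')
        return fallback;

    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0' || value <= 0)
        return fallback; /* overflow, trailing garbage, or non-positive */
    return value;
}

/* Use the environment override when present, else the detected value. */
static long l2_size_from_env(long detected)
{
    return parse_l2_size(getenv("OPENBLAS_L2_SIZE"), detected);
}
```

Keeping the parsing separate from getenv() makes the fallback behaviour easy to check without touching the process environment.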
Martin Kroeker
441c81026e
Add support for Cortex-A76
2024-04-02 19:41:44 +02:00
Martin Kroeker
9ead81bd39
Revert S/DNRM2 to the base NEON kernel to fix precision loss
2024-04-02 15:59:20 +02:00
gxw
96607cbb98
loongarch: Fixed dzamax
Initialize the registers to prevent sporadic errors.
2024-03-25 23:17:53 -04:00
gxw
50869f6ca8
loongarch: Fixed zrot LSX opt
2024-03-19 10:08:11 +08:00
gxw
b5eb9d6bac
loongarch: Fixed {sc/dz}amax LSX opt
2024-03-19 09:56:11 +08:00
gxw
ad13e04669
loongarch: Fixed {s/d/sc/dz}amin LSX opt
2024-03-19 09:18:44 +08:00
gxw
bbf82cb624
loongarch: Fixed {s/d}axpby LSX opt
2024-03-18 17:51:42 +08:00
gxw
ac460eb42a
loongarch: Fixed i{c/z}amin LSX opt
2024-03-18 17:15:58 +08:00
gxw
60e251a1f8
loongarch: Fixed {sc/dz}amax LASX opt
2024-03-16 14:52:17 +08:00
gxw
a10dde5554
loongarch: Fixed {s/d/sc/dz}amin LASX opt
2024-03-16 14:52:14 +08:00
gxw
6534d378b7
loongarch: Fixed {s/d/c/z}sum LASX opt
2024-03-16 14:52:10 +08:00
gxw
6159cffc58
loongarch: Fixed i{s/c/z}amin LASX opt
2024-03-16 14:52:06 +08:00
gxw
7d755912b9
loongarch: Fixed {s/d/c/z}axpby LASX opt
2024-03-16 14:51:56 +08:00
Martin Kroeker
cf80bd8500
Update nrm2_rvv.c
2024-03-13 13:07:26 +01:00
Martin Kroeker
9baa757905
Update nrm2_vector.c
2024-03-13 11:40:14 +01:00
Martin Kroeker
18a6db6862
Update nrm2_vector.c
2024-03-13 11:10:26 +01:00
Martin Kroeker
3752e73919
handle incx < 0
2024-03-12 20:44:01 +01:00
Martin Kroeker
db70c7f7fb
handle incx < 0
2024-03-12 20:42:11 +01:00
Martin Kroeker
dee8557d58
handle incx < 0
2024-03-12 20:40:29 +01:00
Martin Kroeker
d9dff17aec
handle incx < 0
2024-03-12 20:38:23 +01:00
Martin Kroeker
552c521353
remove another early exit for incx < 0
2024-03-12 18:49:27 +01:00