2284 Commits

Author SHA1 Message Date
Martin Kroeker
c2ffd90e8c make NAN handling depend on dummy2 parameter 2024-07-20 17:31:00 +02:00
Martin Kroeker
dd6c33d34d make NAN handling depend on dummy2 parameter 2024-07-19 16:14:55 +02:00
Martin Kroeker
2020569705 fix NAN handling and make it depend on dummy2 parameter 2024-07-17 23:55:54 +02:00
Martin Kroeker
3870995f01 make NAN handling depend on dummy2 parameter 2024-07-17 23:54:24 +02:00
Martin Kroeker
7284c533b5 make NAN handling depend on dummy2 parameter 2024-07-17 23:50:40 +02:00
Martin Kroeker
73751218a4 make NAN handling depend on dummy2 parameter 2024-07-17 23:41:26 +02:00
Martin Kroeker
b9bfc8ce09 make NAN handling depend on dummy2 parameter 2024-07-17 23:29:50 +02:00
Martin Kroeker
eb4879e04c make NAN handling depend on the dummy2 parameter 2024-07-17 23:24:19 +02:00
Martin Kroeker
ee87cb90d0 Merge pull request #4803 from iha-taisei/SVESupportSDGEMV
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
2024-07-17 23:14:21 +02:00
iha fujitsu
0985fdc82b A64FX: Add support for SVE to SGEMV/DGEMV kernels. 2024-07-16 17:31:33 +09:00
Mark Ryan
67bf4b6998 Fix axpby_rvv kernels for cases where inc_y = 0
The following openblas_utest tests fail when the RISCV64_ZVL128B is
enabled.

TEST 89/103 axpby:zaxpby_inc_0 [FAIL]
TEST 92/103 axpby:caxpby_inc_0 [FAIL]
TEST 95/103 axpby:daxpby_inc_0 [FAIL]
TEST 98/103 axpby:saxpby_inc_0 [FAIL]

The issue is that the vectorized kernels do not work when inc_y == 0.
This patch updates the kernels to fall back to the scalar algorithms
when inc_y == 0, fixing the failing tests.

Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:47 +00:00
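The failure mode described in the commit above can be sketched in plain C. This is a minimal illustration, not the actual RVV kernel: the names `axpby`, `axpby_scalar`, `axpby_chunked` and the constant `LANES` are hypothetical, the chunked loop only imitates a vector load/compute/store sequence, and non-negative increments are assumed. With inc_y == 0 every iteration reads and writes the same element, so the result depends on the updates happening one after another, which a lane-parallel kernel does not preserve.

```c
#include <stddef.h>

enum { LANES = 4 }; /* hypothetical vector width, for illustration only */

/* Scalar reference: y[i*inc_y] = alpha * x[i*inc_x] + beta * y[i*inc_y].
 * With inc_y == 0 each iteration reuses the value written by the previous one. */
static void axpby_scalar(long n, double alpha, const double *x, long inc_x,
                         double beta, double *y, long inc_y)
{
    for (long i = 0; i < n; i++)
        y[i * inc_y] = alpha * x[i * inc_x] + beta * y[i * inc_y];
}

/* Stand-in for a vectorized kernel: loads LANES elements of y, computes all
 * lanes independently, then stores them. Correct only when the y elements
 * within a chunk are distinct, i.e. when inc_y != 0. */
static void axpby_chunked(long n, double alpha, const double *x, long inc_x,
                          double beta, double *y, long inc_y)
{
    long i = 0;
    for (; i + LANES <= n; i += LANES) {
        double yv[LANES];
        for (int l = 0; l < LANES; l++)   /* "vector load" of y */
            yv[l] = y[(i + l) * inc_y];
        for (int l = 0; l < LANES; l++)   /* lane-wise compute */
            yv[l] = alpha * x[(i + l) * inc_x] + beta * yv[l];
        for (int l = 0; l < LANES; l++)   /* "vector store" of y */
            y[(i + l) * inc_y] = yv[l];
    }
    /* Remaining elements that do not fill a whole chunk. */
    axpby_scalar(n - i, alpha, x + i * inc_x, inc_x, beta, y + i * inc_y, inc_y);
}

/* Dispatcher in the spirit of the fix: use the scalar loop when inc_y == 0. */
void axpby(long n, double alpha, const double *x, long inc_x,
           double beta, double *y, long inc_y)
{
    if (inc_y == 0) {
        axpby_scalar(n, alpha, x, inc_x, beta, y, inc_y);
        return;
    }
    axpby_chunked(n, alpha, x, inc_x, beta, y, inc_y);
}
```

For x = {1, 2, 3, 4}, y[0] = 1, alpha = beta = 1 and inc_y = 0, the scalar loop accumulates y[0] through 2, 4, 7 and 11, whereas the chunked stand-in computes all four lanes from the original y[0] and leaves 5 behind.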
Martin Kroeker
5d08ec7ff3 Merge pull request #4782 from martin-frbg/azurewincl
Fix NAN handling in ARM/generic SCAL; have AzureCI Windows show errors on failure
2024-07-11 23:55:15 +02:00
Chip Kerchner
cb154832f8 Vectorize SBGEMM incopy - 4x faster. 2024-07-09 13:10:03 -05:00
Martin Kroeker
a5c04e326a Update scal.c 2024-07-04 22:28:01 +02:00
Martin Kroeker
536200bc9e fix handling of INF or NAN 2024-07-04 17:47:19 +02:00
Martin Kroeker
3677b3886c Merge pull request #4702 from bashimao/detect-nv-grace
Correctly detect ARM Neoverse V2 CPUs.
2024-06-30 22:48:48 +02:00
Martin Kroeker
f3c364c2cc temporarily(?) disable the alpha=0 branch as it fails to handle INF,NAN 2024-06-27 22:18:27 +02:00
Martin Kroeker
2a5fe97e3b temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN 2024-06-27 16:21:57 +02:00
Martin Kroeker
c1019d5832 Handle INF and NAN in inputs 2024-06-27 10:58:59 +02:00
Martin Kroeker
9e24121e7e temporarily(?) disable da=0 shortcut to handle x=Inf or NAN 2024-06-23 17:48:18 +02:00
Martin Kroeker
a11f086c17 Update sscal_msa.c 2024-06-23 12:55:19 +02:00
Martin Kroeker
541e1b6959 disable the fast path for inc=1, alpha=0 as it does not handle x=NaN or Inf 2024-06-23 10:37:55 +02:00
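Why an alpha=0 fast path conflicts with NaN and Inf inputs can be shown with a small sketch. The function names `scal_shortcut` and `scal_ieee` are hypothetical and unit stride is assumed; this is not the code of any particular kernel. Under IEEE 754 arithmetic, 0 * NaN and 0 * Inf both yield NaN, so a shortcut that writes zeros hides values that a plain multiplication would propagate. The sketch also assumes the compiler is not run with options such as -ffast-math that relax NaN handling.

```c
#include <math.h>

/* Fast path: when alpha == 0, overwrite x with zeros without reading it.
 * NaN and Inf entries in x are silently replaced by 0. */
void scal_shortcut(long n, double alpha, double *x)
{
    if (alpha == 0.0) {
        for (long i = 0; i < n; i++)
            x[i] = 0.0;
        return;
    }
    for (long i = 0; i < n; i++)
        x[i] *= alpha;
}

/* No fast path: always multiply, so 0 * NaN and 0 * Inf produce NaN. */
void scal_ieee(long n, double alpha, double *x)
{
    for (long i = 0; i < n; i++)
        x[i] *= alpha;
}
```

Calling both functions with alpha = 0 on {1.0, INFINITY, NAN} gives {0, 0, 0} from the shortcut and {0, NaN, NaN} from the plain multiplication.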
Martin Kroeker
c08113c279 fix special cases of x= NAN or INF 2024-06-23 01:12:33 +02:00
Martin Kroeker
bd47630bcf exclude the alpha=0 branch as it does not handle NaN or Inf in x 2024-06-23 00:54:39 +02:00
Martin Kroeker
68f2501958 temporarily(?) disable the alpha=0 branch to handle Inf/NaN in x 2024-06-22 21:08:57 +02:00
Martin Kroeker
0a744a939a temporarily(?) disable the alpha=0 branch to handle NaN/Inf in x 2024-06-22 21:07:43 +02:00
Martin Kroeker
7f8f037a36 handle INF and NAN in input 2024-06-22 16:03:30 +02:00
Martin Kroeker
f1248b849d handle INF and NAN in input 2024-06-22 15:55:29 +02:00
Martin Kroeker
a2ee4b1966 Merge branch 'OpenMathLib:develop' into issue4728 2024-06-21 09:35:56 +02:00
Martin Kroeker
3ec59922b6 Add a clobber list to fix utest errors seen with gcc13 on Apple M 2024-06-20 16:19:32 +02:00
Martin Kroeker
3d8054fb16 add clobber list 2024-06-14 22:07:44 +02:00
Martin Kroeker
dd7efcf9ef Avoid exceeding the configured thread count in x86_64 TOBF16 (#4748)
* avoid setting nthreads higher than available
2024-06-14 14:21:13 +02:00
Martin Kroeker
6ffaf99817 disable da=0 shortcut to handle NAN and INF correctly 2024-06-07 14:46:58 +02:00
Martin Kroeker
c7cacd9b38 disable the shortcut for da=0 to ensure proper handling of INF and NAN 2024-06-07 13:48:56 +02:00
Martin Kroeker
5ed4f24d6e Handle corner cases with INF and NAN arguments 2024-06-07 09:39:08 +02:00
Martin Kroeker
2bd43ad0eb Merge branch 'OpenMathLib:develop' into issue4728 2024-06-07 00:37:25 +02:00
Martin Kroeker
1abafcd9b2 handle corner cases involving NAN and/or INF 2024-06-06 23:59:43 +02:00
Martin Kroeker
442dec28df Merge pull request #4738 from martin-frbg/issue4737
Disable GEMM3M for generic targets (not implemented)
2024-06-06 17:22:38 +02:00
Martin Kroeker
2787c9f8e4 Disable GEMM3M for generic targets (not implemented) 2024-06-06 14:39:50 +02:00
gxw
af73ae6208 LoongArch: Fixed issue 4728 2024-06-06 16:43:09 +08:00
gxw
8ab2e9ec65 LoongArch: DGEMM small matrix opt 2024-06-04 16:52:45 +08:00
Martin Kroeker
83bc8d5dd8 Merge pull request #4712 from RajalakshmiSR/zscalp10
POWER: Fix issues in zscal to address lapack failures
2024-06-01 11:22:08 +02:00
Martin Kroeker
020b3e1682 fix handling of INF arguments 2024-06-01 00:51:18 +02:00
Martin Kroeker
8c05765a5a fix other corner cases where x=INF 2024-05-31 18:06:36 +02:00
Martin Kroeker
516743f7dc fix other instances of mishandling INF 2024-05-31 16:02:12 +02:00
Martin Kroeker
9ff4e9714e additional fixes for handling INF arguments 2024-05-31 15:44:07 +02:00
Martin Kroeker
ce130f11d2 Update zscal.c 2024-05-31 15:09:03 +02:00
Martin Kroeker
ab13cfef93 more fixes for infinite x 2024-05-31 14:34:49 +02:00
Martin Kroeker
ad2b5c67c8 fix another corner case involving infinity 2024-05-31 01:06:58 +02:00
Bart Oldeman
62f7b244ff Replace use of FLT_MAX in x86_64 zscal.c by isinf()
Commit def4996 fixed issues with inf and nan values in zscal,
but used FLT_MAX, where DBL_MAX or isinf() is more appropriate,
as FLT_MAX is for single precision only.
Using FLT_MAX caused test case failures in the LAPACK tests.

isinf() is consistent with the later fix 969601a1
2024-05-24 17:20:27 +00:00
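The difference between the two checks named in the commit above can be illustrated as follows. The exact expression used in zscal.c is not shown in this log, so the comparison against FLT_MAX below is an assumed form, and the helper names `looks_infinite_flt_max` and `is_infinite` are hypothetical. The point is only that FLT_MAX is the largest finite single-precision value, so a double can exceed it while still being finite.

```c
#include <float.h>
#include <math.h>

/* Assumed form of a single-precision-style check: treats any magnitude above
 * FLT_MAX (about 3.4e38) as infinite, which misclassifies large finite doubles. */
int looks_infinite_flt_max(double v)
{
    return fabs(v) > FLT_MAX;
}

/* Check based on isinf(): true only for +Inf and -Inf. */
int is_infinite(double v)
{
    return isinf(v) != 0;
}
```

A value such as 1e300 is a finite double, yet it passes the FLT_MAX comparison; `is_infinite` reports it as finite and flags only actual infinities.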