Commit Graph

2283 Commits

Author SHA1 Message Date
Martin Kroeker dd6c33d34d
make NAN handling depend on dummy2 parameter 2024-07-19 16:14:55 +02:00
Martin Kroeker 2020569705
fix NAN handling and make it depend on dummy2 parameter 2024-07-17 23:55:54 +02:00
Martin Kroeker 3870995f01
make NAN handling depend on dummy2 parameter 2024-07-17 23:54:24 +02:00
Martin Kroeker 7284c533b5
make NAN handling depend on dummy2 parameter 2024-07-17 23:50:40 +02:00
Martin Kroeker 73751218a4
make NAN handling depend on dummy2 parameter 2024-07-17 23:41:26 +02:00
Martin Kroeker b9bfc8ce09
make NAN handling depend on dummy2 parameter 2024-07-17 23:29:50 +02:00
Martin Kroeker eb4879e04c
make NAN handling depend on the dummy2 parameter 2024-07-17 23:24:19 +02:00
Martin Kroeker ee87cb90d0
Merge pull request #4803 from iha-taisei/SVESupportSDGEMV
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
2024-07-17 23:14:21 +02:00
iha fujitsu 0985fdc82b A64FX: Add support for SVE to SGEMV/DGEMV kernels. 2024-07-16 17:31:33 +09:00
Mark Ryan 67bf4b6998 Fix axpby_rvv kernels for cases where inc_y = 0
The following openblas_utest tests fail when the RISCV64_ZVL128B target is
enabled.

TEST 89/103 axpby:zaxpby_inc_0 [FAIL]
TEST 92/103 axpby:caxpby_inc_0 [FAIL]
TEST 95/103 axpby:daxpby_inc_0 [FAIL]
TEST 98/103 axpby:saxpby_inc_0 [FAIL]

The issue is that the vectorized kernels do not work when inc_y == 0.
This patch updates the kernels to fall back to the scalar algorithms
when inc_y == 0, fixing the failing tests.

Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:47 +00:00
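The commit above says the vectorized axpby kernels break when inc_y == 0 and fixes this by falling back to the scalar algorithm. The following is a minimal plain-C sketch of why, not the RVV kernel code. It assumes the OpenBLAS axpby semantics y := alpha*x + beta*y and non-negative increments. With inc_y == 0, every update reads and writes y[0], so the scalar loop forms a dependency chain. A lane-parallel kernel that loads a block of y once breaks that chain. The function names are hypothetical, and the 4-lane loop only simulates a vector kernel.

```c
/* Scalar reference for y := alpha*x + beta*y (assumes non-negative increments).
   Each update reads the value written by the previous one, so with
   inc_y == 0 the results accumulate into y[0] sequentially. */
static void daxpby_scalar(long n, double alpha, const double *x, long inc_x,
                          double beta, double *y, long inc_y)
{
    for (long i = 0; i < n; i++)
        y[i * inc_y] = alpha * x[i * inc_x] + beta * y[i * inc_y];
}

/* Plain-C stand-in for a 4-lane vector kernel: it loads a block of y,
   computes all lanes independently, then stores them back. This is correct
   when the y elements are distinct. It is wrong when inc_y == 0, because
   every lane sees the same stale y[0]. */
static void daxpby_blocked(long n, double alpha, const double *x, long inc_x,
                           double beta, double *y, long inc_y)
{
    long i = 0;
    for (; i + 4 <= n; i += 4) {
        double vy[4], vr[4];
        for (int l = 0; l < 4; l++) vy[l] = y[(i + l) * inc_y];   /* "gather" */
        for (int l = 0; l < 4; l++) vr[l] = alpha * x[(i + l) * inc_x] + beta * vy[l];
        for (int l = 0; l < 4; l++) y[(i + l) * inc_y] = vr[l];   /* "scatter" */
    }
    for (; i < n; i++)  /* scalar tail for the remaining elements */
        y[i * inc_y] = alpha * x[i * inc_x] + beta * y[i * inc_y];
}

/* Dispatch that mirrors the described fix: use the scalar algorithm
   when inc_y == 0 and the blocked one otherwise. */
static void daxpby(long n, double alpha, const double *x, long inc_x,
                   double beta, double *y, long inc_y)
{
    if (n <= 0) return;
    if (inc_y == 0) daxpby_scalar(n, alpha, x, inc_x, beta, y, inc_y);
    else            daxpby_blocked(n, alpha, x, inc_x, beta, y, inc_y);
}
```

Take x = {1, 2, 3, 4}, y[0] = 10, alpha = beta = 1 and inc_y = 0. The scalar path leaves y[0] = 20 (10 + 1 + 2 + 3 + 4). The blocked stand-in leaves 14, which is only the last lane's result, alpha*x[3] + beta*10.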
Martin Kroeker 5d08ec7ff3
Merge pull request #4782 from martin-frbg/azurewincl
Fix NAN handling in ARM/generic SCAL; have AzureCI Windows show errors on failure
2024-07-11 23:55:15 +02:00
Chip Kerchner cb154832f8 Vectorize SBGEMM incopy - 4x faster. 2024-07-09 13:10:03 -05:00
Martin Kroeker a5c04e326a
Update scal.c 2024-07-04 22:28:01 +02:00
Martin Kroeker 536200bc9e
fix handling of INF or NAN 2024-07-04 17:47:19 +02:00
Martin Kroeker 3677b3886c
Merge pull request #4702 from bashimao/detect-nv-grace
Correctly detect ARM Neoverse V2 CPUs.
2024-06-30 22:48:48 +02:00
Martin Kroeker f3c364c2cc
temporarily(?) disable the alpha=0 branch as it fails to handle INF or NAN 2024-06-27 22:18:27 +02:00
Martin Kroeker 2a5fe97e3b
temporarily(?) disable the alpha=0 branch as it does not handle INF or NAN 2024-06-27 16:21:57 +02:00
Martin Kroeker c1019d5832
Handle INF and NAN in inputs 2024-06-27 10:58:59 +02:00
Martin Kroeker 9e24121e7e
temporarily(?) disable da=0 shortcut to handle x=Inf or NAN 2024-06-23 17:48:18 +02:00
Martin Kroeker a11f086c17
Update sscal_msa.c 2024-06-23 12:55:19 +02:00
Martin Kroeker 541e1b6959
disable the fast path for inc=1, alpha=0 as it does not handle x=NaN or Inf 2024-06-23 10:37:55 +02:00
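Several commits in this series disable an alpha = 0 (da = 0) fast path because it mishandles NaN or Inf in x. The underlying IEEE 754 rule is that 0 × NaN and 0 × ±Inf both produce NaN. A shortcut that stores zeros without reading x therefore discards those values, while an unconditional multiply preserves them. The functions below are hypothetical plain-C illustrations, not OpenBLAS kernel code. They assume default floating-point semantics (no -ffast-math) and a non-negative inc_x.

```c
#include <math.h>

/* Shortcut of the kind the commits disable: if alpha == 0, store zeros
   without reading x. It is fast, but it silently turns NaN and Inf into 0. */
static void dscal_zero_shortcut(long n, double alpha, double *x, long inc_x)
{
    for (long i = 0; i < n; i++)
        x[i * inc_x] = (alpha == 0.0) ? 0.0 : alpha * x[i * inc_x];
}

/* Always multiply: under IEEE 754, 0 * NaN and 0 * (+/-Inf) are NaN,
   so non-finite inputs remain visible in the output. */
static void dscal_multiply(long n, double alpha, double *x, long inc_x)
{
    for (long i = 0; i < n; i++)
        x[i * inc_x] *= alpha;
}

/* Counts NaN entries, which makes the difference between the two versions visible. */
static long count_nan(long n, const double *x, long inc_x)
{
    long c = 0;
    for (long i = 0; i < n; i++)
        if (isnan(x[i * inc_x])) c++;
    return c;
}
```

Scaling {1.0, NAN, INFINITY} by 0.0 gives {0, 0, 0} with the shortcut. The unconditional multiply gives {0, NaN, NaN}.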
Martin Kroeker c08113c279
fix special cases of x = NAN or INF 2024-06-23 01:12:33 +02:00
Martin Kroeker bd47630bcf
exclude the alpha=0 branch as it does not handle NaN or Inf in x 2024-06-23 00:54:39 +02:00
Martin Kroeker 68f2501958
temporarily(?) disable the alpha=0 branch to handle Inf/NaN in x 2024-06-22 21:08:57 +02:00
Martin Kroeker 0a744a939a
temporarily(?) disable the alpha=0 branch to handle NaN/Inf in x 2024-06-22 21:07:43 +02:00
Martin Kroeker 7f8f037a36
handle INF and NAN in input 2024-06-22 16:03:30 +02:00
Martin Kroeker f1248b849d
handle INF and NAN in input 2024-06-22 15:55:29 +02:00
Martin Kroeker a2ee4b1966
Merge branch 'OpenMathLib:develop' into issue4728 2024-06-21 09:35:56 +02:00
Martin Kroeker 3ec59922b6
Add a clobber list to fix utest errors seen with gcc13 on Apple M 2024-06-20 16:19:32 +02:00
Martin Kroeker 3d8054fb16
add clobber list 2024-06-14 22:07:44 +02:00
Martin Kroeker dd7efcf9ef
Avoid exceeding the configured thread count in x86_64 TOBF16 (#4748)
* avoid setting nthreads higher than available
2024-06-14 14:21:13 +02:00
Martin Kroeker 6ffaf99817
disable da=0 shortcut to handle NAN and INF correctly 2024-06-07 14:46:58 +02:00
Martin Kroeker c7cacd9b38
disable the shortcut for da=0 to ensure proper handling of INF and NAN 2024-06-07 13:48:56 +02:00
Martin Kroeker 5ed4f24d6e
Handle corner cases with INF and NAN arguments 2024-06-07 09:39:08 +02:00
Martin Kroeker 2bd43ad0eb
Merge branch 'OpenMathLib:develop' into issue4728 2024-06-07 00:37:25 +02:00
Martin Kroeker 1abafcd9b2
handle corner cases involving NAN and/or INF 2024-06-06 23:59:43 +02:00
Martin Kroeker 442dec28df
Merge pull request #4738 from martin-frbg/issue4737
Disable GEMM3M for generic targets (not implemented)
2024-06-06 17:22:38 +02:00
Martin Kroeker 2787c9f8e4
Disable GEMM3M for generic targets (not implemented) 2024-06-06 14:39:50 +02:00
gxw af73ae6208 LoongArch: Fixed issue 4728 2024-06-06 16:43:09 +08:00
gxw 8ab2e9ec65 LoongArch: DGEMM small matrix opt 2024-06-04 16:52:45 +08:00
Martin Kroeker 83bc8d5dd8
Merge pull request #4712 from RajalakshmiSR/zscalp10
POWER: Fix issues in zscal to address lapack failures
2024-06-01 11:22:08 +02:00
Martin Kroeker 020b3e1682
fix handling of INF arguments 2024-06-01 00:51:18 +02:00
Martin Kroeker 8c05765a5a
fix other corner cases where x=INF 2024-05-31 18:06:36 +02:00
Martin Kroeker 516743f7dc
fix other instances of mishandling INF 2024-05-31 16:02:12 +02:00
Martin Kroeker 9ff4e9714e
additional fixes for handling INF arguments 2024-05-31 15:44:07 +02:00
Martin Kroeker ce130f11d2
Update zscal.c 2024-05-31 15:09:03 +02:00
Martin Kroeker ab13cfef93
more fixes for infinite x 2024-05-31 14:34:49 +02:00
Martin Kroeker ad2b5c67c8
fix another corner case involving infinity 2024-05-31 01:06:58 +02:00
Bart Oldeman 62f7b244ff Replace use of FLT_MAX in x86_64 zscal.c with isinf()
Commit def4996 fixed issues with inf and nan values in zscal,
but used FLT_MAX, where DBL_MAX or isinf() is more appropriate,
as FLT_MAX is for single precision only.
Using FLT_MAX caused test case failures in the LAPACK tests.

isinf() is consistent with the later fix 969601a1.
2024-05-24 17:20:27 +00:00
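The commit above depends on the difference between single- and double-precision limits. FLT_MAX (about 3.4e38) is far below DBL_MAX (about 1.8e308). In double-precision code, a magnitude test against FLT_MAX therefore also flags large but finite values as infinite. The exact expression used in commit def4996 is not reproduced here. The predicates below are hypothetical and only contrast the three choices the message mentions.

```c
#include <float.h>
#include <math.h>

/* Hypothetical check in the style of the problem: any double larger in
   magnitude than the single-precision maximum is treated as infinite. */
static int is_inf_flt_max(double v) { return fabs(v) > FLT_MAX; }

/* Type-appropriate bound: only +/-Inf exceeds DBL_MAX in magnitude. */
static int is_inf_dbl_max(double v) { return fabs(v) > DBL_MAX; }

/* C99 classification macro: exact, and independent of the precision of v. */
static int is_inf_c99(double v) { return isinf(v) != 0; }
```

For v = 1e300, a finite double, the FLT_MAX test returns 1, while the DBL_MAX and isinf() tests return 0. All three return 1 for ±INFINITY.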
Rajalakshmi Srinivasaraghavan e112191b54 POWER: Fix issues in zscal to address lapack failures
This patch fixes the following LAPACK failures with the clang compiler on POWER:
zed.out: ZVX:   18 out of  5190 tests failed to pass the threshold
zgd.out: ZGV drivers:     25 out of   1092 tests failed to pass the threshold
zgd.out: ZGV drivers:      6 out of   1092 tests failed to pass the threshold
2024-05-22 08:00:06 -05:00