2270 Commits

Author SHA1 Message Date
Chip Kerchner cb154832f8 Vectorize SBGEMM incopy - 4x faster. 2024-07-09 13:10:03 -05:00
Martin Kroeker 3677b3886c
Merge pull request #4702 from bashimao/detect-nv-grace
Correctly detect ARM Neoverse V2 CPUs.
2024-06-30 22:48:48 +02:00
Martin Kroeker f3c364c2cc
temporarily(?) disable the alpha=0 branch as it fails to handle INF,NAN 2024-06-27 22:18:27 +02:00
Martin Kroeker 2a5fe97e3b
temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN 2024-06-27 16:21:57 +02:00
Martin Kroeker c1019d5832
Handle INF and NAN in inputs 2024-06-27 10:58:59 +02:00
Martin Kroeker 9e24121e7e
temporarily(?) disable da=0 shortcut to handle x=Inf or NAN 2024-06-23 17:48:18 +02:00
Martin Kroeker a11f086c17
Update sscal_msa.c 2024-06-23 12:55:19 +02:00
Martin Kroeker 541e1b6959
disable the fast path for inc=1, alpha=0 as it does not handle x=NaN or Inf 2024-06-23 10:37:55 +02:00
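Note on the alpha=0 commits in this range: under IEEE 754 arithmetic, 0 * NaN and 0 * Inf both evaluate to NaN, so a fast path that simply stores zeros when alpha is 0 hides non-finite inputs. The sketch below illustrates the difference. The function names and the plain C loops are hypothetical and are not taken from the OpenBLAS kernels, which are architecture-specific.

```c
#include <math.h>

/* Hypothetical kernel without a shortcut: always multiplies, so the
   IEEE 754 rules apply and 0 * NaN or 0 * Inf produces NaN. */
static void scal_no_shortcut(int n, double alpha, double *x, int incx)
{
    for (int i = 0; i < n; i++)
        x[i * incx] *= alpha;
}

/* Hypothetical kernel with the alpha == 0 fast path: overwrites x with
   zeros, which drops any NaN or Inf that was present in the input. */
static void scal_with_shortcut(int n, double alpha, double *x, int incx)
{
    if (alpha == 0.0) {
        for (int i = 0; i < n; i++)
            x[i * incx] = 0.0;
        return;
    }
    scal_no_shortcut(n, alpha, x, incx);
}

/* Helper for comparing results: counts NaN and Inf entries in x. */
static int count_nonfinite(int n, const double *x, int incx)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (!isfinite(x[i * incx]))
            count++;
    return count;
}
```

Calling both variants with alpha = 0 on an input such as {1.0, NAN, INFINITY} leaves two NaN entries in the first case and none in the second, which is the behaviour difference these commits address.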
Martin Kroeker c08113c279
fix special cases of x= NAN or INF 2024-06-23 01:12:33 +02:00
Martin Kroeker bd47630bcf
exclude the alpha=0 branch as it does not handle NaN or Inf in x 2024-06-23 00:54:39 +02:00
Martin Kroeker 68f2501958
temporarily(?) disable the alpha=0 branch to handle Inf/NaN in x 2024-06-22 21:08:57 +02:00
Martin Kroeker 0a744a939a
temporarily(?) disable the alpha=0 branch to handle NaN/Inf in x 2024-06-22 21:07:43 +02:00
Martin Kroeker 7f8f037a36
handle INF and NAN in input 2024-06-22 16:03:30 +02:00
Martin Kroeker f1248b849d
handle INF and NAN in input 2024-06-22 15:55:29 +02:00
Martin Kroeker a2ee4b1966
Merge branch 'OpenMathLib:develop' into issue4728 2024-06-21 09:35:56 +02:00
Martin Kroeker 3ec59922b6
Add a clobber list to fix utest errors seen with gcc13 on Apple M 2024-06-20 16:19:32 +02:00
Martin Kroeker 3d8054fb16
add clobber list 2024-06-14 22:07:44 +02:00
Martin Kroeker dd7efcf9ef
Avoid exceeding the configured thread count in x86_64 TOBF16 (#4748)
* avoid setting nthreads higher than available
2024-06-14 14:21:13 +02:00
Martin Kroeker 6ffaf99817
disable da=0 shortcut to handle NAN and INF correctly 2024-06-07 14:46:58 +02:00
Martin Kroeker c7cacd9b38
disable the shortcut for da=0 to ensure proper handling of INF and NAN 2024-06-07 13:48:56 +02:00
Martin Kroeker 5ed4f24d6e
Handle corner cases with INF and NAN arguments 2024-06-07 09:39:08 +02:00
Martin Kroeker 2bd43ad0eb
Merge branch 'OpenMathLib:develop' into issue4728 2024-06-07 00:37:25 +02:00
Martin Kroeker 1abafcd9b2
handle corner cases involving NAN and/or INF 2024-06-06 23:59:43 +02:00
Martin Kroeker 442dec28df
Merge pull request #4738 from martin-frbg/issue4737
Disable GEMM3M for generic targets (not implemented)
2024-06-06 17:22:38 +02:00
Martin Kroeker 2787c9f8e4
Disable GEMM3M for generic targets (not implemented) 2024-06-06 14:39:50 +02:00
gxw af73ae6208 LoongArch: Fixed issue 4728 2024-06-06 16:43:09 +08:00
gxw 8ab2e9ec65 LoongArch: DGEMM small matrix opt 2024-06-04 16:52:45 +08:00
Martin Kroeker 83bc8d5dd8
Merge pull request #4712 from RajalakshmiSR/zscalp10
POWER: Fix issues in zscal to address lapack failures
2024-06-01 11:22:08 +02:00
Martin Kroeker 020b3e1682
fix handling of INF arguments 2024-06-01 00:51:18 +02:00
Martin Kroeker 8c05765a5a
fix other corner cases where x=INF 2024-05-31 18:06:36 +02:00
Martin Kroeker 516743f7dc
fix other instances of mishandling INF 2024-05-31 16:02:12 +02:00
Martin Kroeker 9ff4e9714e
additional fixes for handling INF arguments 2024-05-31 15:44:07 +02:00
Martin Kroeker ce130f11d2
Update zscal.c 2024-05-31 15:09:03 +02:00
Martin Kroeker ab13cfef93
more fixes for infinite x 2024-05-31 14:34:49 +02:00
Martin Kroeker ad2b5c67c8
fix another corner case involving infinity 2024-05-31 01:06:58 +02:00
Bart Oldeman 62f7b244ff Replace use of FLT_MAX in x86_64 zscal.c by isinf()
Commit def4996 fixed issues with inf and nan values in zscal,
but used FLT_MAX, where DBL_MAX or isinf() is more appropriate,
as FLT_MAX is for single precision only.
Using FLT_MAX caused test case failures in the LAPACK tests.

isinf() is consistent with the later fix 969601a1
2024-05-24 17:20:27 +00:00
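Note on the FLT_MAX commit above: the point of the message can be seen in a small sketch. A double-precision value can be far larger than FLT_MAX and still be finite, so a magnitude test against FLT_MAX misclassifies it. The exact form of the original check in zscal.c is not shown in this log, so the first function below is an assumed illustration of a FLT_MAX-based test, and both function names are hypothetical.

```c
#include <float.h>
#include <math.h>

/* Assumed illustration of a FLT_MAX-based test: any double whose magnitude
   exceeds the largest finite float is treated as infinite. */
static int looks_infinite_flt_max(double v)
{
    return fabs(v) > FLT_MAX;
}

/* Test in the style the commit message recommends: isinf() is true only
   for +Inf and -Inf, regardless of precision. */
static int is_infinite(double v)
{
    return isinf(v) != 0;
}
```

For a finite double such as 1e100, the FLT_MAX-based test reports infinity while isinf() does not. Both agree for INFINITY itself.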
Rajalakshmi Srinivasaraghavan e112191b54 POWER: Fix issues in zscal to address lapack failures
This patch fixes following lapack failures with clang compiler on POWER.
zed.out: ZVX:   18 out of  5190 tests failed to pass the threshold
zgd.out: ZGV drivers:     25 out of   1092 tests failed to pass the threshold
zgd.out: ZGV drivers:      6 out of   1092 tests failed to pass the threshold
2024-05-22 08:00:06 -05:00
Martin Kroeker aa259b141d
Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix
Fix regression SAXPY when compiler with OpenXL compiler.
2024-05-18 19:11:25 +02:00
Matthias Langer 0050a9660b Correctly detect ARM Neoverse V2 CPUs. 2024-05-16 09:59:52 +00:00
Chip Kerchner 3a1417671a POWER: Fixing endianness issue in cswap/zswap kernel for AIX 2024-05-15 19:36:46 -05:00
Amrita H S 87b3d9054f Fix regression SAXPY when compiler with OpenXL compiler.
SAXPY built with OpenXL regresses when compared to SAXPY
built with gcc. OpenXL compiler doesn't know that the
SAXPY inner kernel assembly is a 64 element loop and
to it the remainder loop is the main loop. It vectorizes
and interleaves the remainder to be a 48 elements per
iteration loop. With a max of 63 iterations, a 48 element
loop is mostly not going to get executed, so the 1 element
scalar loop that is the remainder after that is probably
mostly what gets executed.

This can be fixed by adding a pragma, loop interleave_count(2)
which will result in 8 element loop.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2024-05-12 23:27:55 -05:00
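Note on the SAXPY commit above: the fix it describes is a loop pragma on the remainder loop. The sketch below shows what such a hint can look like in C. The function name, the signature and the preprocessor guard are assumptions for illustration; only the interleave_count(2) hint comes from the commit message. The `#pragma clang loop` spelling is the Clang form, assumed here to apply because OpenXL is Clang-based.

```c
#include <stddef.h>

/* Hypothetical remainder loop for y += alpha * x. When the main kernel
   consumes blocks of 64 elements, at most 63 elements reach this loop. */
static void saxpy_tail(size_t n, float alpha, const float *x, float *y)
{
    /* Ask Clang-based compilers to interleave only two vector iterations,
       so the vectorized body covers fewer elements per iteration.
       Other compilers skip the hint because of the guard. */
#if defined(__clang__)
#pragma clang loop interleave_count(2)
#endif
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```

The pragma changes only how the compiler schedules the loop, not its result, so the function computes the same values with or without it.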
Martin Kroeker 8da6f7e5f2
Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
Loongarch64: Improving the Performance and Stability of dgemm
2024-05-10 11:29:12 +02:00
gxw f9a26240a7 loongarch64: Fixed icamax_lsx 2024-05-10 14:16:40 +08:00
gxw cb0f707409 loongarch64: Fixed utest fork:safety 2024-05-10 14:16:36 +08:00
Martin Kroeker b45d8e1ab2
remove stray comma 2024-05-09 12:33:19 +02:00
gxw 6017ad7146 loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6 2024-05-08 10:10:26 +08:00
Martin Kroeker 992b71fea2
remove stray comma 2024-04-23 21:52:26 +02:00
Martin Kroeker d421dec278
Merge pull request #4656 from zboszor/fix-x86-64-build-v2
Add forgotten conditional uses of PREFETCH
2024-04-23 21:05:08 +02:00
Martin Kroeker ae695d4ca0
Merge pull request #4642 from XiWeiGu/loongarch64_clang
CI: Add clang test for loongarch64
2024-04-23 18:25:49 +02:00
gxw 7cd438a5ac loongarch64: Fixed clang compilation issues 2024-04-23 19:19:11 +08:00