Martin Kroeker
89c7bbcba6
add cblas_?gemm_batch
2024-05-29 15:47:02 +02:00
Martin Kroeker
103637887e
add cblas_?gemm_batch
2024-05-29 15:46:10 +02:00
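These two commits add the `cblas_?gemm_batch` interfaces. Their exact signatures are not shown in this log. As a rough illustration of what a batched GEMM computes, the plain C loop below applies `C[p] = alpha*A[p]*B[p] + beta*C[p]` to every problem `p` in the batch. The function name and argument order are hypothetical. The sketch also assumes row-major storage, no transposition, and a single m, n, k for the whole batch. An optimized library would hand each problem to its tuned GEMM kernel instead of using this triple loop.

```c
#include <stddef.h>

/* Illustrative only, NOT the cblas_?gemm_batch signature.
   For each p in [0, batch): C[p] = alpha * A[p] * B[p] + beta * C[p],
   where A[p] is m x k, B[p] is k x n and C[p] is m x n, all row-major. */
static void naive_dgemm_batch(size_t batch, size_t m, size_t n, size_t k,
                              double alpha, const double *const *A,
                              const double *const *B, double beta,
                              double *const *C) {
    for (size_t p = 0; p < batch; p++) {
        for (size_t i = 0; i < m; i++) {
            for (size_t j = 0; j < n; j++) {
                double acc = 0.0;
                /* dot product of row i of A[p] with column j of B[p] */
                for (size_t l = 0; l < k; l++)
                    acc += A[p][i * k + l] * B[p][l * n + j];
                C[p][i * n + j] = alpha * acc + beta * C[p][i * n + j];
            }
        }
    }
}
```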
Martin Kroeker
0073affe63
Merge pull request #4693 from goplanid/locks-improvement
Lock Management Improvements for Memory Allocation Efficiency
2024-05-26 23:14:52 +02:00
Martin Kroeker
834e633d79
Merge pull request #4718 from martin-frbg/issue4713
Override Intel icx's default fp-model to ensure correct handling of NaNs
2024-05-26 16:38:18 +02:00
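These commits switch icx away from its default fast floating-point mode. The illustrative C function below (not OpenBLAS code; the name `first_nan` is hypothetical) shows the kind of NaN test that depends on strict IEEE 754 semantics: `x != x` is true only for NaN, but a compiler that is allowed to assume NaNs never occur may fold the comparison to false and silently skip the check.

```c
#include <math.h>
#include <stddef.h>

/* Returns the index of the first NaN in x[0..n-1], or -1 if there is none.
   Under IEEE 754, a NaN compares unequal to itself, so x[i] != x[i]
   is true only for NaN. A fast-math / no-NaNs floating-point model may
   assume NaNs never occur and optimize this test away. */
static long first_nan(const double *x, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (x[i] != x[i])
            return (long)i;
    }
    return -1;
}
```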
Martin Kroeker
3833190454
Merge pull request #4716 from martin-frbg/lapack1018
Fix a potential bounds error in ?UNHR_COL/?ORHR_COL (Reference-LAPACK PR 1018)
2024-05-26 14:01:31 +02:00
Martin Kroeker
cf7e668fe8
Merge pull request #4709 from martin-frbg/docsbuildbranch
Don't try to deploy docs when PR-building in a fork
2024-05-26 14:01:05 +02:00
Martin Kroeker
8b4996a2d5
Override icx's default fast math mode to ensure correct NaN handling
2024-05-26 13:16:03 +02:00
Martin Kroeker
616cc28d82
Override icx's default fast math mode to ensure correct NaN handling
2024-05-26 12:59:11 +02:00
Martin Kroeker
772116879d
Merge pull request #4717 from bartoldeman/zscal-float-inf-fix
Replace use of FLT_MAX in x86_64 zscal.c by isinf()
2024-05-26 12:26:41 +02:00
Bart Oldeman
62f7b244ff
Replace use of FLT_MAX in x86_64 zscal.c by isinf()
Commit def4996
fixed issues with inf and nan values in zscal,
but used FLT_MAX, where DBL_MAX or isinf() is more appropriate,
as FLT_MAX is for single precision only.
Using FLT_MAX caused test case failures in the LAPACK tests.
isinf() is consistent with the later fix 969601a1
2024-05-24 17:20:27 +00:00
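The commit body explains why FLT_MAX was the wrong bound in double-precision code. The sketch below is illustrative and not the actual x86_64 zscal.c code (both function names are hypothetical). It shows the failure mode: a finite double such as 1e300 is larger than FLT_MAX (about 3.4e38), so a FLT_MAX comparison treats it as infinite, while isinf() accepts only true infinities.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Flawed check: FLT_MAX is the largest finite *float*, so every double
   with magnitude above ~3.4e38 (still finite) is reported as "infinite". */
static bool looks_infinite_fltmax(double x) {
    return fabs(x) > FLT_MAX;
}

/* Correct check for any double: true only for +Inf and -Inf. */
static bool looks_infinite_isinf(double x) {
    return isinf(x) != 0;
}
```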
Martin Kroeker
7ebbe3cc72
Fix potential bounds error (Reference-LAPACK PR 1018)
2024-05-23 23:12:19 +02:00
Martin Kroeker
791e015024
Fix potential bounds error (Reference-LAPACK PR 1018)
2024-05-23 23:11:14 +02:00
Martin Kroeker
4dd715d220
Fix potential bounds error (Reference-LAPACK PR 1018)
2024-05-23 23:09:55 +02:00
Martin Kroeker
e2c1a1e269
Fix potential bounds error (Reference-LAPACK PR 1018)
2024-05-23 23:08:27 +02:00
Rajalakshmi Srinivasaraghavan
e112191b54
POWER: Fix issues in zscal to address lapack failures
This patch fixes the following LAPACK failures with the clang compiler on POWER.
zed.out: ZVX: 18 out of 5190 tests failed to pass the threshold
zgd.out: ZGV drivers: 25 out of 1092 tests failed to pass the threshold
zgd.out: ZGV drivers: 6 out of 1092 tests failed to pass the threshold
2024-05-22 08:00:06 -05:00
Martin Kroeker
172d91846f
Don't try to deploy docs in a fork
2024-05-20 22:53:43 +02:00
Martin Kroeker
700ea74a37
Merge pull request #4705 from martin-frbg/issue4703
Fix INTERFACE64 builds on Loongarch64
2024-05-18 21:38:22 +02:00
Martin Kroeker
aa259b141d
Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix
Fix SAXPY regression when compiled with the OpenXL compiler.
2024-05-18 19:11:25 +02:00
Martin Kroeker
25b34e67f9
Merge pull request #4678 from ev-br/codspeed
WIP: add codspeed benchmarks [skip cirrus]
2024-05-18 16:51:02 +02:00
Martin Kroeker
6494f432df
Fix INTERFACE64 builds on Loongarch64
2024-05-18 16:49:03 +02:00
Evgeni Burovski
81cf0db047
DOC: add a readme for benchmarks/pybench
2024-05-18 15:30:00 +03:00
Evgeni Burovski
9f28161837
BENCH: add benchmarks using codspeed.io
2024-05-18 15:25:16 +03:00
Martin Kroeker
5015548d18
Merge pull request #4700 from martin-frbg/fix4698
Remove spurious brace in cmake/system.cmake
2024-05-16 15:38:01 +02:00
Matthias Langer
0050a9660b
Correctly detect ARM Neoverse V2 CPUs.
2024-05-16 09:59:52 +00:00
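Detecting a specific Arm core generally comes down to the implementer and part-number fields of the MIDR_EL1 register, which Linux also exposes as `CPU implementer` and `CPU part` in /proc/cpuinfo. The sketch below decodes those two fields. It is not OpenBLAS's actual detection code and the helper names are hypothetical; the part number 0xd4f is the value the Linux kernel uses for Neoverse V2.

```c
#include <stdint.h>

/* MIDR_EL1 layout: implementer in bits [31:24], part number in bits [15:4]. */
static unsigned midr_implementer(uint64_t midr) {
    return (unsigned)((midr >> 24) & 0xff);
}

static unsigned midr_partnum(uint64_t midr) {
    return (unsigned)((midr >> 4) & 0xfff);
}

/* Implementer 0x41 is Arm Ltd.; part 0xd4f is Neoverse V2. */
static int is_neoverse_v2(uint64_t midr) {
    return midr_implementer(midr) == 0x41 && midr_partnum(midr) == 0xd4f;
}
```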
Martin Kroeker
ce96e0e50f
Merge pull request #4699 from ChipKerchner/fixSwapVectorOrder
POWER: Fixing endianness issue in cswap/zswap kernel for AIX
2024-05-16 09:28:20 +02:00
Martin Kroeker
a3f6b13bc9
remove spurious brace
2024-05-16 09:25:53 +02:00
Chip Kerchner
3a1417671a
POWER: Fixing endianness issue in cswap/zswap kernel for AIX
2024-05-15 19:36:46 -05:00
Martin Kroeker
668f48f4fc
Use CMAKE_C_COMPILER_VERSION instead of dumpversion calls ( #4698 )
* Use CMAKE_C_COMPILER_VERSION throughout
2024-05-15 23:58:14 +02:00
Martin Kroeker
39c96063fb
Merge pull request #4694 from martin-frbg/issue3660
Add a minimum problem size for multithreading in GBMV
2024-05-15 22:14:41 +02:00
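The GBMV change adds a minimum problem size below which the routine stays single-threaded, because for small matrices starting threads costs more than the parallel work saves. A hypothetical sketch of such a gate follows. The cutoff constant, the work estimate and the function name are assumptions for illustration, not the values OpenBLAS uses.

```c
/* Hypothetical cutoff: the actual limit chosen in OpenBLAS is not stated in this log. */
#define GBMV_MT_MIN_WORK 10000L

/* Sketch of a thread-count gate for y = alpha*A*x + beta*y with a banded
   m x n matrix A that has kl sub-diagonals and ku super-diagonals. */
static int gbmv_thread_count(long m, long n, long kl, long ku, int max_threads) {
    long band = kl + ku + 1;             /* diagonals stored per column */
    long rows = m < band ? m : band;     /* at most min(m, band) nonzeros per column */
    long work = n * rows;                /* rough count of multiply-adds */
    long threads;

    if (max_threads <= 1 || work < GBMV_MT_MIN_WORK)
        return 1;                        /* small problem: stay single-threaded */

    threads = work / GBMV_MT_MIN_WORK;   /* give each thread at least the cutoff's worth of work */
    return threads < max_threads ? (int)threads : max_threads;
}
```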
Martin Kroeker
f5c080f083
Fix CMAKE syntax in kernel file parsing of IFNEQ conditionals ( #4695 )
* Fix syntax in parsing of IFNEQ
2024-05-15 20:58:31 +02:00
Martin Kroeker
9a2a6a2e52
Merge pull request #4696 from frjohnst/restore_second
Revert PRs 4515 and 4520 (restore second, dsecnd)
2024-05-15 18:35:20 +02:00
frjohnst
87026ac1b1
Revert "fix conlict between PR 4515 and AIX shared obj support"
This reverts commit bdaa6705ca.
It turns out that PRs 4515 and 4520 break the tests under
lapack-netlib/TESTING which require SECOND and DSECND. IBM
has decided this is a bigger problem than the conflict
between lapack second_ and the xlf run time.
2024-05-15 09:45:17 -04:00
frjohnst
56d3d1039c
Revert "resolve second_ conflict which breaks xlf timef"
This reverts commit 9b24b31419.
It turns out that PRs 4515 and 4520 break the tests under
lapack-netlib/TESTING which require SECOND and DSECND. IBM
has decided this is a bigger problem than the conflict
between lapack second_ and the xlf run time.
2024-05-15 09:44:29 -04:00
Martin Kroeker
2957281275
Introduce a lower limit for multithreading
2024-05-14 18:59:21 +02:00
Martin Kroeker
5fd871d7ea
Introduce a lower limit for multithreading
2024-05-14 18:48:03 +02:00
Martin Kroeker
6ca9ffa7f5
Merge pull request #4655 from yamazakimitsufumi/update_2d_thread_distribution
Expanding the scope of 2D thread distribution to improve multi-threaded DGEMM performance
2024-05-14 18:12:43 +02:00
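This PR widens the set of cases in which multi-threaded DGEMM spreads its threads over a two-dimensional grid of the output matrix. The exact criterion is not stated in this log, so what follows is a hypothetical heuristic only: factor the thread count into pm x pn and choose the factorization whose per-thread blocks of C are closest to square. The function name `grid_2d` is an assumption.

```c
/* Hypothetical heuristic: split nthreads into a pm x pn grid over an
   m x n output matrix, choosing the factorization whose per-thread block
   (m/pm) x (n/pn) is closest to square. Assumes nthreads >= 1, m > 0, n > 0. */
static void grid_2d(int nthreads, long m, long n, int *pm, int *pn) {
    double best = -1.0;
    *pm = nthreads;
    *pn = 1;
    for (int a = 1; a <= nthreads; a++) {
        if (nthreads % a != 0)
            continue;                      /* only exact factorizations */
        int b = nthreads / a;
        double bm = (double)m / a;         /* block rows per thread */
        double bn = (double)n / b;         /* block columns per thread */
        double ratio = bm > bn ? bm / bn : bn / bm;  /* >= 1, 1 means square */
        if (best < 0.0 || ratio < best) {
            best = ratio;
            *pm = a;
            *pn = b;
        }
    }
}
```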
Deeksha Goplani
0dc80a5c8d
locks improvement
2024-05-13 22:17:23 +05:30
Martin Kroeker
b45a78c6e9
fix zdotu argument passing in utest_ext on windows ( #4691 )
* fix passing of results on windows
2024-05-13 14:50:50 +02:00
Martin Kroeker
1ab9f50561
Merge pull request #4690 from mattip/blasint
use blasint instead of int to quiet warnings
2024-05-13 11:00:33 +02:00
Amrita H S
87b3d9054f
Fix SAXPY regression when compiled with the OpenXL compiler.
SAXPY built with OpenXL regresses compared to SAXPY
built with gcc. The OpenXL compiler doesn't know that the
SAXPY inner kernel assembly is a 64-element loop, so it
treats the remainder loop as the main loop: it vectorizes
and interleaves the remainder into a loop that handles 48
elements per iteration. Since the remainder is at most 63
elements, the 48-element loop mostly does not execute, and
most of the work probably falls to the 1-element scalar
loop that follows it.
This can be fixed by adding the pragma loop interleave_count(2),
which results in an 8-element loop.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2024-05-12 23:27:55 -05:00
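The arithmetic in this commit message can be checked with a small model of the loop structure. In the hypothetical C sketch below, `split_saxpy` counts how many iterations each of three loops runs for a vector of length n: the 64-element assembly kernel, a compiler-vectorized remainder loop of `width` elements, and the 1-element scalar tail. It models iteration counts only and does not reproduce the kernel itself.

```c
#include <stddef.h>

/* Iteration counts for each stage of the (modelled) SAXPY loop nest. */
typedef struct {
    size_t kernel_blocks;  /* iterations of the 64-element assembly kernel */
    size_t vector_blocks;  /* iterations of the vectorized remainder loop */
    size_t scalar_iters;   /* iterations of the 1-element scalar tail */
} saxpy_split;

/* width: elements per iteration of the compiler-generated remainder loop
   (48 before the fix, 8 after it, per the commit message). Must be > 0. */
static saxpy_split split_saxpy(size_t n, size_t width) {
    saxpy_split s;
    size_t rem;
    s.kernel_blocks = n / 64;
    rem = n % 64;                 /* at most 63 elements reach the remainder loop */
    s.vector_blocks = rem / width;
    s.scalar_iters = rem % width;
    return s;
}
```

For n = 111 the remainder is 47 elements. With a 48-element remainder loop all 47 fall to the scalar tail, while with an 8-element loop five vector iterations leave only 7 scalar ones. In Clang-based compilers such as OpenXL, the hint named in the commit is written `#pragma clang loop interleave_count(2)`.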
Matti Picus
243640c354
use blasint instead of int to quiet warnings
2024-05-12 10:24:16 +03:00
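This change replaces plain `int` with `blasint`, the integer type that OpenBLAS widens to 64 bits in INTERFACE64 builds, so that index and count variables match the library's own type instead of triggering narrowing warnings. A minimal sketch of the pattern follows; the macro `EXAMPLE_INTERFACE64`, the type `blasint_example` and the function `count_positive` are hypothetical stand-ins, not OpenBLAS's own definitions.

```c
#include <stdint.h>

/* Hypothetical switch mirroring an INTERFACE64 build: the BLAS index
   type becomes 64-bit. OpenBLAS's real macro and typedef names differ. */
#ifdef EXAMPLE_INTERFACE64
typedef int64_t blasint_example;
#else
typedef int32_t blasint_example;
#endif

/* Using the index type for the loop counter and result avoids implicit
   narrowing (and the warnings it causes) when n is 64-bit. */
static blasint_example count_positive(blasint_example n, const double *x) {
    blasint_example count = 0;
    for (blasint_example i = 0; i < n; i++)
        if (x[i] > 0.0)
            count++;
    return count;
}
```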
Martin Kroeker
f0560f906f
Merge pull request #4689 from martin-frbg/issue4684
Fix compilation of the BLAS extension utests for NO_CBLAS=1
2024-05-11 14:39:54 +02:00
Martin Kroeker
e1e0d9a2ae
Merge pull request #4688 from XiWeiGu/loongarch64_fixed_gcc14_compilation
loongarch64: Fixed GCC14 compilation issue
2024-05-11 13:38:45 +02:00
Martin Kroeker
d8baf2f2ea
Support compilation without CBLAS
2024-05-11 13:10:54 +02:00
Martin Kroeker
a6c184d150
forward NO_CFLAGS to the CFLAGS, if set
2024-05-11 13:07:30 +02:00
gxw
ecf8b588a9
loongarch64: Fixed GCC14 compilation issue
2024-05-11 16:14:18 +08:00
Martin Kroeker
8da6f7e5f2
Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
Loongarch64: Improving the Performance and Stability of dgemm
2024-05-10 11:29:12 +02:00
gxw
f9a26240a7
loongarch64: Fixed icamax_lsx
2024-05-10 14:16:40 +08:00
gxw
cb0f707409
loongarch64: Fixed utest fork:safety
2024-05-10 14:16:36 +08:00
gxw
637c650f4f
loongarch64: Add buffer offset for target LOONGSON3R5
2024-05-10 11:42:53 +08:00