Commit Graph

8393 Commits

Author SHA1 Message Date
Evgeni Burovski 9f28161837 BENCH: add benchmarks using codspeed.io 2024-05-18 15:25:16 +03:00
Martin Kroeker 5015548d18
Merge pull request #4700 from martin-frbg/fix4698
Remove spurious brace in cmake/system.cmake
2024-05-16 15:38:01 +02:00
Martin Kroeker ce96e0e50f
Merge pull request #4699 from ChipKerchner/fixSwapVectorOrder
POWER: Fixing endianness issue in cswap/zswap kernel for AIX
2024-05-16 09:28:20 +02:00
Martin Kroeker a3f6b13bc9
remove spurious brace 2024-05-16 09:25:53 +02:00
Chip Kerchner 3a1417671a POWER: Fixing endianness issue in cswap/zswap kernel for AIX 2024-05-15 19:36:46 -05:00
Martin Kroeker 668f48f4fc
Use CMAKE_C_COMPILER_VERSION instead of dumpversion calls (#4698)
* Use CMAKE_C_COMPILER_VERSION throughout
2024-05-15 23:58:14 +02:00
Martin Kroeker 39c96063fb
Merge pull request #4694 from martin-frbg/issue3660
Add a minimum problem size for multithreading in GBMV
2024-05-15 22:14:41 +02:00
Martin Kroeker f5c080f083
Fix CMAKE syntax in kernel file parsing of IFNEQ conditionals (#4695)
* Fix syntax in parsing of IFNEQ
2024-05-15 20:58:31 +02:00
Martin Kroeker 9a2a6a2e52
Merge pull request #4696 from frjohnst/restore_second
Revert PRs 4515 and 4520 (restore second, dsecnd)
2024-05-15 18:35:20 +02:00
frjohnst 87026ac1b1 Revert "fix conlict between PR 4515 and AIX shared obj support"
This reverts commit bdaa6705ca.

It turns out that PRs 4515 and 4520 break the tests under
lapack-netlib/TESTING which require SECOND and DSECND. IBM
has decided this is a bigger problem than the conflict
between lapack second_ and the xlf run time.
2024-05-15 09:45:17 -04:00
frjohnst 56d3d1039c Revert "resolve second_ conflict which breaks xlf timef"
This reverts commit 9b24b31419.

It turns out that PRs 4515 and 4520 break the tests under
lapack-netlib/TESTING which require SECOND and DSECND. IBM
has decided this is a bigger problem than the conflict
between lapack second_ and the xlf run time.
2024-05-15 09:44:29 -04:00
Martin Kroeker 2957281275
Introduce a lower limit for multithreading 2024-05-14 18:59:21 +02:00
Martin Kroeker 5fd871d7ea
Introduce a lower limit for multithreading 2024-05-14 18:48:03 +02:00
Martin Kroeker 6ca9ffa7f5
Merge pull request #4655 from yamazakimitsufumi/update_2d_thread_distribution
Expanding the scope of 2D thread distribution to improve multi-threaded DGEMM performance
2024-05-14 18:12:43 +02:00
Deeksha Goplani 0dc80a5c8d locks improvement 2024-05-13 22:17:23 +05:30
Martin Kroeker b45a78c6e9
fix zdotu argument passing in utest_ext on windows (#4691)
* fix passing of results on windows
2024-05-13 14:50:50 +02:00
Martin Kroeker 1ab9f50561
Merge pull request #4690 from mattip/blasint
use blasint instead of int to quiet warnings
2024-05-13 11:00:33 +02:00
Amrita H S 87b3d9054f Fix SAXPY regression when compiled with the OpenXL compiler.
SAXPY built with OpenXL regresses when compared to SAXPY
built with gcc. OpenXL compiler doesn't know that the
SAXPY inner kernel assembly is a 64 element loop and
to it the remainder loop is the main loop. It vectorizes
and interleaves the remainder to be a 48 elements per
iteration loop. With a max of 63 iterations, a 48 element
loop is mostly not going to get executed, so the 1 element
scalar loop that is the remainder after that is probably
mostly what gets executed.

This can be fixed by adding a loop interleave_count(2) pragma,
which results in an 8-element loop.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2024-05-12 23:27:55 -05:00
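The commit message above describes limiting how far the compiler interleaves the SAXPY remainder loop. A minimal sketch of that idea follows, under stated assumptions: the function name `saxpy_tail` is hypothetical and is not taken from the OpenBLAS sources, and the pragma is written in the Clang spelling `#pragma clang loop interleave_count(2)` on the assumption that this is what the message refers to, since OpenXL is Clang-based. With 4 floats per vector, an interleave count of 2 would give the 8-element loop the message mentions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical remainder loop for SAXPY: y[i] += alpha * x[i].
 * It is assumed that a separate 64-element main kernel has already run,
 * so n is small here (at most 63 in the scenario the commit describes).
 */
static void saxpy_tail(size_t n, float alpha, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    /* Ask Clang-based compilers to interleave only 2 vector iterations,
     * rather than choosing a large unroll factor that a short loop
     * would rarely reach. Other compilers skip the hint entirely. */
#if defined(__clang__)
#pragma clang loop interleave_count(2)
#endif
    for (size_t i = 0; i < n; i++) {
        y[i] += alpha * x[i];
    }
}
```

The pragma is only a hint and does not change the result of the loop; whether it helps depends on the compiler version and target, so any gain would need to be measured.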
Matti Picus 243640c354 use blasint instead of int to quiet warnings 2024-05-12 10:24:16 +03:00
Martin Kroeker f0560f906f
Merge pull request #4689 from martin-frbg/issue4684
Fix compilation of the BLAS extension utests for NO_CBLAS=1
2024-05-11 14:39:54 +02:00
Martin Kroeker e1e0d9a2ae
Merge pull request #4688 from XiWeiGu/loongarch64_fixed_gcc14_compilation
loongarch64: Fixed GCC14 compilation issue
2024-05-11 13:38:45 +02:00
Martin Kroeker d8baf2f2ea
Support compilation without CBLAS 2024-05-11 13:10:54 +02:00
Martin Kroeker a6c184d150
forward NO_CFLAGS to the CFLAGS, if set 2024-05-11 13:07:30 +02:00
gxw ecf8b588a9 loongarch64: Fixed GCC14 compilation issue 2024-05-11 16:14:18 +08:00
Martin Kroeker 8da6f7e5f2
Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
Loongarch64: Improving the Performance and Stability of dgemm
2024-05-10 11:29:12 +02:00
gxw f9a26240a7 loongarch64: Fixed icamax_lsx 2024-05-10 14:16:40 +08:00
gxw cb0f707409 loongarch64: Fixed utest fork:safety 2024-05-10 14:16:36 +08:00
gxw 637c650f4f loongarch64: Add buffer offset for target LOONGSON3R5 2024-05-10 11:42:53 +08:00
Martin Kroeker 5d678f1831
Merge pull request #4685 from martin-frbg/issue4660-2
Fix builds for LOONGARCH64 in LSX mode
2024-05-09 13:17:29 +02:00
Martin Kroeker b45d8e1ab2
remove stray comma 2024-05-09 12:33:19 +02:00
Martin Kroeker 5500b4ab26
Merge pull request #4680 from theAeon/develop
Expose whether locking is enabled in get_config
2024-05-08 19:03:57 +02:00
gxw 6017ad7146 loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6 2024-05-08 10:10:26 +08:00
Martin Kroeker d66aa63478
Merge pull request #4681 from martin-frbg/fix4662-2
fix HUGETLB allocation for TLS mode as well
2024-05-08 01:44:32 +02:00
Martin Kroeker f0f1ff7820
fix HUGETLB allocation for TLS mode as well 2024-05-08 00:40:36 +02:00
Andrew Robbins edfe1aa471
Expose whether locking is enabled in get_config 2024-05-07 11:12:03 -04:00
Martin Kroeker edeb5259a1
Merge pull request #4679 from martin-frbg/fix4662
Restore Loongson LA64ARCH handling
2024-05-07 15:57:50 +02:00
Martin Kroeker 4376b6f7d2
Restore Loongson LA64ARCH handling 2024-05-07 14:42:01 +02:00
Martin Kroeker 8735b54fa8
Merge pull request #4662 from martin-frbg/hugetlb-doc
Fix and document the two HUGETLB options for buffer allocation in Makefile.rule
2024-05-07 13:32:07 +02:00
Martin Kroeker fc10673fd3
Merge branch 'develop' into hugetlb-doc 2024-05-07 13:31:39 +02:00
Martin Kroeker c20189cc82
Merge pull request #4677 from martin-frbg/issue4676
Add autodetection of Intel Meteor Lake and Emerald Rapids
2024-05-06 17:10:19 +02:00
Martin Kroeker bbd227ce4a
Add Intel Meteor Lake and Emerald Rapids 2024-05-06 00:11:44 +02:00
Martin Kroeker f034745ce6
Merge pull request #4675 from martin-frbg/issue4619
Mention LD_LIBRARY_PATH in user documentation
2024-05-04 15:50:13 +02:00
Martin Kroeker a82ecadc11
mention LD_LIBRARY_PATH 2024-05-04 15:48:48 +02:00
Martin Kroeker b859f6f191
Merge pull request #4617 from cyk2018/patch-1
[Doc]Update user_manual.md for static linker
2024-05-04 15:20:52 +02:00
Martin Kroeker dc99b61380
sort unwanted interdependencies of alloc_shm and alloc_hugetlb 2024-05-04 14:49:00 +02:00
Martin Kroeker 9c4e10fbd1
sort hugetlb and shm alloc options 2024-05-04 14:48:02 +02:00
Martin Kroeker a63d71129c
Merge pull request #4671 from martin-frbg/issue4668
Silence a GCC14 warning/error in the f2c-converted LAPACK
2024-04-30 20:06:42 +02:00
Martin Kroeker 3d26837a35
Suppress GCC14 error exit in the f2c-converted LAPACK 2024-04-30 19:05:18 +02:00
Martin Kroeker 7c915e64ca
Silence a GCC14 warning/error in the f2c-converted LAPACK 2024-04-30 17:48:14 +02:00
Martin Kroeker edacf9b397
Work around spurious BLAS3 test errors on LOONGSON3R3/4 (#4667)
Force compilation with gfortran to use O0 on older Loongson hardware to avoid spurious test failures
2024-04-30 08:50:47 +02:00