Commit Graph

8324 Commits

Author SHA1 Message Date
Amrita H S
87b3d9054f Fix SAXPY regression when compiled with the OpenXL compiler.
SAXPY built with OpenXL regresses compared to SAXPY
built with gcc. The OpenXL compiler doesn't know that the
SAXPY inner kernel assembly is a 64-element loop, so it
treats the remainder loop as the main loop. It vectorizes
and interleaves the remainder into a loop that handles 48
elements per iteration. With a maximum of 63 iterations, the
48-element loop will usually not execute, so the 1-element
scalar loop that handles whatever is left after it is
probably what runs most of the time.

This can be fixed by adding a pragma, loop interleave_count(2),
which results in an 8-element loop.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2024-05-12 23:27:55 -05:00
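
The fix described in commit 87b3d9054f can be sketched roughly as follows. This is an illustration only, not the code from the commit: the function name saxpy_tail and its signature are hypothetical, and the sketch assumes the pragma is spelled in clang's `#pragma clang loop` form, which clang-based compilers such as OpenXL accept. The loop stands in for the remainder that follows the 64-element assembly kernel, so it sees at most 63 elements. Per the commit message, an interleave count of 2 should yield an 8-element vector loop (presumably two 4-float vectors per iteration) in place of the 48-element one.

```c
#include <stddef.h>

/* Hypothetical stand-in for the scalar remainder loop that runs after the
 * 64-element assembly kernel. The caller is assumed to pass n <= 63. */
void saxpy_tail(size_t n, float da, const float *x, float *y)
{
    size_t i = 0;

#if defined(__clang__)
    /* Hint to clang-based compilers: interleave only 2 vector iterations,
     * so the vectorized remainder stays small enough to be entered. */
    #pragma clang loop interleave_count(2)
#endif
    while (i < n) {
        y[i] += da * x[i];
        i++;
    }
}
```

The pragma is only a hint and does not change the result of the loop; compilers that are not clang-based skip it because of the `__clang__` guard.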
Martin Kroeker
f0560f906f Merge pull request #4689 from martin-frbg/issue4684
Fix compilation of the BLAS extension utests for NO_CBLAS=1
2024-05-11 14:39:54 +02:00
Martin Kroeker
e1e0d9a2ae Merge pull request #4688 from XiWeiGu/loongarch64_fixed_gcc14_compilation
loongarch64: Fixed GCC14 compilation issue
2024-05-11 13:38:45 +02:00
Martin Kroeker
d8baf2f2ea Support compilation without CBLAS 2024-05-11 13:10:54 +02:00
Martin Kroeker
a6c184d150 forward NO_CFLAGS to the CFLAGS, if set 2024-05-11 13:07:30 +02:00
gxw
ecf8b588a9 loongarch64: Fixed GCC14 compilation issue 2024-05-11 16:14:18 +08:00
Martin Kroeker
8da6f7e5f2 Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
Loongarch64: Improving the Performance and Stability of dgemm
2024-05-10 11:29:12 +02:00
gxw
f9a26240a7 loongarch64: Fixed icamax_lsx 2024-05-10 14:16:40 +08:00
gxw
cb0f707409 loongarch64: Fixed utest fork:safety 2024-05-10 14:16:36 +08:00
gxw
637c650f4f loongarch64: Add buffer offset for target LOONGSON3R5 2024-05-10 11:42:53 +08:00
Martin Kroeker
5d678f1831 Merge pull request #4685 from martin-frbg/issue4660-2
Fix builds for LOONGARCH64 in LSX mode
2024-05-09 13:17:29 +02:00
Martin Kroeker
b45d8e1ab2 remove stray comma 2024-05-09 12:33:19 +02:00
Martin Kroeker
5500b4ab26 Merge pull request #4680 from theAeon/develop
Expose whether locking is enabled in get_config
2024-05-08 19:03:57 +02:00
gxw
6017ad7146 loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6 2024-05-08 10:10:26 +08:00
Martin Kroeker
d66aa63478 Merge pull request #4681 from martin-frbg/fix4662-2
fix HUGETLB allocation for TLS mode as well
2024-05-08 01:44:32 +02:00
Martin Kroeker
f0f1ff7820 fix HUGETLB allocation for TLS mode as well 2024-05-08 00:40:36 +02:00
Andrew Robbins
edfe1aa471 Expose whether locking is enabled in get_config 2024-05-07 11:12:03 -04:00
Martin Kroeker
edeb5259a1 Merge pull request #4679 from martin-frbg/fix4662
Restore Loongson LA64ARCH handling
2024-05-07 15:57:50 +02:00
Martin Kroeker
4376b6f7d2 Restore Loongson LA64ARCH handling 2024-05-07 14:42:01 +02:00
Martin Kroeker
8735b54fa8 Merge pull request #4662 from martin-frbg/hugetlb-doc
Fix and document the two HUGETLB options for buffer allocation in Makefile.rule
2024-05-07 13:32:07 +02:00
Martin Kroeker
fc10673fd3 Merge branch 'develop' into hugetlb-doc 2024-05-07 13:31:39 +02:00
Martin Kroeker
c20189cc82 Merge pull request #4677 from martin-frbg/issue4676
Add autodetection of Intel Meteor Lake and Emerald Rapids
2024-05-06 17:10:19 +02:00
Martin Kroeker
bbd227ce4a Add Intel Meteor Lake and Emerald Rapids 2024-05-06 00:11:44 +02:00
Martin Kroeker
f034745ce6 Merge pull request #4675 from martin-frbg/issue4619
Mention LD_LIBRARY_PATH in user documentation
2024-05-04 15:50:13 +02:00
Martin Kroeker
a82ecadc11 mention LD_LIBRARY_PATH 2024-05-04 15:48:48 +02:00
Martin Kroeker
b859f6f191 Merge pull request #4617 from cyk2018/patch-1
[Doc]Update user_manual.md for static linker
2024-05-04 15:20:52 +02:00
Martin Kroeker
dc99b61380 sort unwanted interdependencies of alloc_shm and alloc_hugetlb 2024-05-04 14:49:00 +02:00
Martin Kroeker
9c4e10fbd1 sort hugetlb and shm alloc options 2024-05-04 14:48:02 +02:00
Martin Kroeker
a63d71129c Merge pull request #4671 from martin-frbg/issue4668
Silence a GCC14 warning/error in the f2c-converted LAPACK
2024-04-30 20:06:42 +02:00
Martin Kroeker
3d26837a35 Suppress GCC14 error exit in the f2c-converted LAPACK 2024-04-30 19:05:18 +02:00
Martin Kroeker
7c915e64ca Silence a GCC14 warning/error in the f2c-converted LAPACK 2024-04-30 17:48:14 +02:00
Martin Kroeker
edacf9b397 Work around spurious BLAS3 test errors on LOONGSON3R3/4 (#4667)
Force compilation with gfortran to use O0 on older Loongson hardware to avoid spurious test failures
2024-04-30 08:50:47 +02:00
Martin Kroeker
89e3fd0821 Merge pull request #4666 from martin-frbg/issue4633
Fix spurious errors in the extended utest for INTERFACE64=1 on big-endian systems
2024-04-29 17:23:20 +02:00
Martin Kroeker
b1d722fc0c Fix cast to work with INTERFACE64 (especially on big-endian) 2024-04-29 15:37:26 +02:00
Martin Kroeker
1031d161f6 Merge pull request #4663 from ayappanec/develop
Fix openblas_utest_ext build in AIX
2024-04-25 18:05:33 +02:00
Ayappan P
f4ee0a423b Fix openblas_utest_ext build in AIX 2024-04-25 07:32:21 -04:00
Martin Kroeker
faf7b3d1bb Document the two HUGETLB options for buffer allocation 2024-04-24 17:49:40 +02:00
Martin Kroeker
ab5882ebf0 Merge pull request #4661 from martin-frbg/issue4660
Fix CMAKE builds for Loongarch64
2024-04-24 09:01:22 +02:00
Martin Kroeker
69aa93e34f Fix Loongson compiler flag check 2024-04-23 21:57:42 +02:00
Martin Kroeker
015042f7b5 Fix Loongson compiler flag test 2024-04-23 21:55:57 +02:00
Martin Kroeker
992b71fea2 remove stray comma 2024-04-23 21:52:26 +02:00
Martin Kroeker
d421dec278 Merge pull request #4656 from zboszor/fix-x86-64-build-v2
Add forgotten conditional uses of PREFETCH
2024-04-23 21:05:08 +02:00
Martin Kroeker
ae695d4ca0 Merge pull request #4642 from XiWeiGu/loongarch64_clang
CI: Add clang test for loongarch64
2024-04-23 18:25:49 +02:00
gxw
1cdad09760 CI: Add clang test for loongarch64 2024-04-23 19:30:24 +08:00
gxw
7cd438a5ac loongarch64: Fixed clang compilation issues 2024-04-23 19:19:11 +08:00
Martin Kroeker
35d84ad012 Merge pull request #4658 from mattip/remove-extra-suffix
do not add LIBNAMESUFFIX to dylib
2024-04-23 11:03:33 +02:00
Martin Kroeker
f6eadf0971 Merge pull request #4577 from shivammonaka/Threading_Callback
Introduced callback to Pthread, Win32 and OpenMP backend
2024-04-22 19:19:46 +02:00
Martin Kroeker
61214fcef7 Fix utest_ext build on AIX (#4657)
* Add all-in-one version of utest_ext for AIX
2024-04-22 14:24:33 +02:00
Martin Kroeker
ddcd7d6fa8 Merge branch 'develop' into Threading_Callback 2024-04-21 22:27:11 +02:00
Matti Picus
94feadf242 do not add LIBNAMESUFFIX to dylib 2024-04-21 13:16:40 +10:00