7630 Commits

Author SHA1 Message Date
yancheng
ff2ecc6cda loongarch64: Add optimization for amin. 2023-12-07 14:36:07 +08:00
yancheng
265b5f2e80 loongarch64: Add optimizations for amax. 2023-12-07 14:36:07 +08:00
yancheng
993ede7c70 loongarch64: Add optimizations for scal. 2023-12-07 14:36:07 +08:00
Martin Kroeker
39bf8ece20 Merge pull request #4340 from yinshiyou/la-dev
Add some refines and optimizations for LoongArch.
2023-11-29 08:22:25 +01:00
Martin Kroeker
42b5e081d8 Merge pull request #4348 from catap/macos-undefinded-dynamic-lookup
Allow weak linking on old macOS
2023-11-28 22:14:53 +01:00
Kirill A. Korinsky
a1562e4bae Allow weak linking on old macOS 2023-11-28 14:04:01 +00:00
Martin Kroeker
c4a622db9e Merge pull request #4346 from martin-frbg/issue4343
Fix CMAKE installation location of lapacke_mangling header
2023-11-28 14:01:14 +01:00
Shiyou Yin
9fe07d82fd loongarch: Add LSX optimization for dot. 2023-11-28 20:24:18 +08:00
Shiyou Yin
13b8c44b44 loongarch: Add optimization for dsdot kernel. 2023-11-28 20:24:16 +08:00
Shiyou Yin
3def6a8143 loongarch: Add LASX optimization for dot. 2023-11-28 20:24:14 +08:00
Shiyou Yin
1310a0931b loongarch: Refine build control for loongarch64.
1. Use getauxval instead of cpucfg to test hardware capability.
2. Remove unnecessary code and option for compiler check in c_check.
2023-11-28 20:23:55 +08:00
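A minimal sketch of the capability check described in point 1 above, assuming a Linux target. The names `simd_caps`, `caps_from_hwcap`, `detect_simd_caps` and the `SKETCH_HWCAP_*` macros are hypothetical, and the bit positions are assumed to match the LoongArch `HWCAP_LOONGARCH_LSX` and `HWCAP_LOONGARCH_LASX` definitions in the kernel's `<asm/hwcap.h>`. This is not the code from the commit.

```c
#include <stdbool.h>
#if defined(__linux__) && defined(__loongarch__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#endif

/* Assumed bit positions (hypothetical local names); verify against <asm/hwcap.h>. */
#define SKETCH_HWCAP_LSX  (1UL << 4)
#define SKETCH_HWCAP_LASX (1UL << 5)

typedef struct {
    bool lsx;   /* 128-bit SIMD extension reported by the kernel */
    bool lasx;  /* 256-bit SIMD extension reported by the kernel */
} simd_caps;

/* Decode a raw AT_HWCAP word into capability flags. */
simd_caps caps_from_hwcap(unsigned long hwcap)
{
    simd_caps caps;
    caps.lsx  = (hwcap & SKETCH_HWCAP_LSX)  != 0;
    caps.lasx = (hwcap & SKETCH_HWCAP_LASX) != 0;
    return caps;
}

/* Ask the kernel which extensions it exposes to user space.
   On any other platform this sketch reports no SIMD support. */
simd_caps detect_simd_caps(void)
{
#if defined(__linux__) && defined(__loongarch__)
    return caps_from_hwcap(getauxval(AT_HWCAP));
#else
    return caps_from_hwcap(0UL);
#endif
}
```

Splitting the bit decoding from the `getauxval` call keeps the decoding testable on machines that are not LoongArch.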
Martin Kroeker
ff92e6e707 Fix installation location of lapacke_mangling header 2023-11-28 12:53:35 +01:00
Martin Kroeker
b7a28f5e42 Merge pull request #4344 from catap/macos-always-use-ar
Enable overstep of too long args without DYNAMIC_ARCH
2023-11-28 12:39:45 +01:00
Kirill A. Korinsky
9beee55167 Enable overstep of too long args without DYNAMIC_ARCH 2023-11-27 23:41:56 +00:00
Martin Kroeker
fc66ecd25a Merge pull request #4339 from martin-frbg/lapack-3-12-0
Update version number and documentation of Reference-LAPACK to 3.12.0
2023-11-25 23:54:05 +01:00
Martin Kroeker
08be9004f8 Update version number and copyright date to Reference-LAPACK 3.12.0 2023-11-25 18:57:17 +01:00
Martin Kroeker
578f0f9590 Update version number to 3.12.0 2023-11-25 18:53:16 +01:00
Martin Kroeker
3d9e20f614 Update version to 3.12.0 2023-11-25 18:51:54 +01:00
Martin Kroeker
f7351e493c Update Reference-LAPACK docs to 3.12.0 2023-11-25 18:49:34 +01:00
Martin Kroeker
be8661ba40 Merge pull request #4338 from martin-frbg/lapack941
Docu fix for Truncated QR With Pivoting (Reference-LAPACK PR 941)
2023-11-25 18:41:25 +01:00
Martin Kroeker
ca5a87ff1d Small documentation fix for Truncated QR With Pivoting (Reference-LAPACK PR 941) 2023-11-25 15:31:18 +01:00
Shiyou Yin
f745f02f35 benchmark: Fix missing colons in outputs of ./strsv.goto 2023-11-24 14:55:18 +08:00
Martin Kroeker
97d3c9b827 Merge pull request #4336 from martin-frbg/fix4322
Revert unintentional change to gmake linking rule in LAPACK TESTING/LIN
2023-11-22 22:44:21 +01:00
Martin Kroeker
c883abf838 Revert unintentional change to linking rule from PR 4322 2023-11-22 22:41:53 +01:00
Martin Kroeker
8138999cd0 Merge pull request #4333 from codeworm96/update_dynamic_core_readme
Update the list of default dynamic targets for x86_64 in the README to be consistent with the Makefile
2023-11-21 13:50:37 +01:00
Martin Kroeker
a938e48fa2 Merge pull request #4334 from RajalakshmiSR/Makefile_power
POWER: Fixing Makefile error
2023-11-21 10:24:25 +01:00
Rajalakshmi Srinivasaraghavan
47da601a2d POWER: Fixing Makefile error
Recent commit d99aad8ee3 added an
extra `)`. This patch fixes the resulting warning from the Makefile.
2023-11-20 17:24:22 -06:00
Yuning Zhang
54be8f4d67 Update the list of default dynamic targets for x86_64 in the README to be consistent with the Makefile
Signed-off-by: Yuning Zhang <codeworm96@outlook.com>
2023-11-20 13:28:25 -08:00
Martin Kroeker
d526c4306f Merge pull request #4329 from isuruf/sbgemm
Fix building test_sbgemm
2023-11-20 15:30:02 +01:00
Martin Kroeker
2ea65bacd0 Merge pull request #4330 from bartoldeman/asum-init-mask
Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum
2023-11-20 05:38:39 +01:00
Bart Oldeman
c34e2cf380 Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum
for skylake kernels. This is the same method as used in [sd]asum.
_mm_set1_epi64x was commented out for zasum, but has the advantage
of avoiding possible undefined behaviour (using an uninitialized
variable), optimized out by NVHPC and icx. The new code works
fine with those compilers.

For GCC 12.3 the generated code is identical: no matter which method
you use, the compiler folds the mask into a compile-time
constant, so there is no performance benefit from using _mm_cmpeq_epi8,
since the corresponding instruction (VPCMPEQB) isn't actually
generated!
2023-11-19 21:28:35 +00:00
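The mask initialization discussed in the commit above can be sketched as follows. This is an illustration under assumptions, not the Skylake kernel itself: it uses 128-bit SSE2 intrinsics rather than AVX512, the function name `abs_sum_f32` is hypothetical, and a scalar path is used when SSE2 is not available.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Sum of |x[i]| over n floats (hypothetical helper). */
float abs_sum_f32(const float *x, size_t n)
{
    float total = 0.0f;
    size_t i = 0;
#if defined(__SSE2__)
    /* Every lane is set explicitly to 0x7fffffff, so no variable is read
       before it is written. The pattern the commit message warns about
       compares an uninitialized register with itself to obtain all-ones. */
    const __m128 abs_mask = _mm_castsi128_ps(_mm_set1_epi32(0x7fffffff));
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(x + i);
        /* Clearing the sign bit yields the absolute value. */
        acc = _mm_add_ps(acc, _mm_and_ps(v, abs_mask));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar tail, or the whole array when SSE2 is unavailable. */
    for (; i < n; i++)
        total += (x[i] < 0.0f) ? -x[i] : x[i];
    return total;
}
```

For the double-precision variant the analogous call would be `_mm_set1_epi64x(0x7fffffffffffffffLL)` combined with `_mm_castsi128_pd`.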
Martin Kroeker
864c65b526 Merge pull request #4328 from martin-frbg/4239-3
Copy XCode15-specific workaround for Apple M to Fortran flags to fix build of tests
2023-11-19 22:06:17 +01:00
Isuru Fernando
6b2651ece3 Fix building test_sbgemm 2023-11-19 02:57:13 -06:00
Martin Kroeker
22aa401656 Temporarily disable the AVX512 CASUM/ZASUM microkernels for any version of NVIDIA HPC (#4327)
* Temporarily disable the C/ZASUM microkernels for any version of NVHPC
2023-11-19 00:04:31 +01:00
Martin Kroeker
47b03fd4b4 Copy XCode15-specific workaround to Fortran flags to fix build of tests 2023-11-18 23:45:02 +01:00
Martin Kroeker
df4cd7e82c Merge pull request #4326 from bartoldeman/fix-casum-backup-kernel
Fix casum fallback kernel for x86_64
2023-11-18 19:06:06 +01:00
Bart Oldeman
f8ad5344c2 Fix casum fallback kernel.
This kernel is only used on Skylake+ if the kernel with AVX512
intrinsics can't be used, but it used the variable x1 incorrectly
in the tail end of the loop: x1 is still at its initial
value instead of where x points to.

This caused 55 "other error"s in the LAPACK tests
(https://github.com/OpenMathLib/OpenBLAS/issues/4282)

This change makes casum.c as similar as possible to zasum.c,
because zasum.c does this correctly.
2023-11-17 23:53:56 +00:00
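The class of bug described in the commit above (a tail loop reading through a pointer that was never advanced) can be illustrated with a small sketch. The function `casum_sketch`, the helper `absf`, and the block size of 4 are hypothetical; unit stride and interleaved (re, im) storage are assumed, and this is not the fallback kernel itself.

```c
#include <stddef.h>

static float absf(float v) { return v < 0.0f ? -v : v; }

/* Sum of |re| + |im| over n complex floats stored as re, im, re, im, ... */
float casum_sketch(const float *x, size_t n)
{
    float sum = 0.0f;
    size_t i = 0;
    size_t blocks = n / 4;   /* main loop: 4 complex elements per pass */

    for (size_t b = 0; b < blocks; b++) {
        for (size_t k = 0; k < 8; k++)
            sum += absf(x[k]);
        x += 8;              /* advance the working pointer past this block */
        i += 4;
    }

    /* Tail: read through x, which now points at the first unprocessed
       element. Reading through a copy of the pointer taken before the
       main loop would add the head of the array a second time. */
    for (; i < n; i++) {
        sum += absf(x[0]) + absf(x[1]);
        x += 2;
    }
    return sum;
}
```

With n = 5 the main loop covers elements 0 to 3 and the tail covers element 4, so a stale pointer in the tail would add element 0 again and skip element 4.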
Martin Kroeker
cb2950709f Merge pull request #4322 from martin-frbg/lapack891
Add truncated QR with pivoting (Reference-LAPACK PR 891)
2023-11-16 08:40:17 +01:00
Martin Kroeker
cc622f2406 restore OpenBLAS-specific target_link_libraries 2023-11-15 22:51:09 +01:00
Martin Kroeker
ee47e4e494 run m1/llvm/cmake build on all 4 cores 2023-11-15 15:21:32 +01:00
Martin Kroeker
8b2a956890 Implement truncated QR with pivot (Reference-LAPACK PR 891) 2023-11-15 14:20:12 +01:00
Martin Kroeker
20a2a83f49 Implement truncated QR with pivoting (Reference-LAPACK PR 891) 2023-11-15 12:18:15 +01:00
Martin Kroeker
f437339130 Implement truncated QR with pivoting (Reference-LAPACK PR 891) 2023-11-15 12:12:26 +01:00
Martin Kroeker
5bf87c86f5 Implement truncated QR with pivoting (Reference-LAPACK PR 891) 2023-11-15 12:10:20 +01:00
Martin Kroeker
0eb8a87977 Implement truncated QR with pivoting (Reference-LAPACK PR 891) 2023-11-15 09:56:37 +01:00
Martin Kroeker
387830b9d5 Implement truncated QR with pivoting (Reference-LAPACK PR 891) 2023-11-15 09:53:06 +01:00
Martin Kroeker
40109c0392 Implement truncated QR with pivoting (Reference-LAPACK PR 891) 2023-11-15 09:50:30 +01:00
Martin Kroeker
23cda457fb Implement truncated QR with pivoting (Reference-LAPACK PR 891) 2023-11-15 09:48:23 +01:00
Martin Kroeker
d36b86a794 Merge pull request #4320 from ChipKerchner/fixOldGCCPower
Fix older versions of gcc - missing __has_builtin, cpuid and no support of P10.
2023-11-15 08:48:17 +01:00
Chip-Kerchner
d99aad8ee3 Fix older version of gcc - missing __has_builtin, cpuid and no support of P10. 2023-11-14 11:07:08 -06:00