Commit Graph

7699 Commits

Author SHA1 Message Date
yancheng be83f5e4e0 loongarch64: Add optimizations for iamax. 2023-12-07 14:36:07 +08:00
yancheng e3fb2b5afa loongarch64: Add optimizations for imin. 2023-12-07 14:36:07 +08:00
yancheng e46b48e372 loongarch64: Add optimizations for imax. 2023-12-07 14:36:07 +08:00
yancheng 702fc1d56d loongarch64: Add optimization for min. 2023-12-07 14:36:07 +08:00
yancheng 346b384d1c loongarch64: Add optimization for max. 2023-12-07 14:36:07 +08:00
yancheng ff2ecc6cda loongarch64: Add optimization for amin. 2023-12-07 14:36:07 +08:00
yancheng 265b5f2e80 loongarch64: Add optimizations for amax. 2023-12-07 14:36:07 +08:00
yancheng 993ede7c70 loongarch64: Add optimizations for scal. 2023-12-07 14:36:07 +08:00
Mark Seminatore 4ebf814b42 fix bug failing to mark task as finished. 2023-12-05 23:28:37 -08:00
Mark Seminatore 5f51811728 try at new threading model 2023-12-05 22:43:36 -08:00
Martin Kroeker a8cb611157
Merge pull request #4358 from martin-frbg/lapack954
Fix keyword used to count successful tests (Reference-LAPACK PR 954)
2023-12-05 22:20:15 +01:00
Martin Kroeker 589f2b6466
Fix search phrase used to count successful tests (Reference-LAPACK PR 954) 2023-12-05 20:10:20 +01:00
Martin Kroeker 6aa5f53e26
Merge pull request #4357 from martin-frbg/lapack953
Fix memory leak in LAPACK testing framework (Reference-LAPACK PR 953)
2023-12-05 20:03:21 +01:00
Martin Kroeker effb7af2a2
Fix memory leak (Reference-LAPACK PR 953) 2023-12-05 17:55:38 +01:00
Martin Kroeker 5915a69734
Merge pull request #4356 from martin-frbg/lapack736-2
Add LAPACK tests for the Dynamic Mode Decomposition functions from Reference-LAPACK PR 736
2023-12-05 17:48:42 +01:00
Martin Kroeker 226a14c549
Restore library path adjustments 2023-12-05 15:50:06 +01:00
Martin Kroeker c5fa318add
Add tests for DMD (Reference-LAPACK PR 736) 2023-12-05 15:45:59 +01:00
Martin Kroeker fa03e5497a
Add tests for the DMD functions (Reference-LAPACK PR 736) 2023-12-05 15:43:28 +01:00
Martin Kroeker a53a79e059
Add tests for the DMD functions (Reference-LAPACK PR 736) 2023-12-05 15:41:39 +01:00
Martin Kroeker e3039fa7f6
Merge pull request #4351 from catap/cmake-old-macos
Use 64bit build on `CMAKE_SYSTEM_PROCESSOR=i386` on Darwin
2023-12-05 14:40:18 +01:00
Kirill A. Korinsky 08fde5ebd2
Use 64bit build on `CMAKE_SYSTEM_PROCESSOR=i386` on Darwin
This is a bit tricky.

The value of `CMAKE_SYSTEM_PROCESSOR` comes from the output of `uname -m`, which
might report a 32-bit architecture even when a 64-bit application is being built.

So, in that case, use `CMAKE_SIZEOF_VOID_P` to detect the target.

See https://trac.macports.org/ticket/68488
2023-11-30 21:24:58 +00:00
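
The idea in commit 08fde5ebd2 (trust the pointer size over the reported processor name) can be sketched in C. This is only an illustration under stated assumptions, not the CMake logic from the commit: the function name target_is_64bit and the fallback list of processor names are hypothetical, and sizeof_void_p stands in for the value CMake exposes as `CMAKE_SIZEOF_VOID_P`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: choose the build width from the pointer size in bytes.
 * The processor name (as reported by something like `uname -m`) is consulted
 * only when the pointer size is unknown, signalled here by 0. */
int target_is_64bit(const char *processor, size_t sizeof_void_p)
{
    if (sizeof_void_p != 0) {
        /* An 8-byte pointer means a 64-bit target, whatever the name says. */
        return sizeof_void_p == 8;
    }
    /* Fallback on the name alone; this list is an assumption for the sketch. */
    return strcmp(processor, "x86_64") == 0 || strcmp(processor, "amd64") == 0;
}
```

With this helper, a machine that reports "i386" but builds with 8-byte pointers is treated as a 64-bit target, which is the situation the commit message describes.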
Martin Kroeker 39bf8ece20
Merge pull request #4340 from yinshiyou/la-dev
Add some refines and optimizations for LoongArch.
2023-11-29 08:22:25 +01:00
Martin Kroeker 42b5e081d8
Merge pull request #4348 from catap/macos-undefinded-dynamic-lookup
Allow weak linking on old macOS
2023-11-28 22:14:53 +01:00
Kirill A. Korinsky a1562e4bae
Allow weak linking on old macOS 2023-11-28 14:04:01 +00:00
Martin Kroeker c4a622db9e
Merge pull request #4346 from martin-frbg/issue4343
Fix CMAKE installation location of lapacke_mangling header
2023-11-28 14:01:14 +01:00
Shiyou Yin 9fe07d82fd loongarch: Add LSX optimization for dot. 2023-11-28 20:24:18 +08:00
Shiyou Yin 13b8c44b44 loongarch: Add optimization for dsdot kernel. 2023-11-28 20:24:16 +08:00
Shiyou Yin 3def6a8143 loongarch: Add LASX optimization for dot. 2023-11-28 20:24:14 +08:00
Shiyou Yin 1310a0931b loongarch: Refine build control for loongarch64.
1. Use getauxval instead of cpucfg to test hardware capability.
2. Remove unnecessary code and option for compiler check in c_check.
2023-11-28 20:23:55 +08:00
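
Item 1 of commit 1310a0931b replaces a cpucfg probe with getauxval. A minimal sketch of that style of check follows. It assumes Linux with glibc, and it assumes the LSX and LASX capability bits sit at positions 4 and 5 of AT_HWCAP, mirroring the kernel's HWCAP_LOONGARCH_LSX and HWCAP_LOONGARCH_LASX. The SKETCH_ macros, the enum, and both function names are hypothetical and are not taken from the OpenBLAS sources.

```c
#if defined(__linux__) && defined(__loongarch__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#endif

/* Assumed bit positions for the LoongArch SIMD capability flags. */
#define SKETCH_HWCAP_LSX  (1UL << 4)
#define SKETCH_HWCAP_LASX (1UL << 5)

enum simd_level { SIMD_NONE = 0, SIMD_LSX = 1, SIMD_LASX = 2 };

/* Pure decoding step: map a hardware capability word to the widest SIMD level. */
enum simd_level simd_level_from_hwcap(unsigned long hwcap)
{
    if (hwcap & SKETCH_HWCAP_LASX)
        return SIMD_LASX;   /* 256-bit vectors available */
    if (hwcap & SKETCH_HWCAP_LSX)
        return SIMD_LSX;    /* 128-bit vectors available */
    return SIMD_NONE;
}

/* Runtime query: ask the kernel through the auxiliary vector instead of
 * executing a cpucfg instruction. Other platforms report SIMD_NONE. */
enum simd_level detect_simd_level(void)
{
#if defined(__linux__) && defined(__loongarch__)
    return simd_level_from_hwcap(getauxval(AT_HWCAP));
#else
    return SIMD_NONE;
#endif
}
```

Keeping the decoding separate from the query lets the bit logic be exercised on any machine, while the getauxval call is compiled only where the capability bits have this meaning.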
Martin Kroeker ff92e6e707
Fix installation location of lapacke_mangling header 2023-11-28 12:53:35 +01:00
Martin Kroeker b7a28f5e42
Merge pull request #4344 from catap/macos-always-use-ar
Enable overstep of too long args without DYNAMIC_ARCH
2023-11-28 12:39:45 +01:00
Kirill A. Korinsky 9beee55167
Enable overstep of too long args without DYNAMIC_ARCH 2023-11-27 23:41:56 +00:00
Martin Kroeker fc66ecd25a
Merge pull request #4339 from martin-frbg/lapack-3-12-0
Update version number and documentation of Reference-LAPACK to 3.12.0
2023-11-25 23:54:05 +01:00
Martin Kroeker 08be9004f8
Update version number and copyright date to Reference-LAPACK 3.12.0 2023-11-25 18:57:17 +01:00
Martin Kroeker 578f0f9590
Update version number to 3.12.0 2023-11-25 18:53:16 +01:00
Martin Kroeker 3d9e20f614
Update version to 3.12.0 2023-11-25 18:51:54 +01:00
Martin Kroeker f7351e493c
Update Reference-LAPACK docs to 3.12.0 2023-11-25 18:49:34 +01:00
Martin Kroeker be8661ba40
Merge pull request #4338 from martin-frbg/lapack941
Docu fix for Truncated QR With Pivoting (Reference-LAPACK PR 941)
2023-11-25 18:41:25 +01:00
Martin Kroeker ca5a87ff1d
Small documentation fix for Truncated QR With Pivoting (Reference-LAPACK PR 941) 2023-11-25 15:31:18 +01:00
Shiyou Yin f745f02f35 benchmark: Fix missing colons in outputs of ./strsv.goto 2023-11-24 14:55:18 +08:00
Martin Kroeker 97d3c9b827
Merge pull request #4336 from martin-frbg/fix4322
Revert unintentional change to gmake linking rule in LAPACK TESTING/LIN
2023-11-22 22:44:21 +01:00
Martin Kroeker c883abf838
Revert unintentional change to linking rule from PR 4322 2023-11-22 22:41:53 +01:00
Martin Kroeker 8138999cd0
Merge pull request #4333 from codeworm96/update_dynamic_core_readme
Update the list of default dynamic targets for x86_64 in the README to be consistent with the Makefile
2023-11-21 13:50:37 +01:00
Martin Kroeker a938e48fa2
Merge pull request #4334 from RajalakshmiSR/Makefile_power
POWER: Fixing Makefile error
2023-11-21 10:24:25 +01:00
Rajalakshmi Srinivasaraghavan 47da601a2d POWER: Fixing Makefile error
Recent commit d99aad8ee3 added an
extra `)`. This patch fixes the resulting warning from the Makefile.
2023-11-20 17:24:22 -06:00
Yuning Zhang 54be8f4d67 Update the list of default dynamic targets for x86_64 in the README to be consistent with the Makefile
Signed-off-by: Yuning Zhang <codeworm96@outlook.com>
2023-11-20 13:28:25 -08:00
Martin Kroeker d526c4306f
Merge pull request #4329 from isuruf/sbgemm
Fix building test_sbgemm
2023-11-20 15:30:02 +01:00
Martin Kroeker 2ea65bacd0
Merge pull request #4330 from bartoldeman/asum-init-mask
Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum
2023-11-20 05:38:39 +01:00
Bart Oldeman c34e2cf380 Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum
for skylake kernels. This is the same method as used in [sd]asum.
_mm_set1_epi64x was commented out for zasum, but has the advantage
of avoiding possible undefined behaviour (using an uninitialized
variable), optimized out by NVHPC and icx. The new code works
fine with those compilers.

For GCC 12.3 the generated code is identical: whichever method
you use, the compiler folds the mask into a compile-time
constant, so there is no performance benefit to using _mm_cmpeq_epi8,
since the corresponding instruction (VPCMPEQB) isn't actually
generated.
2023-11-19 21:28:35 +00:00
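
The masking technique named in commit c34e2cf380 can be illustrated with a small absolute-sum routine. This is a sketch under assumptions, not the Skylake kernel from the commit: it uses 128-bit SSE2 intrinsics, the name asum_sketch is hypothetical, and the scalar path is added only so the code also builds without SSE2. The mask is an explicit constant with every bit set except the IEEE-754 sign bit, so no uninitialized register is involved.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_set1_epi32, _mm_castsi128_ps, _mm_and_ps */
#endif

/* Sum of absolute values of n floats. For a complex vector stored as
 * interleaved (re, im) pairs, pass the float count, i.e. twice the
 * number of complex elements, to get sum(|re| + |im|). */
float asum_sketch(const float *x, size_t n)
{
    float sum = 0.0f;
    size_t i = 0;
#if defined(__SSE2__)
    /* Constant mask 0x7fffffff in each lane: clears only the sign bit. */
    const __m128 abs_mask = _mm_castsi128_ps(_mm_set1_epi32(0x7fffffff));
    __m128 acc = _mm_setzero_ps();
    float lanes[4];

    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(x + i);
        acc = _mm_add_ps(acc, _mm_and_ps(v, abs_mask));  /* |v| per lane */
    }
    _mm_storeu_ps(lanes, acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar tail, and the whole loop when SSE2 is unavailable. */
    for (; i < n; i++)
        sum += (x[i] < 0.0f) ? -x[i] : x[i];
    return sum;
}
```

Because the mask is a plain constant, a compiler is free to fold it at compile time, which matches the observation in the commit message about GCC 12.3.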
Martin Kroeker 864c65b526
Merge pull request #4328 from martin-frbg/4239-3
Copy XCode15-specific workaround for Apple M to Fortran flags to fix build of tests
2023-11-19 22:06:17 +01:00