Commit Graph

7960 Commits

Author SHA1 Message Date
barracuda156 a8d3619f65 cc.cmake: add optflags for G5 and G4 kernels 2023-12-13 19:42:56 +08:00
Martin Kroeker aa46f1e4e7
revert addition of MSVC-compatible complex (moved to lapacke_config.h) 2023-12-12 23:07:48 +01:00
Martin Kroeker dcdc351272
Add MSVC-compatible complex types 2023-12-12 23:06:22 +01:00
Martin Kroeker 55a0718f72
Merge pull request #4369 from ChipKerchner/power10Copies
Replace two vector loads with one vector pair load.
2023-12-12 18:49:21 +01:00
Chip-Kerchner 93747fb377 Merge remote-tracking branch 'origin/develop' into power10Copies 2023-12-12 09:32:49 -06:00
Martin Kroeker dcf6999c4e
remove extraneous endif 2023-12-12 11:27:17 +01:00
Martin Kroeker 330101e0b3
Add complex type definitions for MSVC 2023-12-11 21:52:00 +01:00
Martin Kroeker d9f1478068
Merge pull request #4367 from barracuda156/unbreak_powerpc
Fix arch detection with CMake build for PowerPC
2023-12-11 21:38:32 +01:00
barracuda156 9dbc8129b3 cpuid_power.c: add CPU_SUBTYPE_POWERPC_7400 case 2023-12-11 21:09:06 +08:00
barracuda156 c732f275a2 system_check.cmake: fix arch detection for Darwin PowerPC 2023-12-11 21:05:31 +08:00
Martin Kroeker e60fb0f397
Merge pull request #4359 from mseminatore/win_perf
Improve Windows threading performance scaling
2023-12-09 23:40:26 +01:00
Mark Seminatore efa9515a23
Merge branch 'OpenMathLib:develop' into win_perf 2023-12-09 10:09:49 -08:00
Chip-Kerchner 4e738e561a Replace two vector loads with one vector pair load and fix endianness of stores. 2023-12-08 12:36:08 -06:00
Martin Kroeker 1332f8a822
Merge pull request #4159 from OMaghiarIMG/risc-v-tail-policy
Set tail policy to undisturbed for RVV intrinsics accumulators
2023-12-08 10:25:41 +01:00
Mark Seminatore edac80d7e8 some cleanup, dynamically scale threads, add missing WIN_CASE defn 2023-12-07 14:59:27 -08:00
Martin Kroeker 2d316c2920
Merge pull request #4125 from OMaghiarIMG/risc-v
Fixes RVV masked intrinsics for iamax/iamin/imax/imin kernels
2023-12-07 14:50:58 +01:00
Martin Kroeker 5b09833b1c
Merge pull request #4019 from uniontech-lilinjie/develop
fix typo
2023-12-07 14:46:17 +01:00
Martin Kroeker 3193aa9c7e
Merge pull request #4362 from yinshiyou/la-dev
Add 15 level1 optimizations for LoongArch.
2023-12-07 09:15:15 +01:00
yancheng d32f38fb37 loongarch64: Add optimizations for nrm2. 2023-12-07 14:36:26 +08:00
yancheng f9b468990e loongarch64: Add optimizations for rot. 2023-12-07 14:36:26 +08:00
yancheng c80e7e27d1 loongarch64: Add optimizations for sum and asum. 2023-12-07 14:36:26 +08:00
yancheng d4c96a35a8 loongarch64: Add optimizations for axpy and axpby. 2023-12-07 14:36:26 +08:00
yancheng 360acc0a41 loongarch64: Add optimizations for swap. 2023-12-07 14:36:26 +08:00
yancheng 174c25766b loongarch64: Add optimizations for copy. 2023-12-07 14:36:26 +08:00
yancheng 49829b2b7d loongarch64: Add optimizations for iamin. 2023-12-07 14:36:07 +08:00
yancheng be83f5e4e0 loongarch64: Add optimizations for iamax. 2023-12-07 14:36:07 +08:00
yancheng e3fb2b5afa loongarch64: Add optimizations for imin. 2023-12-07 14:36:07 +08:00
yancheng e46b48e372 loongarch64: Add optimizations for imax. 2023-12-07 14:36:07 +08:00
yancheng 702fc1d56d loongarch64: Add optimization for min. 2023-12-07 14:36:07 +08:00
yancheng 346b384d1c loongarch64: Add optimization for max. 2023-12-07 14:36:07 +08:00
yancheng ff2ecc6cda loongarch64: Add optimization for amin. 2023-12-07 14:36:07 +08:00
yancheng 265b5f2e80 loongarch64: Add optimizations for amax. 2023-12-07 14:36:07 +08:00
yancheng 993ede7c70 loongarch64: Add optimizations for scal. 2023-12-07 14:36:07 +08:00
Mark Seminatore 4ebf814b42 fix bug failing to mark task as finished. 2023-12-05 23:28:37 -08:00
Mark Seminatore 5f51811728 try at new threading model 2023-12-05 22:43:36 -08:00
Martin Kroeker a8cb611157
Merge pull request #4358 from martin-frbg/lapack954
Fix keyword used to count successful tests (Reference-LAPACK PR 954)
2023-12-05 22:20:15 +01:00
Martin Kroeker 589f2b6466
Fix search phrase used to count successful tests (Reference-LAPACK PR 954) 2023-12-05 20:10:20 +01:00
Martin Kroeker 6aa5f53e26
Merge pull request #4357 from martin-frbg/lapack953
Fix memory leak in LAPACK testing framework (Reference-LAPACK PR 953)
2023-12-05 20:03:21 +01:00
Martin Kroeker effb7af2a2
Fix memory leak (Reference-LAPACK PR 953) 2023-12-05 17:55:38 +01:00
Martin Kroeker 5915a69734
Merge pull request #4356 from martin-frbg/lapack736-2
Add LAPACK tests for the Dynamic Mode Decomposition functions from Reference-LAPACK PR 736
2023-12-05 17:48:42 +01:00
Martin Kroeker 226a14c549
Restore library path adjustments 2023-12-05 15:50:06 +01:00
Martin Kroeker c5fa318add
Add tests for DMD (Reference-LAPACK PR 736) 2023-12-05 15:45:59 +01:00
Martin Kroeker fa03e5497a
Add tests for the DMD functions (Reference-LAPACK PR 736) 2023-12-05 15:43:28 +01:00
Martin Kroeker a53a79e059
Add tests for the DMD functions (Reference-LAPACK PR 736) 2023-12-05 15:41:39 +01:00
Martin Kroeker e3039fa7f6
Merge pull request #4351 from catap/cmake-old-macos
Use 64bit build on `CMAKE_SYSTEM_PROCESSOR=i386` on Darwin
2023-12-05 14:40:18 +01:00
Octavian Maghiar 4a12cf53ec [RISC-V] Improve RVV kernel generator LMUL usage
The RVV kernel generation script uses the provided LMUL to increase the number of accumulator registers.
Since the effect of the LMUL is to group together the vector registers into larger ones, it actually should be used as a multiplier in the calculation of vlenmax.
At the moment, no matter what LMUL is provided, the generated kernels would only set the maximum number of vector elements equal to VLEN/SEW.
Commit changes the use of LMUL to properly adjust vlenmax. Note that an increase in LMUL results in a decrease in the number of effective vector registers.
2023-12-04 11:13:35 +00:00
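The LMUL relationship described in the commit above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the kernel generator's code: the function names `vlenmax` and `register_groups` are hypothetical, and the sketch assumes integer LMUL values (1, 2, 4, 8) and the 32 architectural vector registers defined by the RISC-V Vector extension.

```python
def vlenmax(vlen_bits: int, sew_bits: int, lmul: int = 1) -> int:
    """Maximum elements per vector register group: (VLEN / SEW) * LMUL.

    Hypothetical helper for illustration; integer LMUL only.
    """
    return (vlen_bits // sew_bits) * lmul


def register_groups(lmul: int, total_regs: int = 32) -> int:
    """Number of usable register groups when registers are grouped by LMUL."""
    return total_regs // lmul


if __name__ == "__main__":
    # VLEN=128 bits, SEW=64 bits (double precision)
    for lmul in (1, 2, 4, 8):
        print(lmul, vlenmax(128, 64, lmul), register_groups(lmul))
```

Under these assumptions, raising LMUL from 1 to 4 at VLEN=128 and SEW=64 increases the element count per group from 2 to 8 while reducing the available register groups from 32 to 8, which is the trade-off the commit message notes.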
Octavian Maghiar e4586e81b8 [RISC-V] Add RISC-V Vector 128-bit target
Current RVV x280 target depends on vlen=512-bits for Level 3 operations.
Commit adds generic target that supports vlen=128-bits.
New target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations.
Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.
2023-12-04 11:02:18 +00:00
Erik Bråthen Solem 2381132ada Darwin < 20: always write xerbla.c.o into archive
Write xerbla.c.o into archive regardless of timestamp by using ar -rs
instead of ar -ru.
2023-12-03 19:13:56 +01:00
Erik Bråthen Solem 89fa51d495 Revert 42b5e08 ("Allow weak linking on old macOS") 2023-12-03 19:06:49 +01:00
Kirill A. Korinsky 08fde5ebd2
Use 64bit build on `CMAKE_SYSTEM_PROCESSOR=i386` on Darwin
This case is a bit tricky.

The value of `CMAKE_SYSTEM_PROCESSOR` comes from the output of `uname -m`, which
might report a 32-bit machine even when a 64-bit application is being built.

So, for that case, use `CMAKE_SIZEOF_VOID_P` to detect the target.

See https://trac.macports.org/ticket/68488
2023-11-30 21:24:58 +00:00
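The idea in the commit above, trusting the pointer size over the reported machine name, has a rough analogue in Python. The sketch below is not the CMake change itself: `effective_arch` is a hypothetical helper, and mapping `i386` to `x86_64` is an assumption made for illustration. It relies only on the standard `platform.machine()` and `struct.calcsize("P")` calls.

```python
import platform
import struct


def pointer_bits() -> int:
    """Width of a native pointer for the running interpreter, in bits."""
    return struct.calcsize("P") * 8


def effective_arch(reported_machine: str, pointer_size_bytes: int) -> str:
    """Pick a target name from the reported machine and the pointer size.

    Hypothetical helper: if the machine name says i386 but pointers are
    8 bytes wide, treat the target as 64-bit.
    """
    if reported_machine == "i386" and pointer_size_bytes == 8:
        return "x86_64"
    return reported_machine


if __name__ == "__main__":
    machine = platform.machine()  # comparable to `uname -m`
    print(machine, pointer_bits(), effective_arch(machine, pointer_bits() // 8))
```

The pointer size reflects what is actually being built or run, whereas the machine name reflects only what the system reports, so the two can disagree in the way the commit message describes.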