barracuda156
d9653af018
KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366
2023-12-14 12:00:11 +08:00
Martin Kroeker
302ca7edc7
Merge pull request #4371 from barracuda156/970
cc.cmake: add optflags for G5 and G4 kernels
2023-12-13 14:32:37 +01:00
barracuda156
a8d3619f65
cc.cmake: add optflags for G5 and G4 kernels
2023-12-13 19:42:56 +08:00
Martin Kroeker
aa46f1e4e7
revert addition of MSVC-compatible complex (moved to lapacke_config.h)
2023-12-12 23:07:48 +01:00
Martin Kroeker
dcdc351272
Add MSVC-compatible complex types
2023-12-12 23:06:22 +01:00
Martin Kroeker
55a0718f72
Merge pull request #4369 from ChipKerchner/power10Copies
Replace two vector loads with one vector pair load.
2023-12-12 18:49:21 +01:00
Chip-Kerchner
93747fb377
Merge remote-tracking branch 'origin/develop' into power10Copies
2023-12-12 09:32:49 -06:00
Martin Kroeker
dcf6999c4e
remove extraneous endif
2023-12-12 11:27:17 +01:00
Martin Kroeker
330101e0b3
Add complex type definitions for MSVC
2023-12-11 21:52:00 +01:00
Martin Kroeker
d9f1478068
Merge pull request #4367 from barracuda156/unbreak_powerpc
Fix arch detection with CMake build for PowerPC
2023-12-11 21:38:32 +01:00
barracuda156
9dbc8129b3
cpuid_power.c: add CPU_SUBTYPE_POWERPC_7400 case
2023-12-11 21:09:06 +08:00
barracuda156
c732f275a2
system_check.cmake: fix arch detection for Darwin PowerPC
2023-12-11 21:05:31 +08:00
Martin Kroeker
e60fb0f397
Merge pull request #4359 from mseminatore/win_perf
Improve Windows threading performance scaling
2023-12-09 23:40:26 +01:00
Mark Seminatore
efa9515a23
Merge branch 'OpenMathLib:develop' into win_perf
2023-12-09 10:09:49 -08:00
Chip-Kerchner
4e738e561a
Replace two vector loads with one vector pair load and fix endianness of stores.
2023-12-08 12:36:08 -06:00
Martin Kroeker
1332f8a822
Merge pull request #4159 from OMaghiarIMG/risc-v-tail-policy
Set tail policy to undisturbed for RVV intrinsics accumulators
2023-12-08 10:25:41 +01:00
Mark Seminatore
edac80d7e8
some cleanup, dynamically scale threads, add missing WIN_CASE defn
2023-12-07 14:59:27 -08:00
Martin Kroeker
2d316c2920
Merge pull request #4125 from OMaghiarIMG/risc-v
Fixes RVV masked intrinsics for iamax/iamin/imax/imin kernels
2023-12-07 14:50:58 +01:00
Martin Kroeker
5b09833b1c
Merge pull request #4019 from uniontech-lilinjie/develop
fix typo
2023-12-07 14:46:17 +01:00
Martin Kroeker
3193aa9c7e
Merge pull request #4362 from yinshiyou/la-dev
Add 15 level1 optimizations for LoongArch.
2023-12-07 09:15:15 +01:00
yancheng
d32f38fb37
loongarch64: Add optimizations for nrm2.
2023-12-07 14:36:26 +08:00
yancheng
f9b468990e
loongarch64: Add optimizations for rot.
2023-12-07 14:36:26 +08:00
yancheng
c80e7e27d1
loongarch64: Add optimizations for sum and asum.
2023-12-07 14:36:26 +08:00
yancheng
d4c96a35a8
loongarch64: Add optimizations for axpy and axpby.
2023-12-07 14:36:26 +08:00
yancheng
360acc0a41
loongarch64: Add optimizations for swap.
2023-12-07 14:36:26 +08:00
yancheng
174c25766b
loongarch64: Add optimizations for copy.
2023-12-07 14:36:26 +08:00
yancheng
49829b2b7d
loongarch64: Add optimizations for iamin.
2023-12-07 14:36:07 +08:00
yancheng
be83f5e4e0
loongarch64: Add optimizations for iamax.
2023-12-07 14:36:07 +08:00
yancheng
e3fb2b5afa
loongarch64: Add optimizations for imin.
2023-12-07 14:36:07 +08:00
yancheng
e46b48e372
loongarch64: Add optimizations for imax.
2023-12-07 14:36:07 +08:00
yancheng
702fc1d56d
loongarch64: Add optimization for min.
2023-12-07 14:36:07 +08:00
yancheng
346b384d1c
loongarch64: Add optimization for max.
2023-12-07 14:36:07 +08:00
yancheng
ff2ecc6cda
loongarch64: Add optimization for amin.
2023-12-07 14:36:07 +08:00
yancheng
265b5f2e80
loongarch64: Add optimizations for amax.
2023-12-07 14:36:07 +08:00
yancheng
993ede7c70
loongarch64: Add optimizations for scal.
2023-12-07 14:36:07 +08:00
Mark Seminatore
4ebf814b42
fix bug failing to mark task as finished.
2023-12-05 23:28:37 -08:00
Mark Seminatore
5f51811728
try at new threading model
2023-12-05 22:43:36 -08:00
Martin Kroeker
a8cb611157
Merge pull request #4358 from martin-frbg/lapack954
Fix keyword used to count successful tests (Reference-LAPACK PR 954)
2023-12-05 22:20:15 +01:00
Martin Kroeker
589f2b6466
Fix search phrase used to count successful tests (Reference-LAPACK PR 954)
2023-12-05 20:10:20 +01:00
Martin Kroeker
6aa5f53e26
Merge pull request #4357 from martin-frbg/lapack953
Fix memory leak in LAPACK testing framework (Reference-LAPACK PR 953)
2023-12-05 20:03:21 +01:00
Martin Kroeker
effb7af2a2
Fix memory leak (Reference-LAPACK PR 953)
2023-12-05 17:55:38 +01:00
Martin Kroeker
5915a69734
Merge pull request #4356 from martin-frbg/lapack736-2
Add LAPACK tests for the Dynamic Mode Decomposition functions from Reference-LAPACK PR 736
2023-12-05 17:48:42 +01:00
Martin Kroeker
226a14c549
Restore library path adjustments
2023-12-05 15:50:06 +01:00
Martin Kroeker
c5fa318add
Add tests for DMD (Reference-LAPACK PR 736)
2023-12-05 15:45:59 +01:00
Martin Kroeker
fa03e5497a
Add tests for the DMD functions (Reference-LAPACK PR 736)
2023-12-05 15:43:28 +01:00
Martin Kroeker
a53a79e059
Add tests for the DMD functions (Reference-LAPACK PR 736)
2023-12-05 15:41:39 +01:00
Martin Kroeker
e3039fa7f6
Merge pull request #4351 from catap/cmake-old-macos
Use 64bit build on `CMAKE_SYSTEM_PROCESSOR=i386` on Darwin
2023-12-05 14:40:18 +01:00
Octavian Maghiar
4a12cf53ec
[RISC-V] Improve RVV kernel generator LMUL usage
The RVV kernel generation script uses the provided LMUL to increase the number of accumulator registers.
Since the effect of the LMUL is to group together the vector registers into larger ones, it actually should be used as a multiplier in the calculation of vlenmax.
At the moment, no matter what LMUL is provided, the generated kernels would only set the maximum number of vector elements equal to VLEN/SEW.
Commit changes the use of LMUL to properly adjust vlenmax. Note that an increase in LMUL results in a decrease in the number of effective vector registers.
2023-12-04 11:13:35 +00:00
Octavian Maghiar
e4586e81b8
[RISC-V] Add RISC-V Vector 128-bit target
Current RVV x280 target depends on vlen=512-bits for Level 3 operations.
Commit adds generic target that supports vlen=128-bits.
The new target uses the same scalable kernels as x280 for Level 1 and Level 2 operations, and autogenerated kernels for Level 3 operations.
Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.
2023-12-04 11:02:18 +00:00
Erik Bråthen Solem
2381132ada
Darwin < 20: always write xerbla.c.o into archive
Write xerbla.c.o into archive regardless of timestamp by using ar -rs
instead of ar -ru.
2023-12-03 19:13:56 +01:00