Martin Kroeker
|
f06b535566
|
Use C kernel for dgemv_t due to limitations of the old assembly one
|
2023-12-15 09:58:44 +01:00 |
Martin Kroeker
|
293131d6b9
|
Merge pull request #4370 from barracuda156/unbreak_powerpc
macOS PowerPC: fix CMake build
|
2023-12-14 10:30:03 +01:00 |
barracuda156
|
981e315b30
|
cc.cmake: use -force_cpusubtype_ALL for Darwin PPC
|
2023-12-14 12:01:31 +08:00 |
barracuda156
|
d9653af018
|
KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366
|
2023-12-14 12:00:11 +08:00 |
Martin Kroeker
|
302ca7edc7
|
Merge pull request #4371 from barracuda156/970
cc.cmake: add optflags for G5 and G4 kernels
|
2023-12-13 14:32:37 +01:00 |
barracuda156
|
a8d3619f65
|
cc.cmake: add optflags for G5 and G4 kernels
|
2023-12-13 19:42:56 +08:00 |
Martin Kroeker
|
aa46f1e4e7
|
revert addition of MSVC-compatible complex (moved to lapacke_config.h)
|
2023-12-12 23:07:48 +01:00 |
Martin Kroeker
|
dcdc351272
|
Add MSVC-compatible complex types
|
2023-12-12 23:06:22 +01:00 |
Martin Kroeker
|
55a0718f72
|
Merge pull request #4369 from ChipKerchner/power10Copies
Replace two vector loads with one vector pair load.
|
2023-12-12 18:49:21 +01:00 |
Chip-Kerchner
|
93747fb377
|
Merge remote-tracking branch 'origin/develop' into power10Copies
|
2023-12-12 09:32:49 -06:00 |
Martin Kroeker
|
dcf6999c4e
|
remove extraneous endif
|
2023-12-12 11:27:17 +01:00 |
Mark Seminatore
|
6bd7c54af5
|
introduce MT_TRACE to clean up SMP_DEBUG code
|
2023-12-11 15:13:04 -08:00 |
Martin Kroeker
|
330101e0b3
|
Add complex type definitions for MSVC
|
2023-12-11 21:52:00 +01:00 |
Martin Kroeker
|
d9f1478068
|
Merge pull request #4367 from barracuda156/unbreak_powerpc
Fix arch detection with CMake build for PowerPC
|
2023-12-11 21:38:32 +01:00 |
barracuda156
|
9dbc8129b3
|
cpuid_power.c: add CPU_SUBTYPE_POWERPC_7400 case
|
2023-12-11 21:09:06 +08:00 |
barracuda156
|
c732f275a2
|
system_check.cmake: fix arch detection for Darwin PowerPC
|
2023-12-11 21:05:31 +08:00 |
Martin Kroeker
|
e60fb0f397
|
Merge pull request #4359 from mseminatore/win_perf
Improve Windows threading performance scaling
|
2023-12-09 23:40:26 +01:00 |
Mark Seminatore
|
efa9515a23
|
Merge branch 'OpenMathLib:develop' into win_perf
|
2023-12-09 10:09:49 -08:00 |
Chip-Kerchner
|
4e738e561a
|
Replace two vector loads with one vector pair load and fix endianness of stores.
|
2023-12-08 12:36:08 -06:00 |
Martin Kroeker
|
1332f8a822
|
Merge pull request #4159 from OMaghiarIMG/risc-v-tail-policy
Set tail policy to undisturbed for RVV intrinsics accumulators
|
2023-12-08 10:25:41 +01:00 |
Mark Seminatore
|
edac80d7e8
|
some cleanup, dynamically scale threads, add missing WIN_CASE defn
|
2023-12-07 14:59:27 -08:00 |
Martin Kroeker
|
2d316c2920
|
Merge pull request #4125 from OMaghiarIMG/risc-v
Fixes RVV masked intrinsics for iamax/iamin/imax/imin kernels
|
2023-12-07 14:50:58 +01:00 |
Martin Kroeker
|
5b09833b1c
|
Merge pull request #4019 from uniontech-lilinjie/develop
fix typo
|
2023-12-07 14:46:17 +01:00 |
Martin Kroeker
|
3193aa9c7e
|
Merge pull request #4362 from yinshiyou/la-dev
Add 15 level1 optimizations for LoongArch.
|
2023-12-07 09:15:15 +01:00 |
yancheng
|
d32f38fb37
|
loongarch64: Add optimizations for nrm2.
|
2023-12-07 14:36:26 +08:00 |
yancheng
|
f9b468990e
|
loongarch64: Add optimizations for rot.
|
2023-12-07 14:36:26 +08:00 |
yancheng
|
c80e7e27d1
|
loongarch64: Add optimizations for sum and asum.
|
2023-12-07 14:36:26 +08:00 |
yancheng
|
d4c96a35a8
|
loongarch64: Add optimizations for axpy and axpby.
|
2023-12-07 14:36:26 +08:00 |
yancheng
|
360acc0a41
|
loongarch64: Add optimizations for swap.
|
2023-12-07 14:36:26 +08:00 |
yancheng
|
174c25766b
|
loongarch64: Add optimizations for copy.
|
2023-12-07 14:36:26 +08:00 |
yancheng
|
49829b2b7d
|
loongarch64: Add optimizations for iamin.
|
2023-12-07 14:36:07 +08:00 |
yancheng
|
be83f5e4e0
|
loongarch64: Add optimizations for iamax.
|
2023-12-07 14:36:07 +08:00 |
yancheng
|
e3fb2b5afa
|
loongarch64: Add optimizations for imin.
|
2023-12-07 14:36:07 +08:00 |
yancheng
|
e46b48e372
|
loongarch64: Add optimizations for imax.
|
2023-12-07 14:36:07 +08:00 |
yancheng
|
702fc1d56d
|
loongarch64: Add optimization for min.
|
2023-12-07 14:36:07 +08:00 |
yancheng
|
346b384d1c
|
loongarch64: Add optimization for max.
|
2023-12-07 14:36:07 +08:00 |
yancheng
|
ff2ecc6cda
|
loongarch64: Add optimization for amin.
|
2023-12-07 14:36:07 +08:00 |
yancheng
|
265b5f2e80
|
loongarch64: Add optimizations for amax.
|
2023-12-07 14:36:07 +08:00 |
yancheng
|
993ede7c70
|
loongarch64: Add optimizations for scal.
|
2023-12-07 14:36:07 +08:00 |
Mark Seminatore
|
4ebf814b42
|
fix bug failing to mark task as finished.
|
2023-12-05 23:28:37 -08:00 |
Mark Seminatore
|
5f51811728
|
try at new threading model
|
2023-12-05 22:43:36 -08:00 |
Martin Kroeker
|
a8cb611157
|
Merge pull request #4358 from martin-frbg/lapack954
Fix keyword used to count successful tests (Reference-LAPACK PR 954)
|
2023-12-05 22:20:15 +01:00 |
Martin Kroeker
|
589f2b6466
|
Fix search phrase used to count successful tests (Reference-LAPACK PR 954)
|
2023-12-05 20:10:20 +01:00 |
Martin Kroeker
|
6aa5f53e26
|
Merge pull request #4357 from martin-frbg/lapack953
Fix memory leak in LAPACK testing framework (Reference-LAPACK PR 953)
|
2023-12-05 20:03:21 +01:00 |
Martin Kroeker
|
effb7af2a2
|
Fix memory leak (Reference-LAPACK PR 953)
|
2023-12-05 17:55:38 +01:00 |
Martin Kroeker
|
5915a69734
|
Merge pull request #4356 from martin-frbg/lapack736-2
Add LAPACK tests for the Dynamic Mode Decomposition functions from Reference-LAPACK PR 736
|
2023-12-05 17:48:42 +01:00 |
Martin Kroeker
|
226a14c549
|
Restore library path adjustments
|
2023-12-05 15:50:06 +01:00 |
Martin Kroeker
|
c5fa318add
|
Add tests for DMD (Reference-LAPACK PR 736)
|
2023-12-05 15:45:59 +01:00 |
Martin Kroeker
|
fa03e5497a
|
Add tests for the DMD functions (Reference-LAPACK PR 736)
|
2023-12-05 15:43:28 +01:00 |
Martin Kroeker
|
a53a79e059
|
Add tests for the DMD functions (Reference-LAPACK PR 736)
|
2023-12-05 15:41:39 +01:00 |