Commit Graph

8616 Commits

Author SHA1 Message Date
Martin Kroeker f06b535566 Use C kernel for dgemv_t due to limitations of the old assembly one 2023-12-15 09:58:44 +01:00
Martin Kroeker 293131d6b9 Merge pull request #4370 from barracuda156/unbreak_powerpc
macOS PowerPC: fix CMake build
2023-12-14 10:30:03 +01:00
barracuda156 981e315b30 cc.cmake: use -force_cpusubtype_ALL for Darwin PPC 2023-12-14 12:01:31 +08:00
barracuda156 d9653af018 KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366
2023-12-14 12:00:11 +08:00
Martin Kroeker 302ca7edc7 Merge pull request #4371 from barracuda156/970
cc.cmake: add optflags for G5 and G4 kernels
2023-12-13 14:32:37 +01:00
barracuda156 a8d3619f65 cc.cmake: add optflags for G5 and G4 kernels 2023-12-13 19:42:56 +08:00
Martin Kroeker aa46f1e4e7 revert addition of MSVC-compatible complex (moved to lapacke_config.h) 2023-12-12 23:07:48 +01:00
Martin Kroeker dcdc351272 Add MSVC-compatible complex types 2023-12-12 23:06:22 +01:00
Martin Kroeker 55a0718f72 Merge pull request #4369 from ChipKerchner/power10Copies
Replace two vector loads with one vector pair load.
2023-12-12 18:49:21 +01:00
Chip-Kerchner 93747fb377 Merge remote-tracking branch 'origin/develop' into power10Copies 2023-12-12 09:32:49 -06:00
Martin Kroeker dcf6999c4e remove extraneous endif 2023-12-12 11:27:17 +01:00
Mark Seminatore 6bd7c54af5 introduce MT_TRACE to clean up SMP_DEBUG code 2023-12-11 15:13:04 -08:00
Martin Kroeker 330101e0b3 Add complex type definitions for MSVC 2023-12-11 21:52:00 +01:00
Martin Kroeker d9f1478068 Merge pull request #4367 from barracuda156/unbreak_powerpc
Fix arch detection with CMake build for PowerPC
2023-12-11 21:38:32 +01:00
barracuda156 9dbc8129b3 cpuid_power.c: add CPU_SUBTYPE_POWERPC_7400 case 2023-12-11 21:09:06 +08:00
barracuda156 c732f275a2 system_check.cmake: fix arch detection for Darwin PowerPC 2023-12-11 21:05:31 +08:00
Martin Kroeker e60fb0f397 Merge pull request #4359 from mseminatore/win_perf
Improve Windows threading performance scaling
2023-12-09 23:40:26 +01:00
Mark Seminatore efa9515a23 Merge branch 'OpenMathLib:develop' into win_perf 2023-12-09 10:09:49 -08:00
Chip-Kerchner 4e738e561a Replace two vector loads with one vector pair load and fix endianness of stores. 2023-12-08 12:36:08 -06:00
Martin Kroeker 1332f8a822 Merge pull request #4159 from OMaghiarIMG/risc-v-tail-policy
Set tail policy to undisturbed for RVV intrinsics accumulators
2023-12-08 10:25:41 +01:00
Mark Seminatore edac80d7e8 some cleanup, dynamically scale threads, add missing WIN_CASE defn 2023-12-07 14:59:27 -08:00
Martin Kroeker 2d316c2920 Merge pull request #4125 from OMaghiarIMG/risc-v
Fixes RVV masked intrinsics for iamax/iamin/imax/imin kernels
2023-12-07 14:50:58 +01:00
Martin Kroeker 5b09833b1c Merge pull request #4019 from uniontech-lilinjie/develop
fix typo
2023-12-07 14:46:17 +01:00
Martin Kroeker 3193aa9c7e Merge pull request #4362 from yinshiyou/la-dev
Add 15 level1 optimizations for LoongArch.
2023-12-07 09:15:15 +01:00
yancheng d32f38fb37 loongarch64: Add optimizations for nrm2. 2023-12-07 14:36:26 +08:00
yancheng f9b468990e loongarch64: Add optimizations for rot. 2023-12-07 14:36:26 +08:00
yancheng c80e7e27d1 loongarch64: Add optimizations for sum and asum. 2023-12-07 14:36:26 +08:00
yancheng d4c96a35a8 loongarch64: Add optimizations for axpy and axpby. 2023-12-07 14:36:26 +08:00
yancheng 360acc0a41 loongarch64: Add optimizations for swap. 2023-12-07 14:36:26 +08:00
yancheng 174c25766b loongarch64: Add optimizations for copy. 2023-12-07 14:36:26 +08:00
yancheng 49829b2b7d loongarch64: Add optimizations for iamin. 2023-12-07 14:36:07 +08:00
yancheng be83f5e4e0 loongarch64: Add optimizations for iamax. 2023-12-07 14:36:07 +08:00
yancheng e3fb2b5afa loongarch64: Add optimizations for imin. 2023-12-07 14:36:07 +08:00
yancheng e46b48e372 loongarch64: Add optimizations for imax. 2023-12-07 14:36:07 +08:00
yancheng 702fc1d56d loongarch64: Add optimization for min. 2023-12-07 14:36:07 +08:00
yancheng 346b384d1c loongarch64: Add optimization for max. 2023-12-07 14:36:07 +08:00
yancheng ff2ecc6cda loongarch64: Add optimization for amin. 2023-12-07 14:36:07 +08:00
yancheng 265b5f2e80 loongarch64: Add optimizations for amax. 2023-12-07 14:36:07 +08:00
yancheng 993ede7c70 loongarch64: Add optimizations for scal. 2023-12-07 14:36:07 +08:00
Mark Seminatore 4ebf814b42 fix bug failing to mark task as finished. 2023-12-05 23:28:37 -08:00
Mark Seminatore 5f51811728 try at new threading model 2023-12-05 22:43:36 -08:00
Martin Kroeker a8cb611157 Merge pull request #4358 from martin-frbg/lapack954
Fix keyword used to count successful tests (Reference-LAPACK PR 954)
2023-12-05 22:20:15 +01:00
Martin Kroeker 589f2b6466 Fix search phrase used to count successful tests (Reference-LAPACK PR 954) 2023-12-05 20:10:20 +01:00
Martin Kroeker 6aa5f53e26 Merge pull request #4357 from martin-frbg/lapack953
Fix memory leak in LAPACK testing framework (Reference-LAPACK PR 953)
2023-12-05 20:03:21 +01:00
Martin Kroeker effb7af2a2 Fix memory leak (Reference-LAPACK PR 953) 2023-12-05 17:55:38 +01:00
Martin Kroeker 5915a69734 Merge pull request #4356 from martin-frbg/lapack736-2
Add LAPACK tests for the Dynamic Mode Decomposition functions from Reference-LAPACK PR 736
2023-12-05 17:48:42 +01:00
Martin Kroeker 226a14c549 Restore library path adjustments 2023-12-05 15:50:06 +01:00
Martin Kroeker c5fa318add Add tests for DMD (Reference-LAPACK PR 736) 2023-12-05 15:45:59 +01:00
Martin Kroeker fa03e5497a Add tests for the DMD functions (Reference-LAPACK PR 736) 2023-12-05 15:43:28 +01:00
Martin Kroeker a53a79e059 Add tests for the DMD functions (Reference-LAPACK PR 736) 2023-12-05 15:41:39 +01:00