Commit Graph

7688 Commits

Author SHA1 Message Date
zhoupeng
116aee7527 loongarch64: Refine imin optimization. 2023-12-29 17:30:57 +08:00
zhoupeng
8be2654193 loongarch64: Refine imax optimization. 2023-12-29 17:30:57 +08:00
zhoupeng
154baad454 loongarch64: Refine iamin optimization. 2023-12-29 17:30:57 +08:00
Shiyou Yin
36c12c4971 loongarch64: Refine copy,swap,nrm2,sum optimization. 2023-12-29 17:30:57 +08:00
Shiyou Yin
c6996a80e9 loongarch64: Refine amax,amin,max,min optimization. 2023-12-29 17:30:57 +08:00
Martin Kroeker
3b520a56a9 Merge pull request #4378 from martin-frbg/issue3871
Mention C906V instruction set limitation and update DYNAMIC_ARCH lists
2023-12-15 21:58:56 +01:00
Martin Kroeker
563daadc92 Merge pull request #4379 from barracuda156/ppc970
PPC970: drop -mcpu=970 which seems to produce faulty code
2023-12-15 20:03:44 +01:00
barracuda156
8c143331b0 PPC970: drop -mcpu=970 which seems to produce faulty code
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4376
2023-12-15 22:56:06 +08:00
Martin Kroeker
d2f1594bca Merge pull request #4368 from martin-frbg/issue4073
Add complex type definitions for MSVC in Reference-LAPACK's lapack.h
2023-12-15 14:49:52 +01:00
Martin Kroeker
544cb86300 Mention C906V instruction set limitation and update DYNAMIC_ARCH lists 2023-12-15 14:03:59 +01:00
Martin Kroeker
8793601e86 Merge pull request #4375 from martin-frbg/issue4352
Retire the GotoBLAS gemv_t kernel still used as fallback on x86_64
2023-12-15 13:35:18 +01:00
Martin Kroeker
f06b535566 Use C kernel for dgemv_t due to limitations of the old assembly one 2023-12-15 09:58:44 +01:00
Martin Kroeker
293131d6b9 Merge pull request #4370 from barracuda156/unbreak_powerpc
macOS PowerPC: fix CMake build
2023-12-14 10:30:03 +01:00
barracuda156
981e315b30 cc.cmake: use -force_cpusubtype_ALL for Darwin PPC 2023-12-14 12:01:31 +08:00
barracuda156
d9653af018 KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366
2023-12-14 12:00:11 +08:00
Martin Kroeker
302ca7edc7 Merge pull request #4371 from barracuda156/970
cc.cmake: add optflags for G5 and G4 kernels
2023-12-13 14:32:37 +01:00
barracuda156
a8d3619f65 cc.cmake: add optflags for G5 and G4 kernels 2023-12-13 19:42:56 +08:00
Martin Kroeker
aa46f1e4e7 revert addition of MSVC-compatible complex (moved to lapacke_config.h) 2023-12-12 23:07:48 +01:00
Martin Kroeker
dcdc351272 Add MSVC-compatible complex types 2023-12-12 23:06:22 +01:00
Martin Kroeker
55a0718f72 Merge pull request #4369 from ChipKerchner/power10Copies
Replace two vector loads with one vector pair load.
2023-12-12 18:49:21 +01:00
Chip-Kerchner
93747fb377 Merge remote-tracking branch 'origin/develop' into power10Copies 2023-12-12 09:32:49 -06:00
Martin Kroeker
dcf6999c4e remove extraneous endif 2023-12-12 11:27:17 +01:00
Martin Kroeker
330101e0b3 Add complex type definitions for MSVC 2023-12-11 21:52:00 +01:00
Martin Kroeker
d9f1478068 Merge pull request #4367 from barracuda156/unbreak_powerpc
Fix arch detection with CMake build for PowerPC
2023-12-11 21:38:32 +01:00
barracuda156
9dbc8129b3 cpuid_power.c: add CPU_SUBTYPE_POWERPC_7400 case 2023-12-11 21:09:06 +08:00
barracuda156
c732f275a2 system_check.cmake: fix arch detection for Darwin PowerPC 2023-12-11 21:05:31 +08:00
Martin Kroeker
e60fb0f397 Merge pull request #4359 from mseminatore/win_perf
Improve Windows threading performance scaling
2023-12-09 23:40:26 +01:00
Mark Seminatore
efa9515a23 Merge branch 'OpenMathLib:develop' into win_perf 2023-12-09 10:09:49 -08:00
Chip-Kerchner
4e738e561a Replace two vector loads with one vector pair load and fix endianess of stores. 2023-12-08 12:36:08 -06:00
Mark Seminatore
edac80d7e8 some cleanup, dynamically scale threads, add missing WIN_CASE defn 2023-12-07 14:59:27 -08:00
Martin Kroeker
5b09833b1c Merge pull request #4019 from uniontech-lilinjie/develop
fix typo
2023-12-07 14:46:17 +01:00
Martin Kroeker
3193aa9c7e Merge pull request #4362 from yinshiyou/la-dev
Add 15 level1 optimizations for LoongArch.
2023-12-07 09:15:15 +01:00
yancheng
d32f38fb37 loongarch64: Add optimizations for nrm2. 2023-12-07 14:36:26 +08:00
yancheng
f9b468990e loongarch64: Add optimizations for rot. 2023-12-07 14:36:26 +08:00
yancheng
c80e7e27d1 loongarch64: Add optimizations for sum and asum. 2023-12-07 14:36:26 +08:00
yancheng
d4c96a35a8 loongarch64: Add optimizations for axpy and axpby. 2023-12-07 14:36:26 +08:00
yancheng
360acc0a41 loongarch64: Add optimizations for swap. 2023-12-07 14:36:26 +08:00
yancheng
174c25766b loongarch64: Add optimizations for copy. 2023-12-07 14:36:26 +08:00
yancheng
49829b2b7d loongarch64: Add optimizations for iamin. 2023-12-07 14:36:07 +08:00
yancheng
be83f5e4e0 loongarch64: Add optimizations for iamax. 2023-12-07 14:36:07 +08:00
yancheng
e3fb2b5afa loongarch64: Add optimizations for imin. 2023-12-07 14:36:07 +08:00
yancheng
e46b48e372 loongarch64: Add optimizations for imax. 2023-12-07 14:36:07 +08:00
yancheng
702fc1d56d loongarch64: Add optimization for min. 2023-12-07 14:36:07 +08:00
yancheng
346b384d1c loongarch64: Add optimization for max. 2023-12-07 14:36:07 +08:00
yancheng
ff2ecc6cda loongarch64: Add optimization for amin. 2023-12-07 14:36:07 +08:00
yancheng
265b5f2e80 loongarch64: Add optimizations for amax. 2023-12-07 14:36:07 +08:00
yancheng
993ede7c70 loongarch64: Add optimizations for scal. 2023-12-07 14:36:07 +08:00
Mark Seminatore
4ebf814b42 fix bug failing to mark task as finished. 2023-12-05 23:28:37 -08:00
Mark Seminatore
5f51811728 try at new threading model 2023-12-05 22:43:36 -08:00
Martin Kroeker
a8cb611157 Merge pull request #4358 from martin-frbg/lapack954
Fix keyword used to count successful tests (Reference-LAPACK PR 954)
2023-12-05 22:20:15 +01:00