Wu Xiaotian
0baf462dbc
Fix: build failed on LoongArch
...
According to the documentation at https://github.com/loongson/la-abi-specs/blob/release/lapcs.adoc#the-base-abi-variants , valid -mabi parameters are lp64s, lp64f, lp64d, ilp32s, ilp32f and ilp32d.
2023-12-25 16:04:43 +08:00
Martin Kroeker
63a83939a1
Merge pull request #4390 from Mousius/reduce-kernel-duplication
...
Reduce duplication in kernel definitions
2023-12-24 18:04:26 +01:00
Martin Kroeker
dba404055d
Merge pull request #4392 from martin-frbg/lapack959
...
Fix issues related to the ?GEDMD functions (Reference-LAPACK PR 959)
2023-12-24 10:44:15 +01:00
Martin Kroeker
c6fa921027
Add tests for ?GEDMD (Reference-LAPACK PR 959)
2023-12-23 23:39:53 +01:00
Martin Kroeker
283713e4c5
Add tests for ?GEDMD (Reference-LAPACK PR 959)
2023-12-23 23:32:45 +01:00
Martin Kroeker
201f22f49a
Fix issues related to ?GEDMD (Reference-LAPACK PR 959)
2023-12-23 23:27:38 +01:00
Martin Kroeker
05dde8ef04
Merge pull request #4391 from martin-frbg/lapack942
...
Handle corner cases of LWORK (Reference-LAPACK PR 942)
2023-12-23 23:11:46 +01:00
Martin Kroeker
45ef0d7361
Handle corner cases of LWORK (Reference-LAPACK PR 942)
2023-12-23 20:16:33 +01:00
Martin Kroeker
c082669ad4
Handle corner cases of LWORK (Reference-LAPACK PR 942)
2023-12-23 20:05:03 +01:00
Martin Kroeker
29d6024ec5
Handle corner cases of LWORK (Reference-LAPACK PR 942)
2023-12-23 19:44:11 +01:00
Martin Kroeker
0814491d96
Handle corner cases of LWORK (Reference-LAPACK PR 942)
2023-12-23 19:37:03 +01:00
Martin Kroeker
5c11b2ff41
Handle corner cases of LWORK (Reference-LAPACK PR 942)
2023-12-23 19:27:20 +01:00
Martin Kroeker
8ce44c18a0
Handle corner cases of LWORK (Reference-LAPACK PR 942)
2023-12-23 19:24:10 +01:00
Chris Sidebottom
dc20a78188
Use functionally equivalent dynamic targets
...
Similar to `drivers/other/dynamic.c`, I've looked for functionally
equivalent targets and mapped them in the default DYNAMIC_ARCH build.
Users can still build specific cores using DYNAMIC_LIST.
2023-12-23 12:45:27 +00:00
Chris Sidebottom
ecae1389df
Reduce duplication in kernel definitions
...
These files are identical, so I believe they can be consolidated.
Other files will require slightly more complex unpicking.
2023-12-23 12:39:53 +00:00
Martin Kroeker
68ef2328eb
Merge pull request #4388 from martin-frbg/issue4387
...
Add lower limit for multithreading in the reimplemented LAPACK ?GESV
2023-12-21 22:21:44 +01:00
Martin Kroeker
a7ed60bfe9
Add lower limit for multithreading
2023-12-21 20:05:23 +01:00
Martin Kroeker
67779177b9
Merge pull request #4383 from martin-frbg/fixlapatest
...
Restore OpenBLAS-specific changes to the LAPACK test framework
2023-12-20 14:01:59 +01:00
Martin Kroeker
e67a0eaaf9
Restore OpenBLAS-specific build rule changes
2023-12-19 23:15:11 +01:00
Martin Kroeker
bb8b91e9f2
Restore OpenBLAS-specific test paths
2023-12-19 23:13:02 +01:00
Martin Kroeker
fa220b2969
Merge pull request #4382 from Mousius/sve-dot-again
...
Tweak SVE dot kernel
2023-12-19 18:46:18 +01:00
Martin Kroeker
3f46d0c79a
Merge pull request #4381 from darshanp4/issue_4323
...
Update GEMM param for NEOVERSEV1
2023-12-19 16:53:53 +01:00
Chris Sidebottom
60e66725e4
Use numeric labels to allow repeated inlining
2023-12-19 13:11:06 +00:00
Chris Sidebottom
7a4fef4f60
Tweak SVE dot kernel
...
This changes the SVE dot kernel to only predicate when necessary as well
as streamlining the assembly a bit. The benchmarks seem to indicate this
can improve performance by ~33%.
2023-12-19 12:08:54 +00:00
Darshan Patel
dab0da8243
Update GEMM param for NEOVERSEV1
2023-12-19 13:56:55 +05:30
Martin Kroeker
3b520a56a9
Merge pull request #4378 from martin-frbg/issue3871
...
Mention C906V instruction set limitation and update DYNAMIC_ARCH lists
2023-12-15 21:58:56 +01:00
Martin Kroeker
563daadc92
Merge pull request #4379 from barracuda156/ppc970
...
PPC970: drop -mcpu=970 which seems to produce faulty code
2023-12-15 20:03:44 +01:00
barracuda156
8c143331b0
PPC970: drop -mcpu=970 which seems to produce faulty code
...
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4376
2023-12-15 22:56:06 +08:00
Martin Kroeker
d2f1594bca
Merge pull request #4368 from martin-frbg/issue4073
...
Add complex type definitions for MSVC in Reference-LAPACK's lapack.h
2023-12-15 14:49:52 +01:00
Martin Kroeker
544cb86300
Mention C906V instruction set limitation and update DYNAMIC_ARCH lists
2023-12-15 14:03:59 +01:00
Martin Kroeker
8793601e86
Merge pull request #4375 from martin-frbg/issue4352
...
Retire the GotoBLAS gemv_t kernel still used as fallback on x86_64
2023-12-15 13:35:18 +01:00
Martin Kroeker
f06b535566
Use C kernel for dgemv_t due to limitations of the old assembly one
2023-12-15 09:58:44 +01:00
Martin Kroeker
293131d6b9
Merge pull request #4370 from barracuda156/unbreak_powerpc
...
macOS PowerPC: fix CMake build
2023-12-14 10:30:03 +01:00
barracuda156
981e315b30
cc.cmake: use -force_cpusubtype_ALL for Darwin PPC
2023-12-14 12:01:31 +08:00
barracuda156
d9653af018
KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing
...
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366
2023-12-14 12:00:11 +08:00
Martin Kroeker
302ca7edc7
Merge pull request #4371 from barracuda156/970
...
cc.cmake: add optflags for G5 and G4 kernels
2023-12-13 14:32:37 +01:00
barracuda156
a8d3619f65
cc.cmake: add optflags for G5 and G4 kernels
2023-12-13 19:42:56 +08:00
Martin Kroeker
aa46f1e4e7
revert addition of MSVC-compatible complex (moved to lapacke_config.h)
2023-12-12 23:07:48 +01:00
Martin Kroeker
dcdc351272
Add MSVC-compatible complex types
2023-12-12 23:06:22 +01:00
Martin Kroeker
55a0718f72
Merge pull request #4369 from ChipKerchner/power10Copies
...
Replace two vector loads with one vector pair load.
2023-12-12 18:49:21 +01:00
Chip-Kerchner
93747fb377
Merge remote-tracking branch 'origin/develop' into power10Copies
2023-12-12 09:32:49 -06:00
Martin Kroeker
dcf6999c4e
remove extraneous endif
2023-12-12 11:27:17 +01:00
Martin Kroeker
330101e0b3
Add complex type definitions for MSVC
2023-12-11 21:52:00 +01:00
Martin Kroeker
d9f1478068
Merge pull request #4367 from barracuda156/unbreak_powerpc
...
Fix arch detection with CMake build for PowerPC
2023-12-11 21:38:32 +01:00
barracuda156
9dbc8129b3
cpuid_power.c: add CPU_SUBTYPE_POWERPC_7400 case
2023-12-11 21:09:06 +08:00
barracuda156
c732f275a2
system_check.cmake: fix arch detection for Darwin PowerPC
2023-12-11 21:05:31 +08:00
Martin Kroeker
e60fb0f397
Merge pull request #4359 from mseminatore/win_perf
...
Improve Windows threading performance scaling
2023-12-09 23:40:26 +01:00
Mark Seminatore
efa9515a23
Merge branch 'OpenMathLib:develop' into win_perf
2023-12-09 10:09:49 -08:00
Chip-Kerchner
4e738e561a
Replace two vector loads with one vector pair load and fix endianness of stores.
2023-12-08 12:36:08 -06:00
Mark Seminatore
edac80d7e8
some cleanup, dynamically scale threads, add missing WIN_CASE defn
2023-12-07 14:59:27 -08:00