Commit Graph

8616 Commits

Author SHA1 Message Date
Hao Chen 8785e948b5 loongarch64: Add camin optimization function. 2023-12-29 17:30:57 +08:00
Hao Chen 0753848e03 loongarch64: Refine and add axpy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 06fd5b5995 loongarch64: Add and Refine asum optimization functions. 2023-12-29 17:30:57 +08:00
guxiwei e771be185e Optimize copy functions with lsx.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 179ed51d3b Add dgemm_kernel_8x4.S file. 2023-12-29 17:30:57 +08:00
Hao Chen 173a65d4e6 loongarch64: Add and refine iamax optimization functions. 2023-12-29 17:30:57 +08:00
zhoupeng ea70e165c7 loongarch64: Refine rot optimization. 2023-12-29 17:30:57 +08:00
zhoupeng 116aee7527 loongarch64: Refine imin optimization. 2023-12-29 17:30:57 +08:00
zhoupeng 8be2654193 loongarch64: Refine imax optimization. 2023-12-29 17:30:57 +08:00
zhoupeng 154baad454 loongarch64: Refine iamin optimization. 2023-12-29 17:30:57 +08:00
Shiyou Yin 36c12c4971 loongarch64: Refine copy,swap,nrm2,sum optimization. 2023-12-29 17:30:57 +08:00
Shiyou Yin c6996a80e9 loongarch64: Refine amax,amin,max,min optimization. 2023-12-29 17:30:57 +08:00
Martin Kroeker 21564bde2c Merge pull request #4394 from martin-frbg/dyn_vortex
Add Apple M as NeoverseN1 in ARM64 DYNAMIC_ARCH runtime detection
2023-12-28 13:35:55 +01:00
Martin Kroeker e9c32ed165 Merge pull request #4384 from yetist/develop
Fix: build failed on LoongArch
2023-12-27 14:05:01 +01:00
Martin Kroeker e7a895e714 Add Apple M as NeoverseN1 2023-12-25 12:36:05 +01:00
Martin Kroeker 474ce0ace9 Merge pull request #4393 from martin-frbg/pr4389-2
Remove redundant targets from the default ARM64 DYNAMIC_ARCH list
2023-12-25 12:30:56 +01:00
Martin Kroeker 1106460bb3 remove redundant targets from the default ARM64 DYNAMIC_ARCH list 2023-12-25 12:29:56 +01:00
Martin Kroeker 236acee706 Merge pull request #4389 from Mousius/reduce-dynamic-targets
Use functionally equivalent dynamic targets
2023-12-25 12:27:42 +01:00
Xiaotian Wu d2f4f1b28a CI: update toolchains for LoongArch64 2023-12-25 16:04:43 +08:00
Wu Xiaotian 0baf462dbc Fix: build failed on LoongArch
According to the documentation at https://github.com/loongson/la-abi-specs/blob/release/lapcs.adoc#the-base-abi-variants, valid -mabi parameters are lp64s, lp64f, lp64d, ilp32s, ilp32f and ilp32d.
2023-12-25 16:04:43 +08:00
Martin Kroeker 63a83939a1 Merge pull request #4390 from Mousius/reduce-kernel-duplication
Reduce duplication in kernel definitions
2023-12-24 18:04:26 +01:00
Martin Kroeker dba404055d Merge pull request #4392 from martin-frbg/lapack959
Fix issues related to the ?GEDMD functions (Reference-LAPACK PR 959)
2023-12-24 10:44:15 +01:00
Martin Kroeker c6fa921027 Add tests for ?GEDMD (Reference-LAPACK PR 959) 2023-12-23 23:39:53 +01:00
Martin Kroeker 283713e4c5 Add tests for ?GEDMD (Reference-LAPACK PR 959) 2023-12-23 23:32:45 +01:00
Martin Kroeker 201f22f49a Fix issues related to ?GEDMD (Reference-LAPACK PR 959) 2023-12-23 23:27:38 +01:00
Martin Kroeker 05dde8ef04 Merge pull request #4391 from martin-frbg/lapack942
Handle corner cases of LWORK (Reference-LAPACK PR 942)
2023-12-23 23:11:46 +01:00
Martin Kroeker 45ef0d7361 Handle corner cases of LWORK (Reference-LAPACK PR 942) 2023-12-23 20:16:33 +01:00
Martin Kroeker c082669ad4 Handle corner cases of LWORK (Reference-LAPACK PR 942) 2023-12-23 20:05:03 +01:00
Martin Kroeker 29d6024ec5 Handle corner cases of LWORK (Reference-LAPACK PR 942) 2023-12-23 19:44:11 +01:00
Martin Kroeker 0814491d96 Handle corner cases of LWORK (Reference-LAPACK PR 942) 2023-12-23 19:37:03 +01:00
Martin Kroeker 5c11b2ff41 Handle corner cases of LWORK (Reference-LAPACK PR 942) 2023-12-23 19:27:20 +01:00
Martin Kroeker 8ce44c18a0 Handle corner cases of LWORK (Reference-LAPACK PR 942) 2023-12-23 19:24:10 +01:00
Chris Sidebottom dc20a78188 Use functionally equivalent dynamic targets
Similar to `drivers/other/dynamic.c`, I've looked for functionally
equivalent targets and mapped them in the default DYNAMIC_ARCH build.
Users can still build specific cores using DYNAMIC_LIST.
2023-12-23 12:45:27 +00:00
Chris Sidebottom ecae1389df Reduce duplication in kernel definitions
These files are exactly the same, so I believe we can reduce these files
down. Other files require a slightly more complex unpicking.
2023-12-23 12:39:53 +00:00
Martin Kroeker 68ef2328eb Merge pull request #4388 from martin-frbg/issue4387
Add lower limit for multithreading in the reimplemented LAPACK ?GESV
2023-12-21 22:21:44 +01:00
Martin Kroeker a7ed60bfe9 Add lower limit for multithreading 2023-12-21 20:05:23 +01:00
Martin Kroeker 67779177b9 Merge pull request #4383 from martin-frbg/fixlapatest
Restore OpenBLAS-specific changes to the LAPACK test framework
2023-12-20 14:01:59 +01:00
Martin Kroeker e67a0eaaf9 Restore OpenBLAS-specific build rule changes 2023-12-19 23:15:11 +01:00
Martin Kroeker bb8b91e9f2 restore OpenBLAS-specific test paths 2023-12-19 23:13:02 +01:00
Martin Kroeker fa220b2969 Merge pull request #4382 from Mousius/sve-dot-again
Tweak SVE dot kernel
2023-12-19 18:46:18 +01:00
Martin Kroeker 3f46d0c79a Merge pull request #4381 from darshanp4/issue_4323
Update GEMM param for NEOVERSEV1
2023-12-19 16:53:53 +01:00
Chris Sidebottom 60e66725e4 Use numeric labels to allow repeated inlining 2023-12-19 13:11:06 +00:00
Chris Sidebottom 7a4fef4f60 Tweak SVE dot kernel
This changes the SVE dot kernel to only predicate when necessary as well
as streamlining the assembly a bit. The benchmarks seem to indicate this
can improve performance by ~33%.
2023-12-19 12:08:54 +00:00
Darshan Patel dab0da8243 Update GEMM param for NEOVERSEV1 2023-12-19 13:56:55 +05:30
Martin Kroeker 3b520a56a9 Merge pull request #4378 from martin-frbg/issue3871
Mention C906V instruction set limitation and update DYNAMIC_ARCH lists
2023-12-15 21:58:56 +01:00
Martin Kroeker 563daadc92 Merge pull request #4379 from barracuda156/ppc970
PPC970: drop -mcpu=970 which seems to produce faulty code
2023-12-15 20:03:44 +01:00
barracuda156 8c143331b0 PPC970: drop -mcpu=970 which seems to produce faulty code
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4376
2023-12-15 22:56:06 +08:00
Martin Kroeker d2f1594bca Merge pull request #4368 from martin-frbg/issue4073
Add complex type definitions for MSVC in Reference-LAPACK's lapack.h
2023-12-15 14:49:52 +01:00
Martin Kroeker 544cb86300 Mention C906V instruction set limitation and update DYNAMIC_ARCH lists 2023-12-15 14:03:59 +01:00
Martin Kroeker 8793601e86 Merge pull request #4375 from martin-frbg/issue4352
Retire the GotoBLAS gemv_t kernel still used as fallback on x86_64
2023-12-15 13:35:18 +01:00