gxw
553cc1372f
LoongArch64: Add sgemm_kernel
2023-08-23 16:08:43 +08:00
gxw
d46772e037
LoongArch64: Add compiler feature checks
2023-08-05 10:21:43 +08:00
Chris Sidebottom
84a268b6ca
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
...
This patch removes the prefetches from cgemm/zgemm, which improves performance much as #3868 did for sgemm/dgemm; this means I'm happy to enable this on any applicable cores.
I also replicated the unrolling of the copies from sgemm and dgemm.
2023-07-27 14:12:20 +01:00
Chris Sidebottom
f971ef55f2
Add ARMV8SVE to AArch64 Dynamic Dispatch
...
In order to enable support for future cores which have similar tunings (in this case I'm doing this for the Arm(R) Neoverse(TM) V2 core), this generically detects SVE support and enables it. This should better manage the size and complexity of dynamic dispatch rather than just copy-pasting the same parameters.
To make `ARMV8SVE` more representative of the common 128-bit SVE case, I've split it and similar parameters from A64FX, which has the wider 512-bit SVE.
2023-07-25 18:35:15 +01:00
Martin Kroeker
72caceb324
Merge pull request #4009 from Mousius/sve-gemm
...
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
2023-04-22 13:56:45 +02:00
Martin Kroeker
437c0bf2b4
Merge pull request #3843 from Mousius/switch-ratio
...
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
2023-04-19 11:51:54 +02:00
Chris Sidebottom
ec334e69dc
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
...
This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance.
After #3868, the SVE kernels represent a pretty good boost.
This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).
2023-04-17 17:38:42 +01:00
Chris Sidebottom
5b165420b5
SWITCH_RATIO for Arm(R) Neoverse(TM) architecture
...
This seems like a good balance of values for reasonably sized matrices. With `SWITCH_RATIO=16`, DGEMM scales better to bigger sizes, but the better solution would be some kind of thread throttling, so I've gone with `SWITCH_RATIO=8`.
2023-04-17 15:42:55 +01:00
Chris Sidebottom
32f2fafde7
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
...
Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well.
2023-04-17 15:34:12 +01:00
Martin Kroeker
31fd13d048
MIPS: make HAVE_MSA reflect cpu capability and NO_MSA software/env
2023-01-02 22:19:13 +01:00
Chris Sidebottom
2fb096315e
Set SWITCH_RATIO for Arm(R) Neoverse(TM) V1 CPUs
...
From testing this yields better results than the default of `2`.
2022-11-30 09:35:38 +00:00
Honglin Zhu
4989e039a5
Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build
2022-10-27 14:10:26 +08:00
Jiaxun Yang
a50b29c540
Provide a fallback MIPS64_GENERIC target
...
It is really dangerous to fall back to a Loongson core on other
MIPS64 processors.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-12 13:13:28 +01:00
gxw
fbfe1daf6e
LoongArch64: Add DYNAMIC_ARCH support
2022-07-28 14:28:45 +08:00
gxw
3573306a69
LoongArch64: Add core LOONGSON2K1000 and LOONGSONGENERIC
2022-07-25 16:04:56 +08:00
Honglin Zhu
123e0dfb62
Neoverse N2 sbgemm:
...
1. Modify the algorithm to resolve multithreading failures
2. No memory allocation in sbgemm kernel
3. Optimize when alpha == 1.0f
2022-06-29 10:14:21 +08:00
Honglin Zhu
55d686d41e
neoverse n2 sbgemm:
...
implement ncopy tcopy kernel_8x4
2022-06-29 10:14:21 +08:00
Martin Kroeker
dac14a5f7d
revert "switch DGEMM parameters for SkylakeX if DYNAMIC_ARCH"
2022-05-20 11:28:23 +02:00
Martin Kroeker
a55a06c269
Update param.h
2022-03-28 18:10:08 +02:00
Martin Kroeker
d93cf7f23c
fix defines for CORTEX-X
2022-03-28 17:37:06 +02:00
Martin Kroeker
09b8545fc5
Add initial support for M1 on Linux, Phytium FT2xxx series, ARM Cortex 510/710/X1/X2
2022-03-27 15:24:40 +02:00
Martin Kroeker
8d0f7f0176
Revert accidental change of generic ARMV8 DGEMM parameters from #3425
2022-03-27 13:10:47 +02:00
Martin Kroeker
c1c0d5ce1d
Merge pull request #3492 from binebrank/arm_sve_zgemm
...
SVE zgemm&cgemm (and other BLAS 3 complex)
2022-01-18 21:36:33 +01:00
Bine Brank
b6a445cfd8
adapt Makefile for SVE trsm
2022-01-16 21:40:56 +01:00
Martin Kroeker
499ae5e8f7
Merge pull request #3510 from martin-frbg/issue3505
...
Fix recent SkylakeX/DYNAMIC_ARCH DGEMM breakage
2022-01-09 14:50:51 +01:00
Martin Kroeker
b6b024232d
Merge pull request #3508 from snadampal/v1_n2
...
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
2022-01-09 14:50:26 +01:00
Martin Kroeker
15d4b37913
SkylakeX: match parameters to dgemm kernels for dyn/non-dyn
2022-01-08 23:48:13 +01:00
Sunita Nadampalli
19c8f615dc
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
2022-01-07 00:28:17 +00:00
Bine Brank
39ab219704
sve copy functions for cgemm chemm zsymm
2022-01-05 09:12:22 +01:00
gxw
8d9b9c6b2a
loongarch64: Optimize dgemm_kernel
2021-12-21 09:33:06 +08:00
Martin Kroeker
697e2752d7
Merge pull request #3464 from binebrank/arm_sve_sgemm
...
Add sgemm part for Arm SVE
2021-12-11 20:35:22 +01:00
Bine Brank
a8f62a347b
fix UNROLL_MN and add to targets for SVE
2021-12-11 16:37:23 +01:00
Martin Kroeker
f7f7fea0dc
Merge pull request #3472 from kavanabhat/p10_aixas_p8
...
Fallback for Power kernels
2021-12-09 07:28:57 +01:00
kavanabhat
eee3381cbe
Fallback for Power kernels
2021-12-08 03:52:23 -06:00
Martin Kroeker
dd1f645371
switch DGEMM unroll parameters for SkylakeX if DYNAMIC_ARCH
2021-12-06 19:42:51 +01:00
Bine Brank
86ae89bf33
add sgemm kernel and copy functions for sgemm and ssymm
2021-11-28 18:12:47 +01:00
Martin Kroeker
454edd741c
Merge pull request #3425 from binebrank/arm_sve_dgemm
...
Add dgemm kernel for arm64 SVE
2021-11-26 16:14:55 +01:00
Bine Brank
f4da23dcb6
reduced dgemm_unroll_m to work with 128-bit sve
2021-11-23 21:18:08 +01:00
Bine Brank
9388f05a3c
configure SVE Makefile
2021-11-21 18:33:43 +01:00
Martin Kroeker
52a3f004a0
Fix unintended reversion of recent CortexA53 changes
2021-11-20 23:54:48 +01:00
Martin Kroeker
19ccef5fb1
Add generic MIPS32 target
2021-11-20 17:31:11 +01:00
Jia-Chen
302f22693a
MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55
2021-11-18 21:14:43 +08:00
Martin Kroeker
46947efb83
Ignore compiler support for MIPS MSA if the cpu lacks this capability
2021-11-13 23:32:26 +01:00
Bine Brank
ab7917910d
add v2x8 kernel + fix sve dtrmm
2021-11-07 20:37:51 +01:00
Bine Brank
7093372e32
add ARMV8SVE target
2021-11-01 22:53:21 +01:00
Wangyang Guo
7b2f5cb3b7
sbgemm: spr: enlarge P to 256 for performance
2021-10-17 19:08:03 -07:00
Wangyang Guo
0abbcd19c1
sbgemm: spr: tuning for blocking params
2021-10-17 19:08:03 -07:00
Wangyang Guo
3dc6052c7e
initial support for Sapphire Rapids platform
2021-10-12 01:30:40 -07:00
Martin Kroeker
24233b7c49
Use "big arm server" GEMM defaults for Vortex
2021-10-06 11:10:19 +02:00
kavanabhat
fe3c778c51
AIX changes for P10 with GNU Compiler
2021-09-30 06:06:27 -05:00