Commit Graph

259 Commits

Author SHA1 Message Date
Martin Kroeker 72caceb324
Merge pull request #4009 from Mousius/sve-gemm
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
2023-04-22 13:56:45 +02:00
Martin Kroeker 437c0bf2b4
Merge pull request #3843 from Mousius/switch-ratio
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
2023-04-19 11:51:54 +02:00
Chris Sidebottom ec334e69dc Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance.

After #3868, the SVE kernels represent a pretty good boost.

This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).
2023-04-17 17:38:42 +01:00
Chris Sidebottom 5b165420b5 SWITCH_RATIO for Arm(R) Neoverse(TM) architecture
This seems like a good balance of values for reasonably sized matrices. With `SWITCH_RATIO=16` the DGEMM scales better to bigger sizes but the better solution would be some kind of
thread throttling so I've gone with `SWITCH_RATIO=8`.
2023-04-17 15:42:55 +01:00
Chris Sidebottom 32f2fafde7 Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well.
2023-04-17 15:34:12 +01:00
Martin Kroeker 31fd13d048
MIPS: make HAVE_MSA reflect cpu capability and NO_MSA software/env 2023-01-02 22:19:13 +01:00
Chris Sidebottom 2fb096315e Set SWITCH_RATIO for Arm(R) Neoverse(TM) V1 CPUs
From testing this yields better results than the default of `2`.
2022-11-30 09:35:38 +00:00
Honglin Zhu 4989e039a5 Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build 2022-10-27 14:10:26 +08:00
Jiaxun Yang a50b29c540 Provide a fallback MIPS64_GENERIC target
It is really dangerous to fallback to Loongson core on other
MIPS64 processors.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-12 13:13:28 +01:00
gxw fbfe1daf6e LoongArch64: Add DYNAMIC_ARCH support 2022-07-28 14:28:45 +08:00
gxw 3573306a69 LoongArch64: Add core LOONGSON2K1000 and LOONGSONGENERIC 2022-07-25 16:04:56 +08:00
Honglin Zhu 123e0dfb62 Neoverse N2 sbgemm:
1. Modify the algorithm to resolve multithreading failures
    2. No memory allocation in sbgemm kernel
    3. Optimize when alpha == 1.0f
2022-06-29 10:14:21 +08:00
Honglin Zhu 55d686d41e neoverse n2 sbgemm:
implement ncopy tcopy kernel_8x4
2022-06-29 10:14:21 +08:00
Martin Kroeker dac14a5f7d
revert "switch DGEMM parameters for SkylakeX if DYNAMIC_ARCH" 2022-05-20 11:28:23 +02:00
Martin Kroeker a55a06c269
Update param.h 2022-03-28 18:10:08 +02:00
Martin Kroeker d93cf7f23c
fix defines for CORTEX-X 2022-03-28 17:37:06 +02:00
Martin Kroeker 09b8545fc5
Add initial support for M1 on Linux, Phytium FT2xxx series, ARM Cortex 510/710/X1/X2 2022-03-27 15:24:40 +02:00
Martin Kroeker 8d0f7f0176
Revert accidental change of generic ARMV8 DGEMM parameters from #3425 2022-03-27 13:10:47 +02:00
Martin Kroeker c1c0d5ce1d
Merge pull request #3492 from binebrank/arm_sve_zgemm
SVE zgemm&cgemm (and other BLAS 3 complex)
2022-01-18 21:36:33 +01:00
Bine Brank b6a445cfd8 adapt Makefile for SVE trsm 2022-01-16 21:40:56 +01:00
Martin Kroeker 499ae5e8f7
Merge pull request #3510 from martin-frbg/issue3505
Fix recent SkylakeX/DYNAMIC_ARCH DGEMM breakage
2022-01-09 14:50:51 +01:00
Martin Kroeker b6b024232d
Merge pull request #3508 from snadampal/v1_n2
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
2022-01-09 14:50:26 +01:00
Martin Kroeker 15d4b37913
SkylakeX: match parameters to dgemm kernels for dyn/non-dyn 2022-01-08 23:48:13 +01:00
Sunita Nadampalli 19c8f615dc OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics 2022-01-07 00:28:17 +00:00
Bine Brank 39ab219704 sve copy functions for cgemm chemm zsymm 2022-01-05 09:12:22 +01:00
gxw 8d9b9c6b2a loongarch64: Optimize dgemm_kernel 2021-12-21 09:33:06 +08:00
Martin Kroeker 697e2752d7
Merge pull request #3464 from binebrank/arm_sve_sgemm
Add sgemm part for Arm SVE
2021-12-11 20:35:22 +01:00
Bine Brank a8f62a347b fix UNROLL_MN and add to targets for SVE 2021-12-11 16:37:23 +01:00
Martin Kroeker f7f7fea0dc
Merge pull request #3472 from kavanabhat/p10_aixas_p8
Fallback for Power kernels
2021-12-09 07:28:57 +01:00
kavanabhat eee3381cbe Fallback for Power kernels 2021-12-08 03:52:23 -06:00
Martin Kroeker dd1f645371
switch DGEMM unroll parameters for SkylakeX if DYNAMIC_ARCH 2021-12-06 19:42:51 +01:00
Bine Brank 86ae89bf33 add sgemm kernel and copy functions for sgemm and ssymm 2021-11-28 18:12:47 +01:00
Martin Kroeker 454edd741c
Merge pull request #3425 from binebrank/arm_sve_dgemm
Add dgemm kernel for arm64 SVE
2021-11-26 16:14:55 +01:00
Bine Brank f4da23dcb6 reduced dgemm_unroll_m to work with 128-bit sve 2021-11-23 21:18:08 +01:00
Bine Brank 9388f05a3c configure SVE Makefile 2021-11-21 18:33:43 +01:00
Martin Kroeker 52a3f004a0
Fix unintended reversion of recent CortexA53 changes 2021-11-20 23:54:48 +01:00
Martin Kroeker 19ccef5fb1
Add generic MIPS32 target 2021-11-20 17:31:11 +01:00
Jia-Chen 302f22693a MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55 2021-11-18 21:14:43 +08:00
Martin Kroeker 46947efb83
Ignore compiler support for MIPS MSA if the cpu lacks this capability 2021-11-13 23:32:26 +01:00
Bine Brank ab7917910d add v2x8 kernel + fix sve dtrmm 2021-11-07 20:37:51 +01:00
Bine Brank 7093372e32 add ARMV8SVE target 2021-11-01 22:53:21 +01:00
Wangyang Guo 7b2f5cb3b7 sbgemm: spr: enlarge P to 256 for performance 2021-10-17 19:08:03 -07:00
Wangyang Guo 0abbcd19c1 sbgemm: spr: tuning for blocking params 2021-10-17 19:08:03 -07:00
Wangyang Guo 3dc6052c7e initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
Martin Kroeker 24233b7c49
Use "big arm server" GEMM defaults for Vortex 2021-10-06 11:10:19 +02:00
kavanabhat fe3c778c51 AIX changes for P10 with GNU Compiler 2021-09-30 06:06:27 -05:00
Wangyang Guo 8356a604f0 sbgemm: cooperlake: tuning for block params 2021-09-07 21:30:46 +08:00
Niyas Sait 7cddbf99b1 Make explicit conversion condition on _WIN64 flag 2021-08-31 14:36:44 +01:00
Niyas Sait d1ed72fa87 [win/arm64]: Explicit casting for GMEMM_DEFAULT_ALIGN to create 64-bit value
Win64 uses LLP64 datamodel and unsigned long is only 32-bit. For 64-bit
architecture we need 64-bit mask to correctly generate address
2021-08-31 11:56:10 +01:00
gxw af0a69f355 Add support for LOONGARCH64 2021-07-27 15:29:12 +08:00