Commit Graph

298 Commits

Author SHA1 Message Date
Martin Kroeker a47b3c8867
Fix unroll parameter selection for MIPS64_GENERIC 2024-10-13 22:54:34 +02:00
Martin Kroeker 7c4f3638fd
switch PPCG4 SGEMM kernel to 4x4 2024-10-03 22:00:15 +02:00
gxw 48698b2b1d LoongArch64: Rename core
Use microarchitecture name instead of meaningless strings to name the core,
the legacy core is still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Chip Kerchner b1737698db Fix DEFAULTS in SBGEMM for POWER10. Also comparisons for SBGEMM unit test can be exactly due to epilison differences. 2024-08-13 07:01:21 -05:00
Piotr Kubaj 4c12090776
Fix build on FreeBSD/powerpc64* 2024-07-10 22:21:48 +00:00
gxw 6017ad7146 loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6 2024-05-08 10:10:26 +08:00
Usui, Tetsuzo ca673ca774 Add GEMM_PREFERED_SIZE parameter for Neoverse V1 2024-04-12 17:21:14 +09:00
Martin Kroeker 93d975d8fd
Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
loongarch: Optimizing the performance of the GEMM on servers
2024-04-10 14:23:31 +02:00
gxw d8c4ea8793 loongarch: Optimizing the performance of the GEMM on servers 2024-04-09 09:03:34 -04:00
Martin Kroeker ba6d485102
Adjust SWITCH_RATIO for ZEN and apply GEMM_PREFERRED_SIZE 2024-04-04 18:52:38 +02:00
Martin Kroeker 584e87661d
set SWITCH_RATIO for Cortex-A76 2024-04-02 23:10:45 +02:00
Martin Kroeker b925f61fb0
Add support for Cortex-A76 2024-04-02 19:44:17 +02:00
Rajalakshmi Srinivasaraghavan f5b2a877e2 POWER9: Use default param values from POWER8 on AIX
AIX uses KERNEL.POWER8 optimization on POWER9 and changing
the default GEMM parameters in param.h to use POWER8 values
on POWER9.
2024-03-20 10:17:49 -05:00
pengxu 4787a55c64 Optimized cgemm kernel 16x4 LASX for LoongArch 2024-02-21 15:28:47 +08:00
pengxu fe3da43b7d Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch 2024-02-06 11:49:01 +08:00
Martin Kroeker e5d2725e5a
Merge pull request #4185 from XiWeiGu/mips_enable_msa
MIPS: Enable MSA
2024-02-05 15:50:16 +01:00
Sergei Lewis 1093def0d1 Merge branch 'risc-v' into develop 2024-01-29 11:11:39 +00:00
Martin Kroeker 889c5d026a
Merge pull request #4456 from kseniyazaytseva/riscv-rvv10
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrincics
2024-01-26 13:31:09 +01:00
kseniyazaytseva b193ea3d7b Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrincics
* Update intrincics API to 0.12.0 version (Stride Segment Loads/Stores)
* Fixed nrm2, axpby, ncopy, zgemv and scal kernels
* Added zero size checks
2024-01-18 22:14:32 +03:00
Dirreke ec89466e14 Add CSKY support 2024-01-16 23:45:06 +08:00
Martin Kroeker 504f9b0c5e
Increase S/D GEMM PQ to match typical L2 size as forNeoverseV1 2024-01-02 18:46:21 +01:00
Martin Kroeker 2802478449
revert change to Loongson2k1000 zgemm 2023-12-30 23:35:51 +01:00
Martin Kroeker 44b5b9e39f
Update C/ZGEMM MN for Loongson2k1000 2023-12-30 22:50:40 +01:00
Martin Kroeker 519b40fad9
Merge pull request #4398 from yinshiyou/la-dev
Add Optimizations for LoongArch.
2023-12-30 19:51:08 +01:00
pengxu a5d0d21378 loongarch64: Add zgemm and cgemm optimization 2023-12-29 18:06:26 +08:00
Hao Chen 179ed51d3b Add dgemm_kernel_8x4.S file. 2023-12-29 17:30:57 +08:00
Darshan Patel dab0da8243 Update GEMM param for NEOVERSEV1 2023-12-19 13:56:55 +05:30
Octavian Maghiar e4586e81b8 [RISC-V] Add RISC-V Vector 128-bit target
Current RVV x280 target depends on vlen=512-bits for Level 3 operations.
Commit adds generic target that supports vlen=128-bits.
New target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations.
Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.
2023-12-04 11:02:18 +00:00
Rajalakshmi Srinivasaraghavan 980f702f72 POWER: AIX: Make use of power10 optimization
POWER10 optimizations are disabled when using default AIX assembler.
As we have fixed many issues recently, enabling optimization path
for default assembler.
2023-10-19 18:48:19 -05:00
gxw 553cc1372f LoongArch64: Add sgemm_kernel 2023-08-23 16:08:43 +08:00
gxw 4d0f000db6 MIPS: Enable MSA 2023-08-07 21:00:10 +08:00
gxw d46772e037 LoongArch64: Add compiler feature checks 2023-08-05 10:21:43 +08:00
Chris Sidebottom 84a268b6ca Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
This patch removes the prefetches from cgemm/zgemm which improves the performance similar to sgemm/dgemm did in #3868, this means I'm happy to enable this on any applicable cores.

I also replicated the unrolling the copies from sgemm and dgemm.
2023-07-27 14:12:20 +01:00
Chris Sidebottom f971ef55f2 Add ARMV8SVE to AArch64 Dynamic Dispatch
In order to enable support for future cores which have similar tunings
(in this case I'm doing this for the Arm(R) Neoverse(TM) V2 core), this generically detects SVE support and enables it. This should better manage the size and complexity of dynamic dispatch rather than just copy pasting the same parameters.

To make `ARMV8SVE` more representive of the common 128-bit SVE case,
I've split it and similar parameters from A64FX which has the wider
512-bit SVE.
2023-07-25 18:35:15 +01:00
Martin Kroeker 72caceb324
Merge pull request #4009 from Mousius/sve-gemm
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
2023-04-22 13:56:45 +02:00
Martin Kroeker 437c0bf2b4
Merge pull request #3843 from Mousius/switch-ratio
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
2023-04-19 11:51:54 +02:00
Chris Sidebottom ec334e69dc Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance.

After #3868, the SVE kernels represent a pretty good boost.

This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).
2023-04-17 17:38:42 +01:00
Chris Sidebottom 5b165420b5 SWITCH_RATIO for Arm(R) Neoverse(TM) architecture
This seems like a good balance of values for reasonably sized matrices. With `SWITCH_RATIO=16` the DGEMM scales better to bigger sizes but the better solution would be some kind of
thread throttling so I've gone with `SWITCH_RATIO=8`.
2023-04-17 15:42:55 +01:00
Chris Sidebottom 32f2fafde7 Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well.
2023-04-17 15:34:12 +01:00
Zhang Xianyi 19f17c8bc6
Merge pull request #3893 from HellerZheng/develop
add riscv level3 C,Z kernel functions.
2023-03-15 10:17:13 +08:00
Sergei Lewis 2406958629 * update intrinsics to match latest spec at https://github.com/riscv-non-isa/rvv-intrinsic-doc (in particular, __riscv_ prefixes for rvv intrinsics)
* fix multiple numerical stability and corner case issues
* add a script to generate arbitrary gemm kernel shapes
* add a generic zvl256b target to demonstrate large gemm kernel unrolls
2023-02-24 10:45:03 +00:00
Heller Zheng 63cf4d0166 add riscv level3 C,Z kernel functions. 2023-02-01 19:13:44 -08:00
Martin Kroeker 31fd13d048
MIPS: make HAVE_MSA reflect cpu capability and NO_MSA software/env 2023-01-02 22:19:13 +01:00
Xianyi Zhang e5313f53d5 Merge branch 'develop' of https://github.com/HellerZheng/OpenBLAS_riscv_x280 into HellerZheng-develop 2022-12-03 12:00:52 +08:00
Chris Sidebottom 2fb096315e Set SWITCH_RATIO for Arm(R) Neoverse(TM) V1 CPUs
From testing this yields better results than the default of `2`.
2022-11-30 09:35:38 +00:00
Heller Zheng bef47917bd Initial version for riscv sifive x280 2022-11-15 00:06:25 -08:00
Honglin Zhu 4989e039a5 Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build 2022-10-27 14:10:26 +08:00
Jiaxun Yang a50b29c540 Provide a fallback MIPS64_GENERIC target
It is really dangerous to fallback to Loongson core on other
MIPS64 processors.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-12 13:13:28 +01:00
gxw fbfe1daf6e LoongArch64: Add DYNAMIC_ARCH support 2022-07-28 14:28:45 +08:00
gxw 3573306a69 LoongArch64: Add core LOONGSON2K1000 and LOONGSONGENERIC 2022-07-25 16:04:56 +08:00