Martin Kroeker
889c5d026a
Merge pull request #4456 from kseniyazaytseva/riscv-rvv10
...
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
2024-01-26 13:31:09 +01:00
kseniyazaytseva
b193ea3d7b
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
...
* Update intrinsics API to 0.12.0 version (Stride Segment Loads/Stores)
* Fixed nrm2, axpby, ncopy, zgemv and scal kernels
* Added zero size checks
2024-01-18 22:14:32 +03:00
Octavian Maghiar
e4586e81b8
[RISC-V] Add RISC-V Vector 128-bit target
...
The current RVV x280 target depends on vlen=512 bits for Level 3 operations.
This commit adds a generic target that supports vlen=128 bits.
New target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations.
Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.
2023-12-04 11:02:18 +00:00
Zhang Xianyi
19f17c8bc6
Merge pull request #3893 from HellerZheng/develop
...
add riscv level3 C,Z kernel functions.
2023-03-15 10:17:13 +08:00
Sergei Lewis
2406958629
* update intrinsics to match latest spec at https://github.com/riscv-non-isa/rvv-intrinsic-doc (in particular, __riscv_ prefixes for rvv intrinsics)
...
* fix multiple numerical stability and corner case issues
* add a script to generate arbitrary gemm kernel shapes
* add a generic zvl256b target to demonstrate large gemm kernel unrolls
2023-02-24 10:45:03 +00:00
Heller Zheng
63cf4d0166
add riscv level3 C,Z kernel functions.
2023-02-01 19:13:44 -08:00
Xianyi Zhang
e5313f53d5
Merge branch 'develop' of https://github.com/HellerZheng/OpenBLAS_riscv_x280 into HellerZheng-develop
2022-12-03 12:00:52 +08:00
Chris Sidebottom
2fb096315e
Set SWITCH_RATIO for Arm(R) Neoverse(TM) V1 CPUs
...
From testing this yields better results than the default of `2`.
2022-11-30 09:35:38 +00:00
Heller Zheng
bef47917bd
Initial version for riscv sifive x280
2022-11-15 00:06:25 -08:00
Honglin Zhu
4989e039a5
Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build
2022-10-27 14:10:26 +08:00
Jiaxun Yang
a50b29c540
Provide a fallback MIPS64_GENERIC target
...
It is really dangerous to fall back to a Loongson core on other
MIPS64 processors.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-12 13:13:28 +01:00
gxw
fbfe1daf6e
LoongArch64: Add DYNAMIC_ARCH support
2022-07-28 14:28:45 +08:00
gxw
3573306a69
LoongArch64: Add core LOONGSON2K1000 and LOONGSONGENERIC
2022-07-25 16:04:56 +08:00
Honglin Zhu
123e0dfb62
Neoverse N2 sbgemm:
...
1. Modify the algorithm to resolve multithreading failures
2. No memory allocation in sbgemm kernel
3. Optimize when alpha == 1.0f
2022-06-29 10:14:21 +08:00
Honglin Zhu
55d686d41e
neoverse n2 sbgemm:
...
implement ncopy tcopy kernel_8x4
2022-06-29 10:14:21 +08:00
Martin Kroeker
dac14a5f7d
revert "switch DGEMM parameters for SkylakeX if DYNAMIC_ARCH"
2022-05-20 11:28:23 +02:00
Martin Kroeker
a55a06c269
Update param.h
2022-03-28 18:10:08 +02:00
Martin Kroeker
d93cf7f23c
fix defines for CORTEX-X
2022-03-28 17:37:06 +02:00
Martin Kroeker
09b8545fc5
Add initial support for M1 on Linux, Phytium FT2xxx series, ARM Cortex 510/710/X1/X2
2022-03-27 15:24:40 +02:00
Martin Kroeker
8d0f7f0176
Revert accidental change of generic ARMV8 DGEMM parameters from #3425
2022-03-27 13:10:47 +02:00
Martin Kroeker
c1c0d5ce1d
Merge pull request #3492 from binebrank/arm_sve_zgemm
...
SVE zgemm&cgemm (and other BLAS 3 complex)
2022-01-18 21:36:33 +01:00
Bine Brank
b6a445cfd8
adapt Makefile for SVE trsm
2022-01-16 21:40:56 +01:00
Martin Kroeker
499ae5e8f7
Merge pull request #3510 from martin-frbg/issue3505
...
Fix recent SkylakeX/DYNAMIC_ARCH DGEMM breakage
2022-01-09 14:50:51 +01:00
Martin Kroeker
b6b024232d
Merge pull request #3508 from snadampal/v1_n2
...
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
2022-01-09 14:50:26 +01:00
Martin Kroeker
15d4b37913
SkylakeX: match parameters to dgemm kernels for dyn/non-dyn
2022-01-08 23:48:13 +01:00
Sunita Nadampalli
19c8f615dc
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
2022-01-07 00:28:17 +00:00
Bine Brank
39ab219704
sve copy functions for cgemm chemm zsymm
2022-01-05 09:12:22 +01:00
gxw
8d9b9c6b2a
loongarch64: Optimize dgemm_kernel
2021-12-21 09:33:06 +08:00
Martin Kroeker
697e2752d7
Merge pull request #3464 from binebrank/arm_sve_sgemm
...
Add sgemm part for Arm SVE
2021-12-11 20:35:22 +01:00
Bine Brank
a8f62a347b
fix UNROLL_MN and add to targets for SVE
2021-12-11 16:37:23 +01:00
Martin Kroeker
f7f7fea0dc
Merge pull request #3472 from kavanabhat/p10_aixas_p8
...
Fallback for Power kernels
2021-12-09 07:28:57 +01:00
kavanabhat
eee3381cbe
Fallback for Power kernels
2021-12-08 03:52:23 -06:00
Martin Kroeker
dd1f645371
switch DGEMM unroll parameters for SkylakeX if DYNAMIC_ARCH
2021-12-06 19:42:51 +01:00
Bine Brank
86ae89bf33
add sgemm kernel and copy functions for sgemm and ssymm
2021-11-28 18:12:47 +01:00
Martin Kroeker
454edd741c
Merge pull request #3425 from binebrank/arm_sve_dgemm
...
Add dgemm kernel for arm64 SVE
2021-11-26 16:14:55 +01:00
Bine Brank
f4da23dcb6
reduced dgemm_unroll_m to work with 128-bit sve
2021-11-23 21:18:08 +01:00
Bine Brank
9388f05a3c
configure SVE Makefile
2021-11-21 18:33:43 +01:00
Martin Kroeker
52a3f004a0
Fix unintended reversion of recent CortexA53 changes
2021-11-20 23:54:48 +01:00
Martin Kroeker
19ccef5fb1
Add generic MIPS32 target
2021-11-20 17:31:11 +01:00
Jia-Chen
302f22693a
MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55
2021-11-18 21:14:43 +08:00
Martin Kroeker
46947efb83
Ignore compiler support for MIPS MSA if the cpu lacks this capability
2021-11-13 23:32:26 +01:00
Bine Brank
ab7917910d
add v2x8 kernel + fix sve dtrmm
2021-11-07 20:37:51 +01:00
Bine Brank
7093372e32
add ARMV8SVE target
2021-11-01 22:53:21 +01:00
Wangyang Guo
7b2f5cb3b7
sbgemm: spr: enlarge P to 256 for performance
2021-10-17 19:08:03 -07:00
Wangyang Guo
0abbcd19c1
sbgemm: spr: tuning for blocking params
2021-10-17 19:08:03 -07:00
Wangyang Guo
3dc6052c7e
initial support for Sapphire Rapids platform
2021-10-12 01:30:40 -07:00
Martin Kroeker
24233b7c49
Use "big arm server" GEMM defaults for Vortex
2021-10-06 11:10:19 +02:00
kavanabhat
fe3c778c51
AIX changes for P10 with GNU Compiler
2021-09-30 06:06:27 -05:00
Wangyang Guo
8356a604f0
sbgemm: cooperlake: tuning for block params
2021-09-07 21:30:46 +08:00
Niyas Sait
7cddbf99b1
Make explicit conversion condition on _WIN64 flag
2021-08-31 14:36:44 +01:00