OpenBLAS

Author	SHA1	Message	Date
Martin Kroeker	a47b3c8867	Fix unroll parameter selection for MIPS64_GENERIC	2024-10-13 22:54:34 +02:00
Martin Kroeker	7c4f3638fd	switch PPCG4 SGEMM kernel to 4x4	2024-10-03 22:00:15 +02:00
gxw	48698b2b1d	LoongArch64: Rename core. Use microarchitecture names instead of meaningless strings to name the cores; the legacy core names are still retained. 1. Rename LOONGSONGENERIC to LA64_GENERIC 2. Rename LOONGSON3R5 to LA464 3. Rename LOONGSON2K1000 to LA264	2024-09-29 09:35:21 +08:00
Chip Kerchner	b1737698db	Fix DEFAULTS in SBGEMM for POWER10. Also comparisons for SBGEMM unit test can be exactly due to epsilon differences.	2024-08-13 07:01:21 -05:00
Piotr Kubaj	4c12090776	Fix build on FreeBSD/powerpc64*	2024-07-10 22:21:48 +00:00
gxw	6017ad7146	loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6	2024-05-08 10:10:26 +08:00
Usui, Tetsuzo	ca673ca774	Add GEMM_PREFERED_SIZE parameter for Neoverse V1	2024-04-12 17:21:14 +09:00
Martin Kroeker	93d975d8fd	Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset loongarch: Optimizing the performance of the GEMM on servers	2024-04-10 14:23:31 +02:00
gxw	d8c4ea8793	loongarch: Optimizing the performance of the GEMM on servers	2024-04-09 09:03:34 -04:00
Martin Kroeker	ba6d485102	Adjust SWITCH_RATIO for ZEN and apply GEMM_PREFERRED_SIZE	2024-04-04 18:52:38 +02:00
Martin Kroeker	584e87661d	set SWITCH_RATIO for Cortex-A76	2024-04-02 23:10:45 +02:00
Martin Kroeker	b925f61fb0	Add support for Cortex-A76	2024-04-02 19:44:17 +02:00
Rajalakshmi Srinivasaraghavan	f5b2a877e2	POWER9: Use default param values from POWER8 on AIX. AIX uses the KERNEL.POWER8 optimization on POWER9, so the default GEMM parameters in param.h are changed to use POWER8 values on POWER9.	2024-03-20 10:17:49 -05:00
pengxu	4787a55c64	Optimized cgemm kernel 16x4 LASX for LoongArch	2024-02-21 15:28:47 +08:00
pengxu	fe3da43b7d	Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch	2024-02-06 11:49:01 +08:00
Martin Kroeker	e5d2725e5a	Merge pull request #4185 from XiWeiGu/mips_enable_msa MIPS: Enable MSA	2024-02-05 15:50:16 +01:00
Sergei Lewis	1093def0d1	Merge branch 'risc-v' into develop	2024-01-29 11:11:39 +00:00
Martin Kroeker	889c5d026a	Merge pull request #4456 from kseniyazaytseva/riscv-rvv10 Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics	2024-01-26 13:31:09 +01:00
kseniyazaytseva	b193ea3d7b	Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics * Update intrinsics API to 0.12.0 version (Stride Segment Loads/Stores) * Fixed nrm2, axpby, ncopy, zgemv and scal kernels * Added zero size checks	2024-01-18 22:14:32 +03:00
Dirreke	ec89466e14	Add CSKY support	2024-01-16 23:45:06 +08:00
Martin Kroeker	504f9b0c5e	Increase S/D GEMM PQ to match typical L2 size as for NeoverseV1	2024-01-02 18:46:21 +01:00
Martin Kroeker	2802478449	revert change to Loongson2k1000 zgemm	2023-12-30 23:35:51 +01:00
Martin Kroeker	44b5b9e39f	Update C/ZGEMM MN for Loongson2k1000	2023-12-30 22:50:40 +01:00
Martin Kroeker	519b40fad9	Merge pull request #4398 from yinshiyou/la-dev Add Optimizations for LoongArch.	2023-12-30 19:51:08 +01:00
pengxu	a5d0d21378	loongarch64: Add zgemm and cgemm optimization	2023-12-29 18:06:26 +08:00
Hao Chen	179ed51d3b	Add dgemm_kernel_8x4.S file.	2023-12-29 17:30:57 +08:00
Darshan Patel	dab0da8243	Update GEMM param for NEOVERSEV1	2023-12-19 13:56:55 +05:30
Octavian Maghiar	e4586e81b8	[RISC-V] Add RISC-V Vector 128-bit target Current RVV x280 target depends on vlen=512-bits for Level 3 operations. Commit adds generic target that supports vlen=128-bits. New target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations. Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.	2023-12-04 11:02:18 +00:00
Rajalakshmi Srinivasaraghavan	980f702f72	POWER: AIX: Make use of power10 optimization POWER10 optimizations are disabled when using default AIX assembler. As we have fixed many issues recently, enabling optimization path for default assembler.	2023-10-19 18:48:19 -05:00
gxw	553cc1372f	LoongArch64: Add sgemm_kernel	2023-08-23 16:08:43 +08:00
gxw	4d0f000db6	MIPS: Enable MSA	2023-08-07 21:00:10 +08:00
gxw	d46772e037	LoongArch64: Add compiler feature checks	2023-08-05 10:21:43 +08:00
Chris Sidebottom	84a268b6ca	Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core This patch removes the prefetches from cgemm/zgemm which improves the performance similar to sgemm/dgemm did in #3868, this means I'm happy to enable this on any applicable cores. I also replicated the unrolling the copies from sgemm and dgemm.	2023-07-27 14:12:20 +01:00
Chris Sidebottom	f971ef55f2	Add ARMV8SVE to AArch64 Dynamic Dispatch In order to enable support for future cores which have similar tunings (in this case I'm doing this for the Arm(R) Neoverse(TM) V2 core), this generically detects SVE support and enables it. This should better manage the size and complexity of dynamic dispatch rather than just copy pasting the same parameters. To make `ARMV8SVE` more representative of the common 128-bit SVE case, I've split it and similar parameters from A64FX which has the wider 512-bit SVE.	2023-07-25 18:35:15 +01:00
Martin Kroeker	72caceb324	Merge pull request #4009 from Mousius/sve-gemm Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1	2023-04-22 13:56:45 +02:00
Martin Kroeker	437c0bf2b4	Merge pull request #3843 from Mousius/switch-ratio Propagate SWITCH_RATIO to DYNAMIC_ARCH builds	2023-04-19 11:51:54 +02:00
Chris Sidebottom	ec334e69dc	Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1 This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance. After #3868, the SVE kernels represent a pretty good boost. This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).	2023-04-17 17:38:42 +01:00
Chris Sidebottom	5b165420b5	SWITCH_RATIO for Arm(R) Neoverse(TM) architecture This seems like a good balance of values for reasonably sized matrices. With `SWITCH_RATIO=16` the DGEMM scales better to bigger sizes but the better solution would be some kind of thread throttling so I've gone with `SWITCH_RATIO=8`.	2023-04-17 15:42:55 +01:00
Chris Sidebottom	32f2fafde7	Propagate SWITCH_RATIO to DYNAMIC_ARCH builds Previously dynamic builds were either using the default SWITCH_RATIO or one from the higher level architecture; this patch ensures the dynamic builds can use this parameter as well.	2023-04-17 15:34:12 +01:00
Zhang Xianyi	19f17c8bc6	Merge pull request #3893 from HellerZheng/develop add riscv level3 C,Z kernel functions.	2023-03-15 10:17:13 +08:00
Sergei Lewis	2406958629	* update intrinsics to match latest spec at https://github.com/riscv-non-isa/rvv-intrinsic-doc (in particular, __riscv_ prefixes for rvv intrinsics) * fix multiple numerical stability and corner case issues * add a script to generate arbitrary gemm kernel shapes * add a generic zvl256b target to demonstrate large gemm kernel unrolls	2023-02-24 10:45:03 +00:00
Heller Zheng	63cf4d0166	add riscv level3 C,Z kernel functions.	2023-02-01 19:13:44 -08:00
Martin Kroeker	31fd13d048	MIPS: make HAVE_MSA reflect cpu capability and NO_MSA software/env	2023-01-02 22:19:13 +01:00
Xianyi Zhang	e5313f53d5	Merge branch 'develop' of https://github.com/HellerZheng/OpenBLAS_riscv_x280 into HellerZheng-develop	2022-12-03 12:00:52 +08:00
Chris Sidebottom	2fb096315e	Set SWITCH_RATIO for Arm(R) Neoverse(TM) V1 CPUs From testing this yields better results than the default of `2`.	2022-11-30 09:35:38 +00:00
Heller Zheng	bef47917bd	Initial version for riscv sifive x280	2022-11-15 00:06:25 -08:00
Honglin Zhu	4989e039a5	Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build	2022-10-27 14:10:26 +08:00
Jiaxun Yang	a50b29c540	Provide a fallback MIPS64_GENERIC target It is really dangerous to fallback to Loongson core on other MIPS64 processors. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>	2022-08-12 13:13:28 +01:00
gxw	fbfe1daf6e	LoongArch64: Add DYNAMIC_ARCH support	2022-07-28 14:28:45 +08:00
gxw	3573306a69	LoongArch64: Add core LOONGSON2K1000 and LOONGSONGENERIC	2022-07-25 16:04:56 +08:00

298 Commits