84a268b6ca 
								
							 
						 
						
							
							
								
								Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core  
							
							
							
							
							This patch removes the prefetches from cgemm/zgemm, which improves performance much as the same change did for sgemm/dgemm in #3868, so I'm happy to enable this on any applicable cores.
I also replicated the copy unrolling from sgemm and dgemm. 
							
						 
						
							2023-07-27 14:12:20 +01:00  
				
					
						
							
							
								 
						
							
								f971ef55f2 
								
							 
						 
						
							
							
								
								Add ARMV8SVE to AArch64 Dynamic Dispatch  
							
							
							
							
							In order to enable support for future cores which have similar tunings
(in this case I'm doing this for the Arm(R) Neoverse(TM) V2 core), this generically detects SVE support and enables it. This should keep the size and complexity of dynamic dispatch more manageable than copy-pasting the same parameters for each core.
To make `ARMV8SVE` more representative of the common 128-bit SVE case,
I've split it and similar parameters from A64FX, which has the wider
512-bit SVE. 
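
As a rough illustration of what "generically detects SVE support" can look like, here is a minimal sketch for Linux on AArch64. It assumes detection goes through the `AT_HWCAP` auxiliary vector and its `HWCAP_SVE` bit; the `core_choice` enum and the `choose_core` and `read_hwcap` helpers are hypothetical names used only for this example and are not taken from the patch.

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#endif

/* On arm64 Linux, SVE is reported as bit 22 of AT_HWCAP. Define it here
 * in case the system headers are older or the host is not AArch64. */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)
#endif

/* Hypothetical stand-ins for the dispatch targets named in the commit. */
enum core_choice { CORE_ARMV8, CORE_ARMV8SVE };

/* Pick the SVE target whenever the SVE bit is set, whatever the exact
 * core is; otherwise fall back to the plain ARMv8 target. */
static enum core_choice choose_core(unsigned long hwcap) {
    return (hwcap & HWCAP_SVE) ? CORE_ARMV8SVE : CORE_ARMV8;
}

/* Read the hardware capability bits; returns 0 where this mechanism
 * is unavailable (an assumption made to keep the sketch portable). */
static unsigned long read_hwcap(void) {
#if defined(__linux__) && defined(__aarch64__)
    return getauxval(AT_HWCAP);
#else
    return 0UL;
#endif
}
```

Calling `choose_core(read_hwcap())` at start-up would then select the generic SVE target on any core that reports SVE, without a per-core table entry.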
							
						 
						
							2023-07-25 18:35:15 +01:00  
				
					
						
							
							
								 
						
							
								72caceb324 
								
							 
						 
						
							
							
								
								Merge pull request  #4009  from Mousius/sve-gemm  
							
							
							
							
							Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1 
							
						 
						
							2023-04-22 13:56:45 +02:00  
				
					
						
							
							
								 
						
							
								437c0bf2b4 
								
							 
						 
						
							
							
								
								Merge pull request  #3843  from Mousius/switch-ratio  
							
							
							
							
							Propagate SWITCH_RATIO to DYNAMIC_ARCH builds 
							
						 
						
							2023-04-19 11:51:54 +02:00  
				
					
						
							
							
								 
						
							
								ec334e69dc 
								
							 
						 
						
							
							
								
								Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1  
							
							
							
							
							This re-spins #3869 with some additional copy unrolling, which helps maintain SYRK performance.
After #3868, the SVE kernels represent a pretty good boost.
This re-uses ARMV8SVE as a base; I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE). 
							
						 
						
							2023-04-17 17:38:42 +01:00  
				
					
						
							
							
								 
						
							
								5b165420b5 
								
							 
						 
						
							
							
								
								SWITCH_RATIO for Arm(R) Neoverse(TM) architecture  
							
							
							
							
							This seems like a good balance of values for reasonably sized matrices. With `SWITCH_RATIO=16`, DGEMM scales better to bigger sizes, but the better solution would be some kind of
thread throttling, so I've gone with `SWITCH_RATIO=8`. 
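
To make the trade-off concrete, here is a small sketch of how a ratio like this can cap the number of threads used along one matrix dimension. It assumes `SWITCH_RATIO` behaves as a minimum number of rows per thread and that the thread count is halved until that minimum is met; the function `threads_for_rows` is hypothetical and is not copied from the OpenBLAS sources.

```c
/* Hypothetical helper: how many threads to use for m rows, given the
 * available thread count and a switch ratio (minimum rows per thread). */
static int threads_for_rows(long m, int nthreads, int switch_ratio) {
    /* Too little work to split at all. */
    if (m < 2L * switch_ratio) {
        return 1;
    }
    int t = nthreads;
    /* Halve the thread count until each thread has enough rows. */
    while (t > 1 && m < (long)t * switch_ratio) {
        t /= 2;
    }
    return t;
}
```

Under these assumptions, a larger ratio uses fewer threads on small and medium matrices, and the difference disappears once `m` is large enough to give every thread its minimum share.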
							
						 
						
							2023-04-17 15:42:55 +01:00  
				
					
						
							
							
								 
						
							
								32f2fafde7 
								
							 
						 
						
							
							
								
								Propagate SWITCH_RATIO to DYNAMIC_ARCH builds  
							
							
							
							
							Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well. 
							
						 
						
							2023-04-17 15:34:12 +01:00  
				
					
						
							
							
								 
						
							
								31fd13d048 
								
							 
						 
						
							
							
								
								MIPS: make HAVE_MSA reflect cpu capability and NO_MSA software/env  
							
							
							
						 
						
							2023-01-02 22:19:13 +01:00  
				
					
						
							
							
								 
						
							
								2fb096315e 
								
							 
						 
						
							
							
								
								Set SWITCH_RATIO for Arm(R) Neoverse(TM) V1 CPUs  
							
							
							
							
							From testing this yields better results than the default of `2`. 
							
						 
						
							2022-11-30 09:35:38 +00:00  
				
					
						
							
							
								 
						
							
								4989e039a5 
								
							 
						 
						
							
							
								
								Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build  
							
							
							
						 
						
							2022-10-27 14:10:26 +08:00  
				
					
						
							
							
								 
						
							
								a50b29c540 
								
							 
						 
						
							
							
								
								Provide a fallback MIPS64_GENERIC target  
							
							
							
							
							It is really dangerous to fall back to a Loongson core on other
MIPS64 processors.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> 
							
						 
						
							2022-08-12 13:13:28 +01:00  
				
					
						
							
							
								 
						
							
								fbfe1daf6e 
								
							 
						 
						
							
							
								
								LoongArch64: Add DYNAMIC_ARCH support  
							
							
							
						 
						
							2022-07-28 14:28:45 +08:00  
				
					
						
							
							
								 
						
							
								3573306a69 
								
							 
						 
						
							
							
								
								LoongArch64: Add core LOONGSON2K1000 and LOONGSONGENERIC  
							
							
							
						 
						
							2022-07-25 16:04:56 +08:00  
				
					
						
							
							
								 
						
							
								123e0dfb62 
								
							 
						 
						
							
							
								
								Neoverse N2 sbgemm:  
							
							
							
							
							1. Modify the algorithm to resolve multithreading failures
2. No memory allocation in sbgemm kernel
3. Optimize when alpha == 1.0f 
							
						 
						
							2022-06-29 10:14:21 +08:00  
				
					
						
							
							
								 
						
							
								55d686d41e 
								
							 
						 
						
							
							
								
								neoverse n2 sbgemm:  
							
							
							
							
							implement ncopy tcopy kernel_8x4 
							
						 
						
							2022-06-29 10:14:21 +08:00  
				
					
						
							
							
								 
						
							
								dac14a5f7d 
								
							 
						 
						
							
							
								
								revert "switch DGEMM parameters for SkylakeX if DYNAMIC_ARCH"  
							
							
							
						 
						
							2022-05-20 11:28:23 +02:00  
				
					
						
							
							
								 
						
							
								a55a06c269 
								
							 
						 
						
							
							
								
								Update param.h  
							
							
							
						 
						
							2022-03-28 18:10:08 +02:00  
				
					
						
							
							
								 
						
							
								d93cf7f23c 
								
							 
						 
						
							
							
								
								fix defines for CORTEX-X  
							
							
							
						 
						
							2022-03-28 17:37:06 +02:00  
				
					
						
							
							
								 
						
							
								09b8545fc5 
								
							 
						 
						
							
							
								
								Add initial support for M1 on Linux, Phytium FT2xxx series, ARM Cortex 510/710/X1/X2  
							
							
							
						 
						
							2022-03-27 15:24:40 +02:00  
				
					
						
							
							
								 
						
							
								8d0f7f0176 
								
							 
						 
						
							
							
								
								Revert accidental change of generic ARMV8 DGEMM parameters from  #3425  
							
							
							
						 
						
							2022-03-27 13:10:47 +02:00  
				
					
						
							
							
								 
						
							
								c1c0d5ce1d 
								
							 
						 
						
							
							
								
								Merge pull request  #3492  from binebrank/arm_sve_zgemm  
							
							
							
							
							SVE zgemm and cgemm (and other BLAS 3 complex) 
							
						 
						
							2022-01-18 21:36:33 +01:00  
				
					
						
							
							
								 
						
							
								b6a445cfd8 
								
							 
						 
						
							
							
								
								adapt Makefile for SVE trsm  
							
							
							
						 
						
							2022-01-16 21:40:56 +01:00  
				
					
						
							
							
								 
						
							
								499ae5e8f7 
								
							 
						 
						
							
							
								
								Merge pull request  #3510  from martin-frbg/issue3505  
							
							
							
							
							Fix recent SkylakeX/DYNAMIC_ARCH DGEMM breakage 
							
						 
						
							2022-01-09 14:50:51 +01:00  
				
					
						
							
							
								 
						
							
								b6b024232d 
								
							 
						 
						
							
							
								
								Merge pull request  #3508  from snadampal/v1_n2  
							
							
							
							
							OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics 
							
						 
						
							2022-01-09 14:50:26 +01:00  
				
					
						
							
							
								 
						
							
								15d4b37913 
								
							 
						 
						
							
							
								
								SkylakeX: match parameters to dgemm kernels for dyn/non-dyn  
							
							
							
						 
						
							2022-01-08 23:48:13 +01:00  
				
					
						
							
							
								 
						
							
								19c8f615dc 
								
							 
						 
						
							
							
								
								OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics  
							
							
							
						 
						
							2022-01-07 00:28:17 +00:00  
				
					
						
							
							
								 
						
							
								39ab219704 
								
							 
						 
						
							
							
								
								sve copy functions for cgemm chemm zsymm  
							
							
							
						 
						
							2022-01-05 09:12:22 +01:00  
				
					
						
							
							
								 
						
							
								8d9b9c6b2a 
								
							 
						 
						
							
							
								
								loongarch64: Optimize dgemm_kernel  
							
							
							
						 
						
							2021-12-21 09:33:06 +08:00  
				
					
						
							
							
								 
						
							
								697e2752d7 
								
							 
						 
						
							
							
								
								Merge pull request  #3464  from binebrank/arm_sve_sgemm  
							
							
							
							
							Add sgemm part for Arm SVE 
							
						 
						
							2021-12-11 20:35:22 +01:00  
				
					
						
							
							
								 
						
							
								a8f62a347b 
								
							 
						 
						
							
							
								
								fix UNROLL_MN and add to targets for SVE  
							
							
							
						 
						
							2021-12-11 16:37:23 +01:00  
				
					
						
							
							
								 
						
							
								f7f7fea0dc 
								
							 
						 
						
							
							
								
								Merge pull request  #3472  from kavanabhat/p10_aixas_p8  
							
							
							
							
							Fallback for Power kernels 
							
						 
						
							2021-12-09 07:28:57 +01:00  
				
					
						
							
							
								 
						
							
								eee3381cbe 
								
							 
						 
						
							
							
								
								Fallback for Power kernels  
							
							
							
						 
						
							2021-12-08 03:52:23 -06:00  
				
					
						
							
							
								 
						
							
								dd1f645371 
								
							 
						 
						
							
							
								
								switch DGEMM unroll parameters for SkylakeX if DYNAMIC_ARCH  
							
							
							
						 
						
							2021-12-06 19:42:51 +01:00  
				
					
						
							
							
								 
						
							
								86ae89bf33 
								
							 
						 
						
							
							
								
								add sgemm kernel and copy functions for sgemm and ssymm  
							
							
							
						 
						
							2021-11-28 18:12:47 +01:00  
				
					
						
							
							
								 
						
							
								454edd741c 
								
							 
						 
						
							
							
								
								Merge pull request  #3425  from binebrank/arm_sve_dgemm  
							
							
							
							
							Add dgemm kernel for arm64 SVE 
							
						 
						
							2021-11-26 16:14:55 +01:00  
				
					
						
							
							
								 
						
							
								f4da23dcb6 
								
							 
						 
						
							
							
								
								reduced dgemm_unroll_m to work with 128-bit sve  
							
							
							
						 
						
							2021-11-23 21:18:08 +01:00  
				
					
						
							
							
								 
						
							
								9388f05a3c 
								
							 
						 
						
							
							
								
								configure SVE Makefile  
							
							
							
						 
						
							2021-11-21 18:33:43 +01:00  
				
					
						
							
							
								 
						
							
								52a3f004a0 
								
							 
						 
						
							
							
								
								Fix unintended reversion of recent CortexA53 changes  
							
							
							
						 
						
							2021-11-20 23:54:48 +01:00  
				
					
						
							
							
								 
						
							
								19ccef5fb1 
								
							 
						 
						
							
							
								
								Add generic MIPS32 target  
							
							
							
						 
						
							2021-11-20 17:31:11 +01:00  
				
					
						
							
							
								 
						
							
								302f22693a 
								
							 
						 
						
							
							
								
								MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55  
							
							
							
						 
						
							2021-11-18 21:14:43 +08:00  
				
					
						
							
							
								 
						
							
								46947efb83 
								
							 
						 
						
							
							
								
								Ignore compiler support for MIPS MSA if the cpu lacks this capability  
							
							
							
						 
						
							2021-11-13 23:32:26 +01:00  
				
					
						
							
							
								 
						
							
								ab7917910d 
								
							 
						 
						
							
							
								
								add v2x8 kernel + fix sve dtrmm  
							
							
							
						 
						
							2021-11-07 20:37:51 +01:00  
				
					
						
							
							
								 
						
							
								7093372e32 
								
							 
						 
						
							
							
								
								add ARMV8SVE target  
							
							
							
						 
						
							2021-11-01 22:53:21 +01:00  
				
					
						
							
							
								 
						
							
								7b2f5cb3b7 
								
							 
						 
						
							
							
								
								sbgemm: spr: enlarge P to 256 for performance  
							
							
							
						 
						
							2021-10-17 19:08:03 -07:00  
				
					
						
							
							
								 
						
							
								0abbcd19c1 
								
							 
						 
						
							
							
								
								sbgemm: spr: tuning for blocking params  
							
							
							
						 
						
							2021-10-17 19:08:03 -07:00  
				
					
						
							
							
								 
						
							
								3dc6052c7e 
								
							 
						 
						
							
							
								
								initial support for Sapphire Rapids platform  
							
							
							
						 
						
							2021-10-12 01:30:40 -07:00  
				
					
						
							
							
								 
						
							
								24233b7c49 
								
							 
						 
						
							
							
								
								Use "big arm server"  GEMM defaults for Vortex  
							
							
							
						 
						
							2021-10-06 11:10:19 +02:00  
				
					
						
							
							
								 
						
							
								fe3c778c51 
								
							 
						 
						
							
							
								
								AIX changes for P10 with GNU Compiler  
							
							
							
						 
						
							2021-09-30 06:06:27 -05:00  
				
					
						
							
							
								 
						
							
								8356a604f0 
								
							 
						 
						
							
							
								
								sbgemm: cooperlake: tuning for block params  
							
							
							
						 
						
							2021-09-07 21:30:46 +08:00  
				
					
						
							
							
								 
						
							
								7cddbf99b1 
								
							 
						 
						
							
							
								
								Make explicit conversion condition on _WIN64 flag  
							
							
							
						 
						
							2021-08-31 14:36:44 +01:00