Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								72caceb324 
								
							 
						 
						
							
							
								
								Merge pull request  #4009  from Mousius/sve-gemm  
							
							 
							
							
							
							
							Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1 
							
						 
						
							2023-04-22 13:56:45 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								437c0bf2b4 
								
							 
						 
						
							
							
								
								Merge pull request  #3843  from Mousius/switch-ratio  
							
							 
							
							
							
							
							Propagate SWITCH_RATIO to DYNAMIC_ARCH builds 
							
						 
						
							2023-04-19 11:51:54 +02:00  
						
					 
				
					
						
							
							
								 
								Chris Sidebottom
							
						 
						
							 
							
							
							
							
								
							
							
								ec334e69dc 
								
							 
						 
						
							
							
								
								Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1  
							
							 
							
							
							
							
							This re-spins #3869 with some additional copy unrolling, which helps maintain SYRK performance.
After #3868, the SVE kernels provide a good performance boost.
This re-uses ARMV8SVE as a base; I'm going to incrementally move everything to use ARMV8SVE in additional patches (and fix up anything that's not already in ARMV8SVE). 
							
						 
						
							2023-04-17 17:38:42 +01:00  
						
					 
				
					
						
							
							
								 
								Chris Sidebottom
							
						 
						
							 
							
							
							
							
								
							
							
								5b165420b5 
								
							 
						 
						
							
							
								
								SWITCH_RATIO for Arm(R) Neoverse(TM) architecture  
							
							 
							
							
							
							
							This seems like a good balance of values for reasonably sized matrices. With `SWITCH_RATIO=16`, DGEMM scales better to bigger sizes, but the better solution would be some kind of
thread throttling, so I've gone with `SWITCH_RATIO=8`. 
							
						 
						
							2023-04-17 15:42:55 +01:00  
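
The trade-off described in the commit above can be illustrated with a small sketch. It assumes that `SWITCH_RATIO` acts as a minimum number of matrix rows per thread partition, so a larger value uses fewer threads on small matrices. The helper name `threads_for_rows` and its halving strategy are hypothetical and are not taken from the commit itself.

```c
/* Hypothetical helper (illustration only): choose how many threads split
 * the m dimension so that each partition keeps at least `switch_ratio`
 * rows. Assumes SWITCH_RATIO means "minimum rows per partition". */
static int threads_for_rows(long m, int max_threads, int switch_ratio)
{
    /* Too few rows to split at all: stay single-threaded. */
    if (m < 2L * switch_ratio) {
        return 1;
    }

    int nthreads = max_threads;

    /* Halve the thread count until every partition has enough rows. */
    while (nthreads > 1 && m < (long)nthreads * switch_ratio) {
        nthreads /= 2;
    }
    return nthreads;
}
```

Under these assumptions, a matrix with 64 rows and 16 available threads would use 16 threads with a ratio of 2, 8 threads with a ratio of 8, and 4 threads with a ratio of 16.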
						
					 
				
					
						
							
							
								 
								Chris Sidebottom
							
						 
						
							 
							
							
							
							
								
							
							
								32f2fafde7 
								
							 
						 
						
							
							
								
								Propagate SWITCH_RATIO to DYNAMIC_ARCH builds  
							
							 
							
							
							
							
							Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well. 
							
						 
						
							2023-04-17 15:34:12 +01:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								31fd13d048 
								
							 
						 
						
							
							
								
								MIPS: make HAVE_MSA reflect cpu capability and NO_MSA software/env  
							
							 
							
							
							
						 
						
							2023-01-02 22:19:13 +01:00  
						
					 
				
					
						
							
							
								 
								Chris Sidebottom
							
						 
						
							 
							
							
							
							
								
							
							
								2fb096315e 
								
							 
						 
						
							
							
								
								Set SWITCH_RATIO for Arm(R) Neoverse(TM) V1 CPUs  
							
							 
							
							
							
							
							From testing this yields better results than the default of `2`. 
							
						 
						
							2022-11-30 09:35:38 +00:00  
						
					 
				
					
						
							
							
								 
								Honglin Zhu
							
						 
						
							 
							
							
							
							
								
							
							
								4989e039a5 
								
							 
						 
						
							
							
								
								Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build  
							
							 
							
							
							
						 
						
							2022-10-27 14:10:26 +08:00  
						
					 
				
					
						
							
							
								 
								Jiaxun Yang
							
						 
						
							 
							
							
							
							
								
							
							
								a50b29c540 
								
							 
						 
						
							
							
								
								Provide a fallback MIPS64_GENERIC target  
							
							 
							
							
							
							
							It is really dangerous to fall back to a Loongson core on other
MIPS64 processors.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> 
							
						 
						
							2022-08-12 13:13:28 +01:00  
						
					 
				
					
						
							
							
								 
								gxw
							
						 
						
							 
							
							
							
							
								
							
							
								fbfe1daf6e 
								
							 
						 
						
							
							
								
								LoongArch64: Add DYNAMIC_ARCH support  
							
							 
							
							
							
						 
						
							2022-07-28 14:28:45 +08:00  
						
					 
				
					
						
							
							
								 
								gxw
							
						 
						
							 
							
							
							
							
								
							
							
								3573306a69 
								
							 
						 
						
							
							
								
								LoongArch64: Add core LOONGSON2K1000 and LOONGSONGENERIC  
							
							 
							
							
							
						 
						
							2022-07-25 16:04:56 +08:00  
						
					 
				
					
						
							
							
								 
								Honglin Zhu
							
						 
						
							 
							
							
							
							
								
							
							
								123e0dfb62 
								
							 
						 
						
							
							
								
								Neoverse N2 sbgemm:  
							
							 
							
							
							
							
							1. Modify the algorithm to resolve multithreading failures
							2. No memory allocation in sbgemm kernel
							3. Optimize when alpha == 1.0f 
							
						 
						
							2022-06-29 10:14:21 +08:00  
						
					 
				
					
						
							
							
								 
								Honglin Zhu
							
						 
						
							 
							
							
							
							
								
							
							
								55d686d41e 
								
							 
						 
						
							
							
								
								neoverse n2 sbgemm:  
							
							 
							
							
							
							
							implement ncopy tcopy kernel_8x4 
							
						 
						
							2022-06-29 10:14:21 +08:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								dac14a5f7d 
								
							 
						 
						
							
							
								
								revert "switch DGEMM parameters for SkylakeX if DYNAMIC_ARCH"  
							
							 
							
							
							
						 
						
							2022-05-20 11:28:23 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								a55a06c269 
								
							 
						 
						
							
							
								
								Update param.h  
							
							 
							
							
							
						 
						
							2022-03-28 18:10:08 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								d93cf7f23c 
								
							 
						 
						
							
							
								
								fix defines for CORTEX-X  
							
							 
							
							
							
						 
						
							2022-03-28 17:37:06 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								09b8545fc5 
								
							 
						 
						
							
							
								
								Add initial support for M1 on Linux, Phytium FT2xxx series, ARM Cortex 510/710/X1/X2  
							
							 
							
							
							
						 
						
							2022-03-27 15:24:40 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								8d0f7f0176 
								
							 
						 
						
							
							
								
								Revert accidental change of generic ARMV8 DGEMM parameters from  #3425  
							
							 
							
							
							
						 
						
							2022-03-27 13:10:47 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								c1c0d5ce1d 
								
							 
						 
						
							
							
								
								Merge pull request  #3492  from binebrank/arm_sve_zgemm  
							
							 
							
							
							
							
							SVE zgemm&cgemm (and other BLAS 3 complex) 
							
						 
						
							2022-01-18 21:36:33 +01:00  
						
					 
				
					
						
							
							
								 
								Bine Brank
							
						 
						
							 
							
							
							
							
								
							
							
								b6a445cfd8 
								
							 
						 
						
							
							
								
								adapt Makefile for SVE trsm  
							
							 
							
							
							
						 
						
							2022-01-16 21:40:56 +01:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								499ae5e8f7 
								
							 
						 
						
							
							
								
								Merge pull request  #3510  from martin-frbg/issue3505  
							
							 
							
							
							
							
							Fix recent SkylakeX/DYNAMIC_ARCH DGEMM breakage 
							
						 
						
							2022-01-09 14:50:51 +01:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								b6b024232d 
								
							 
						 
						
							
							
								
								Merge pull request  #3508  from snadampal/v1_n2  
							
							 
							
							
							
							
							OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics 
							
						 
						
							2022-01-09 14:50:26 +01:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								15d4b37913 
								
							 
						 
						
							
							
								
								SkylakeX: match parameters to dgemm kernels for dyn/non-dyn  
							
							 
							
							
							
						 
						
							2022-01-08 23:48:13 +01:00  
						
					 
				
					
						
							
							
								 
								Sunita Nadampalli
							
						 
						
							 
							
							
							
							
								
							
							
								19c8f615dc 
								
							 
						 
						
							
							
								
								OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics  
							
							 
							
							
							
						 
						
							2022-01-07 00:28:17 +00:00  
						
					 
				
					
						
							
							
								 
								Bine Brank
							
						 
						
							 
							
							
							
							
								
							
							
								39ab219704 
								
							 
						 
						
							
							
								
								sve copy functions for cgemm chemm zsymm  
							
							 
							
							
							
						 
						
							2022-01-05 09:12:22 +01:00  
						
					 
				
					
						
							
							
								 
								gxw
							
						 
						
							 
							
							
							
							
								
							
							
								8d9b9c6b2a 
								
							 
						 
						
							
							
								
								loongarch64: Optimize dgemm_kernel  
							
							 
							
							
							
						 
						
							2021-12-21 09:33:06 +08:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								697e2752d7 
								
							 
						 
						
							
							
								
								Merge pull request  #3464  from binebrank/arm_sve_sgemm  
							
							 
							
							
							
							
							Add sgemm part for Arm SVE 
							
						 
						
							2021-12-11 20:35:22 +01:00  
						
					 
				
					
						
							
							
								 
								Bine Brank
							
						 
						
							 
							
							
							
							
								
							
							
								a8f62a347b 
								
							 
						 
						
							
							
								
								fix UNROLL_MN and add to targets for SVE  
							
							 
							
							
							
						 
						
							2021-12-11 16:37:23 +01:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								f7f7fea0dc 
								
							 
						 
						
							
							
								
								Merge pull request  #3472  from kavanabhat/p10_aixas_p8  
							
							 
							
							
							
							
							Fallback for Power kernels 
							
						 
						
							2021-12-09 07:28:57 +01:00  
						
					 
				
					
						
							
							
								 
								kavanabhat
							
						 
						
							 
							
							
							
							
								
							
							
								eee3381cbe 
								
							 
						 
						
							
							
								
								Fallback for Power kernels  
							
							 
							
							
							
						 
						
							2021-12-08 03:52:23 -06:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								dd1f645371 
								
							 
						 
						
							
							
								
								switch DGEMM unroll parameters for SkylakeX if DYNAMIC_ARCH  
							
							 
							
							
							
						 
						
							2021-12-06 19:42:51 +01:00  
						
					 
				
					
						
							
							
								 
								Bine Brank
							
						 
						
							 
							
							
							
							
								
							
							
								86ae89bf33 
								
							 
						 
						
							
							
								
								add sgemm kernel and copy functions for sgemm and ssymm  
							
							 
							
							
							
						 
						
							2021-11-28 18:12:47 +01:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								454edd741c 
								
							 
						 
						
							
							
								
								Merge pull request  #3425  from binebrank/arm_sve_dgemm  
							
							 
							
							
							
							
							Add dgemm kernel for arm64 SVE 
							
						 
						
							2021-11-26 16:14:55 +01:00  
						
					 
				
					
						
							
							
								 
								Bine Brank
							
						 
						
							 
							
							
							
							
								
							
							
								f4da23dcb6 
								
							 
						 
						
							
							
								
								reduced dgemm_unroll_m to work with 128-bit sve  
							
							 
							
							
							
						 
						
							2021-11-23 21:18:08 +01:00  
						
					 
				
					
						
							
							
								 
								Bine Brank
							
						 
						
							 
							
							
							
							
								
							
							
								9388f05a3c 
								
							 
						 
						
							
							
								
								configure SVE Makefile  
							
							 
							
							
							
						 
						
							2021-11-21 18:33:43 +01:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								52a3f004a0 
								
							 
						 
						
							
							
								
								Fix unintended reversion of recent CortexA53 changes  
							
							 
							
							
							
						 
						
							2021-11-20 23:54:48 +01:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								19ccef5fb1 
								
							 
						 
						
							
							
								
								Add generic MIPS32 target  
							
							 
							
							
							
						 
						
							2021-11-20 17:31:11 +01:00  
						
					 
				
					
						
							
							
								 
								Jia-Chen
							
						 
						
							 
							
							
							
							
								
							
							
								302f22693a 
								
							 
						 
						
							
							
								
								MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55  
							
							 
							
							
							
						 
						
							2021-11-18 21:14:43 +08:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								46947efb83 
								
							 
						 
						
							
							
								
								Ignore compiler support for MIPS MSA if the cpu lacks this capability  
							
							 
							
							
							
						 
						
							2021-11-13 23:32:26 +01:00  
						
					 
				
					
						
							
							
								 
								Bine Brank
							
						 
						
							 
							
							
							
							
								
							
							
								ab7917910d 
								
							 
						 
						
							
							
								
								add v2x8 kernel + fix sve dtrmm  
							
							 
							
							
							
						 
						
							2021-11-07 20:37:51 +01:00  
						
					 
				
					
						
							
							
								 
								Bine Brank
							
						 
						
							 
							
							
							
							
								
							
							
								7093372e32 
								
							 
						 
						
							
							
								
								add ARMV8SVE target  
							
							 
							
							
							
						 
						
							2021-11-01 22:53:21 +01:00  
						
					 
				
					
						
							
							
								 
								Wangyang Guo
							
						 
						
							 
							
							
							
							
								
							
							
								7b2f5cb3b7 
								
							 
						 
						
							
							
								
								sbgemm: spr: enlarge P to 256 for performance  
							
							 
							
							
							
						 
						
							2021-10-17 19:08:03 -07:00  
						
					 
				
					
						
							
							
								 
								Wangyang Guo
							
						 
						
							 
							
							
							
							
								
							
							
								0abbcd19c1 
								
							 
						 
						
							
							
								
								sbgemm: spr: tuning for blocking params  
							
							 
							
							
							
						 
						
							2021-10-17 19:08:03 -07:00  
						
					 
				
					
						
							
							
								 
								Wangyang Guo
							
						 
						
							 
							
							
							
							
								
							
							
								3dc6052c7e 
								
							 
						 
						
							
							
								
								initial support for Sapphire Rapids platform  
							
							 
							
							
							
						 
						
							2021-10-12 01:30:40 -07:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								24233b7c49 
								
							 
						 
						
							
							
								
								Use "big arm server"  GEMM defaults for Vortex  
							
							 
							
							
							
						 
						
							2021-10-06 11:10:19 +02:00  
						
					 
				
					
						
							
							
								 
								kavanabhat
							
						 
						
							 
							
							
							
							
								
							
							
								fe3c778c51 
								
							 
						 
						
							
							
								
								AIX changes for P10 with GNU Compiler  
							
							 
							
							
							
						 
						
							2021-09-30 06:06:27 -05:00  
						
					 
				
					
						
							
							
								 
								Wangyang Guo
							
						 
						
							 
							
							
							
							
								
							
							
								8356a604f0 
								
							 
						 
						
							
							
								
								sbgemm: cooperlake: tuning for block params  
							
							 
							
							
							
						 
						
							2021-09-07 21:30:46 +08:00  
						
					 
				
					
						
							
							
								 
								Niyas Sait
							
						 
						
							 
							
							
							
							
								
							
							
								7cddbf99b1 
								
							 
						 
						
							
							
								
								Make explicit conversion condition on _WIN64 flag  
							
							 
							
							
							
						 
						
							2021-08-31 14:36:44 +01:00  
						
					 
				
					
						
							
							
								 
								Niyas Sait
							
						 
						
							 
							
							
							
							
								
							
							
								d1ed72fa87 
								
							 
						 
						
							
							
								
								[win/arm64]: Explicit casting for GMEMM_DEFAULT_ALIGN to create 64-bit value  
							
							 
							
							
							
							
							Win64 uses the LLP64 data model, in which unsigned long is only 32 bits wide. On a 64-bit
architecture, a 64-bit mask is needed to generate addresses correctly. 
							
						 
						
							2021-08-31 11:56:10 +01:00  
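
The LLP64 problem described in the commit above can be reproduced on any platform with a minimal sketch. A `uint32_t` mask stands in for a 32-bit `unsigned long`. The mask value `0x3fff` and both function names are hypothetical, chosen only for illustration, and are not taken from the patch.

```c
#include <stdint.h>

/* Models the problem: the mask has only 32 bits (like unsigned long on
 * LLP64), so ~mask is 0xffffc000 and is zero-extended to 64 bits. The
 * AND then clears the upper half of the address. */
static uint64_t align_up_narrow(uint64_t addr)
{
    uint32_t mask = 0x3fffu;          /* hypothetical alignment mask */
    return (addr + mask) & ~mask;
}

/* Models the fix: widening the mask to 64 bits before the complement
 * keeps the upper 32 bits of ~mask set, so high address bits survive. */
static uint64_t align_up_wide(uint64_t addr)
{
    uint64_t mask = (uint64_t)0x3fffu; /* explicit 64-bit conversion */
    return (addr + mask) & ~mask;
}
```

For an address above 4 GiB such as `0x100000001`, the narrow version returns `0x4000` while the wide version returns `0x100004000`, which shows why the explicit 64-bit conversion matters.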
						
					 
				
					
						
							
							
								 
								gxw
							
						 
						
							 
							
							
							
							
								
							
							
								af0a69f355 
								
							 
						 
						
							
							
								
								Add support for LOONGARCH64  
							
							 
							
							
							
						 
						
							2021-07-27 15:29:12 +08:00