675cd551da 
								
							 
						 
						
							
							
								
								fix improper function prototypes (empty parentheses)  
							
							
							
						 
						
							2023-09-30 12:56:38 +02:00  
				
					
						
							
							
								 
						
							
								d15e0a055c 
								
							 
						 
						
							
							
								
								LoongArch64: Fixed compilation issues when enabling DYNAMIC_ARCH  
							
							
							
						 
						
							2023-09-27 10:05:27 +08:00  
				
					
						
							
							
								 
						
							
								4670eb1462 
								
							 
						 
						
							
							
								
								LoongArch64: Add dtrsm kernel  
							
							
							
						 
						
							2023-09-26 15:45:14 +08:00  
				
					
						
							
							
								 
						
							
								f2cf929374 
								
							 
						 
						
							
							
								
								LoongArch64: Add sgemv kernel  
							
							
							
						 
						
							2023-09-04 14:28:37 +08:00  
				
					
						
							
							
								 
						
							
								8e6d93359d 
								
							 
						 
						
							
							
								
								Merge pull request  #4196  from TiborGY/obsolete_inlines  
							
							
							
							
							Modernize obsolete inline order 
							
						 
						
							2023-09-03 14:12:42 +02:00  
				
					
						
							
							
								 
						
							
								394a1fd1bf 
								
							 
						 
						
							
							
								
								LoongArch64: Compatible with early internal toolchain  
							
							
							
							
							__loongarch_grlen and __loongarch_frlen were introduced in gcc version 8.3.0
(Loongnix 8.3.0-6.lnd.vec.31) internally within Loongson to standardize the
general and floating-point register widths. However, previous versions did
not have them, requiring additional checks to be added. 
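The kind of additional check described above could be sketched as follows. This is a minimal illustration, not the code from the commit: the `GR_WIDTH_BITS` macro, the `gr_width_bits` helper, and the choice of `__loongarch64` as the fallback test are assumptions made for the example.

```c
/* Sketch only: GR_WIDTH_BITS and gr_width_bits() are hypothetical names. */
#if defined(__loongarch_grlen)
/* Newer toolchains report the general register width directly. */
#define GR_WIDTH_BITS __loongarch_grlen
#elif defined(__loongarch64)
/* Assumed fallback for older toolchains that lack __loongarch_grlen. */
#define GR_WIDTH_BITS 64
#else
/* Not a LoongArch target, or the width cannot be determined. */
#define GR_WIDTH_BITS 0
#endif

/* Returns the detected general register width in bits, or 0 if unknown. */
int gr_width_bits(void)
{
    return GR_WIDTH_BITS;
}
```

On a host that is not LoongArch, neither macro is defined and the helper returns 0.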
							
						 
						
							2023-08-31 16:55:29 +08:00  
				
					
						
							
							
								 
						
							
								9c4ae4d4fb 
								
							 
						 
						
							
							
								
								Merge pull request  #4206  from martin-frbg/issue4201-2  
							
							
							
							
							Work around miscompilation of zdot_thunderx2t99 by the current NVIDIA HPC compiler 
							
						 
						
							2023-08-26 10:17:27 +02:00  
				
					
						
							
							
								 
						
							
								88435104c8 
								
							 
						 
						
							
							
								
								Merge pull request  #4204  from martin-frbg/llvm17-2  
							
							
							
							
							Work around LLVM17 miscompiling the AVX512 microkernels for CASUM/ZASUM 
							
						 
						
							2023-08-26 00:32:18 +02:00  
				
					
						
							
							
								 
						
							
								fc8894dd98 
								
							 
						 
						
							
							
								
								Work around miscompilation by NVIDIA nvc  
							
							
							
						 
						
							2023-08-26 00:30:17 +02:00  
				
					
						
							
							
								 
						
							
								7a6203ffa1 
								
							 
						 
						
							
							
								
								restore default Neoverse SVE build instructions for non-NVIDIA compilers  
							
							
							
						 
						
							2023-08-25 18:25:51 +02:00  
				
					
						
							
							
								 
						
							
								2c3034ff7f 
								
							 
						 
						
							
							
								
								Disable the C/ZASUM AVX512 microkernels when compiling with LLVM17 as well  
							
							
							
						 
						
							2023-08-25 17:22:51 +02:00  
				
					
						
							
							
								 
						
							
								8794544b43 
								
							 
						 
						
							
							
								
								Add support for compiling the Neoverse SVE kernels with the NVIDIA HPC compiler  
							
							
							
						 
						
							2023-08-25 16:47:32 +02:00  
				
					
						
							
							
								 
						
							
								553cc1372f 
								
							 
						 
						
							
							
								
								LoongArch64: Add sgemm_kernel  
							
							
							
						 
						
							2023-08-23 16:08:43 +08:00  
				
					
						
							
							
								 
						
							
								12ede72ab7 
								
							 
						 
						
							
							
								
								Merge pull request  #4192  from imciner2/im/clangfix  
							
							
							
							
							Fix cooperlake and sapphire rapids march flags on clang 
							
						 
						
							2023-08-21 15:46:35 +02:00  
				
					
						
							
							
								 
						
							
								79c15db348 
								
							 
						 
						
							
							
								
								Fix power10 gcc intrinsic check  
							
							
							
							
							__builtin_vsx_assemble_pair was only in GCC 10-11.2 and was replaced by
__builtin_vsx_build_pair thereafter. 
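The version range in this message can be expressed as a small predicate. The sketch below is illustrative only: `gcc_has_assemble_pair` and `vsx_pair_builtin` are hypothetical helpers, and treating anything older than GCC 10 as having neither builtin is an assumption, since the message only covers GCC 10 onwards.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when the GCC version lies in the 10 to 11.2
 * range that, per the commit message, provided
 * __builtin_vsx_assemble_pair. */
bool gcc_has_assemble_pair(int major, int minor)
{
    return major == 10 || (major == 11 && minor <= 2);
}

/* Hypothetical helper: name of the pair builtin to expect for a GCC
 * version. Returns NULL before GCC 10 (assumed to offer neither). */
const char *vsx_pair_builtin(int major, int minor)
{
    if (major < 10)
        return NULL;
    return gcc_has_assemble_pair(major, minor)
               ? "__builtin_vsx_assemble_pair"
               : "__builtin_vsx_build_pair";
}
```

In a real build, the same condition would more likely be written as a preprocessor test on `__GNUC__` and `__GNUC_MINOR__`.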
							
						 
						
							2023-08-17 15:05:29 +01:00  
				
					
						
							
							
								 
						
							
								b5ba95a6c0 
								
							 
						 
						
							
							
								
								Modernize obsolete inline order  
							
							
							
						 
						
							2023-08-16 00:48:40 +02:00  
				
					
						
							
							
								 
						
							
								8a8a8479be 
								
							 
						 
						
							
							
								
								Fix cooperlake and sapphire rapids march flags on clang  
							
							
							
							
							The march=cooperlake and march=sapphirerapids flags were never getting
added when building with Clang targeting those architectures. Instead
it was falling back to the skylake AVX512 implementation.
Clang added support for these two architectures in Clang 9 and Clang 12
respectively, so introduce new checks for those versions to enable the
appropriate march flag, and fall back to skylake otherwise. 
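The selection logic described here can be sketched as a small function. This is not the build-system code from the commit: `clang_march_flag` is a hypothetical helper, the target strings are illustrative, and it assumes Clang 9 corresponds to cooperlake, Clang 12 to sapphirerapids, and that the skylake fallback is spelled `-march=skylake-avx512`.

```c
#include <string.h>

/* Hypothetical helper: pick a -march flag for a target name and a Clang
 * major version, falling back to the Skylake AVX512 flag when the
 * compiler is too old or the target is not one of the two handled here. */
const char *clang_march_flag(const char *target, int clang_major)
{
    if (strcmp(target, "COOPERLAKE") == 0 && clang_major >= 9)
        return "-march=cooperlake";
    if (strcmp(target, "SAPPHIRERAPIDS") == 0 && clang_major >= 12)
        return "-march=sapphirerapids";
    return "-march=skylake-avx512";
}
```

For example, `clang_march_flag("SAPPHIRERAPIDS", 11)` yields the fallback flag because Clang 11 predates the assumed sapphirerapids support.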
							
						 
						
							2023-08-14 16:12:35 +01:00  
				
					
						
							
							
								 
						
							
								34da1a067d 
								
							 
						 
						
							
							
								
								Allow negative INCX (API change from version 3.10 of the reference implementation)  
							
							
							
						 
						
							2023-08-10 17:01:50 +02:00  
				
					
						
							
							
								 
						
							
								07e32c4cb8 
								
							 
						 
						
							
							
								
								Allow negative INCX (API change from version 3.10 of the reference implementation)  
							
							
							
						 
						
							2023-08-10 17:00:18 +02:00  
				
					
						
							
							
								 
						
							
								c211da0688 
								
							 
						 
						
							
							
								
								Allow negative INCX (API change from version 3.10 of the reference implementation)  
							
							
							
						 
						
							2023-08-10 16:58:57 +02:00  
				
					
						
							
							
								 
						
							
								a34a0a7abc 
								
							 
						 
						
							
							
								
								Allow negative INCX (API change from version 3.10 of the reference implementation)  
							
							
							
						 
						
							2023-08-10 16:56:52 +02:00  
				
					
						
							
							
								 
						
							
								54d3246fc6 
								
							 
						 
						
							
							
								
								Allow negative INCX (API change from version 3.10 of the reference implementation)  
							
							
							
						 
						
							2023-08-10 16:55:17 +02:00  
				
					
						
							
							
								 
						
							
								7dd441d5db 
								
							 
						 
						
							
							
								
								Allow negative INCX (API change from version 3.10 of the reference implementation)  
							
							
							
						 
						
							2023-08-10 16:53:33 +02:00  
				
					
						
							
							
								 
						
							
								f692178792 
								
							 
						 
						
							
							
								
								Allow negative INCX (API change from version 3.10 of the reference implementation)  
							
							
							
						 
						
							2023-08-10 16:52:09 +02:00  
				
					
						
							
							
								 
						
							
								d15ffb7fdf 
								
							 
						 
						
							
							
								
								Allow negative INCX (API change from version 3.10 of the reference implementation)  
							
							
							
						 
						
							2023-08-10 16:50:44 +02:00  
				
					
						
							
							
								 
						
							
								a2d867f4d1 
								
							 
						 
						
							
							
								
								Allow negative INCX (API change from version 3.10 of the reference implementation)  
							
							
							
						 
						
							2023-08-10 16:49:05 +02:00  
				
					
						
							
							
								 
						
							
								afdc56a421 
								
							 
						 
						
							
							
								
								Merge pull request  #4158  from XiWeiGu/loongarch64_update_dgemm_kernel  
							
							
							
							
							LoongArch64: Update dgemm kernel 
							
						 
						
							2023-08-07 12:44:09 +02:00  
				
					
						
							
							
								 
						
							
								e8b571d245 
								
							 
						 
						
							
							
								
								LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S V2  
							
							
							
						 
						
							2023-08-07 11:20:42 +08:00  
				
					
						
							
							
								 
						
							
								71fcee6eef 
								
							 
						 
						
							
							
								
								LoongArch64: Update dgemm kernel  
							
							
							
						 
						
							2023-08-07 11:06:52 +08:00  
				
					
						
							
							
								 
						
							
								0f521ece25 
								
							 
						 
						
							
							
								
								Merge pull request  #4183  from martin-frbg/issue4181  
							
							
							
							
							Apply USE_TRMM to MIPS64_GENERIC as to GENERIC in gmake builds 
							
						 
						
							2023-08-06 18:59:50 +02:00  
				
					
						
							
							
								 
						
							
								41c31bc1d4 
								
							 
						 
						
							
							
								
								Revert "LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S"  
							
							
							
						 
						
							2023-08-06 16:00:03 +02:00  
				
					
						
							
							
								 
						
							
								61d803547a 
								
							 
						 
						
							
							
								
								Apply USE_TRMM to MIPS64_GENERIC as to GENERIC  
							
							
							
						 
						
							2023-08-06 15:17:38 +02:00  
				
					
						
							
							
								 
						
							
								f8ee309402 
								
							 
						 
						
							
							
								
								Merge pull request  #4153  from XiWeiGu/dgemv  
							
							
							
							
							LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S 
							
						 
						
							2023-08-06 08:49:16 +02:00  
				
					
						
							
							
								 
						
							
								ec1e96aac8 
								
							 
						 
						
							
							
								
								LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S  
							
							
							
						 
						
							2023-08-05 10:24:17 +08:00  
				
					
						
							
							
								 
						
							
								d46772e037 
								
							 
						 
						
							
							
								
								LoongArch64: Add compiler feature checks  
							
							
							
						 
						
							2023-08-05 10:21:43 +08:00  
				
					
						
							
							
								 
						
							
								4664b57e6e 
								
							 
						 
						
							
							
								
								use shortcut only when both incx and incy are zero  
							
							
							
						 
						
							2023-08-04 12:25:34 +02:00  
				
					
						
							
							
								 
						
							
								09131f79a6 
								
							 
						 
						
							
							
								
								Merge pull request  #4164  from martin-frbg/issue4162  
							
							
							
							
							Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 
							
						 
						
							2023-07-29 15:07:20 +02:00  
				
					
						
							
							
								 
						
							
								6a428b5629 
								
							 
						 
						
							
							
								
								Update casum_microk_skylakex-2.c  
							
							
							
						 
						
							2023-07-29 12:24:30 +02:00  
				
					
						
							
							
								 
						
							
								ebb447e32e 
								
							 
						 
						
							
							
								
								Update zasum_microk_skylakex-2.c  
							
							
							
						 
						
							2023-07-29 12:23:57 +02:00  
				
					
						
							
							
								 
						
							
								9f6847583a 
								
							 
						 
						
							
							
								
								nvc currently miscompiles this, hopefully fixed in release 23.09  
							
							
							
						 
						
							2023-07-29 11:50:16 +02:00  
				
					
						
							
							
								 
						
							
								fe54ee3d15 
								
							 
						 
						
							
							
								
								nvc currently miscompiles this, hopefully fixed in release 23.09  
							
							
							
						 
						
							2023-07-29 11:48:38 +02:00  
				
					
						
							
							
								 
						
							
								5720fa02c5 
								
							 
						 
						
							
							
								
								Merge pull request  #4168  from Mousius/sve-zgemm-cgemm  
							
							
							
							
							Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 
							
						 
						
							2023-07-27 17:41:45 +02:00  
				
					
						
							
							
								 
						
							
								84a268b6ca 
								
							 
						 
						
							
							
								
								Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core  
							
							
							
							
							This patch removes the prefetches from cgemm/zgemm, which improves performance much as the same change did for sgemm/dgemm in #3868, so I'm happy to enable this on any applicable cores.
I also replicated the unrolling of the copies from sgemm and dgemm. 
							
						 
						
							2023-07-27 14:12:20 +01:00  
				
					
						
							
							
								 
						
							
								730ca04b48 
								
							 
						 
						
							
							
								
								Fix ZHEMM copy for SVE  
							
							
							
							
							Whilst disambiguating whilelt, I inadvertently used the wrong datatype
for offsets, which can be negative. This rectifies that. 
							
						 
						
							2023-07-27 13:27:28 +01:00  
				
					
						
							
							
								 
						
							
								2a62d2df96 
								
							 
						 
						
							
							
								
								Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3  
							
							
							
						 
						
							2023-07-26 19:39:11 +02:00  
				
					
						
							
							
								 
						
							
								849c8806b8 
								
							 
						 
						
							
							
								
								Merge pull request  #4161  from Mousius/non-sve-kernels  
							
							
							
							
							Use latest non-SVE kernels in ARMV8SVE 
							
						 
						
							2023-07-26 15:49:40 +02:00  
				
					
						
							
							
								 
						
							
								24586bc4ff 
								
							 
						 
						
							
							
								
								Disambiguate whilelt  
							
							
							
						 
						
							2023-07-25 20:15:44 +01:00  
				
					
						
							
							
								 
						
							
								aea2a4622b 
								
							 
						 
						
							
							
								
								Use latest non-SVE kernels in ARMV8SVE  
							
							
							
							
							These are generally better and, in some cases, include threading which helps in the cores we're targeting here. 
							
						 
						
							2023-07-25 14:12:26 +01:00  
				
					
						
							
							
								 
						
							
								7976deff80 
								
							 
						 
						
							
							
								
								Fix file permissions (issue 4095)  
							
							
							
						 
						
							2023-07-23 20:37:07 +02:00  
				
					
						
							
							
								 
						
							
								76ef1672f8 
								
							 
						 
						
							
							
								
								Override DSDOT with generic code to get rid of qemu precision error  
							
							
							
						 
						
							2023-07-19 22:31:07 +02:00