cfa0a80664 
								
							 
						 
						
							
							
								
								Restore initialization of data variables  
							
							
							
						 
						
							2023-07-13 23:23:12 +02:00  
				
					
						
							
							
								 
						
							
								9567305e4c 
								
							 
						 
						
							
							
								
								Restore initialization of data01,data02  
							
							
							
						 
						
							2023-07-13 23:21:18 +02:00  
				
					
						
							
							
								 
						
							
								cb0a70e0e2 
								
							 
						 
						
							
							
								
								dot.c early bail fix  
							
							
							
						 
						
							2023-03-02 09:51:10 +00:00  
				
					
						
							
							
								 
						
							
								802e71bf05 
								
							 
						 
						
							
							
								
								Add const attribute to lsame  
							
							
							
						 
						
							2022-08-08 15:15:52 +02:00  
				
					
						
							
							
								 
						
							
								ef24712030 
								
							 
						 
						
							
							
								
								Move a conditionally used variable  
							
							
							
						 
						
							2021-09-11 14:37:44 +02:00  
				
					
						
							
							
								 
						
							
								619588fbab 
								
							 
						 
						
							
							
								
								sbgemm: remove unnecessary b0 files  
							
							
							
						 
						
							2021-08-30 17:55:01 +08:00  
				
					
						
							
							
								 
						
							
								1d83ca4bca 
								
							 
						 
						
							
							
								
								Small Matrix: support BFLOAT16 data type  
							
							
							
						 
						
							2021-08-30 17:40:20 +08:00  
				
					
						
							
							
								 
						
							
								989e6bbdd3 
								
							 
						 
						
							
							
								
								Small Matrix: reduce generic kernel source files  
							
							
							
						 
						
							2021-08-13 03:17:38 +00:00  
				
					
						
							
							
								 
						
							
								6b58bca18b 
								
							 
						 
						
							
							
								
								Small Matrix: disable low performance default kernel  
							
							
							
						 
						
							2021-08-03 06:49:03 +00:00  
				
					
						
							
							
								 
						
							
								5dc7c3c8e5 
								
							 
						 
						
							
							
								
								Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrix case
							
							
							
						 
						
							2021-08-02 07:06:54 +00:00  
				
					
						
							
							
								 
						
							
								6022e5629c 
								
							 
						 
						
							
							
								
								Refs #2587 fix small matrix c/zgemm bug.
							
							
							
						 
						
							2021-08-02 07:06:54 +00:00  
				
					
						
							
							
								 
						
							
								57ed58cefe 
								
							 
						 
						
							
							
								
								Refs #2587 Add small matrix optimization reference kernel for c/zgemm.
							
							
							
						 
						
							2021-08-02 07:06:54 +00:00  
				
					
						
							
							
								 
						
							
								17d32a4a82 
								
							 
						 
						
							
							
								
								Change a1b0 gemm to b0 gemm.  
							
							
							
						 
						
							2021-08-02 07:06:54 +00:00  
				
					
						
							
							
								 
						
							
								be3349405d 
								
							 
						 
						
							
							
								
								Add alpha=1.0 beta=0.0 for small gemm.  
							
							
							
						 
						
							2021-08-02 07:01:47 +00:00  
				
					
						
							
							
								 
						
							
								0a2077901c 
								
							 
						 
						
							
							
								
								Add small matrix optimization kernel interface.
							
							
							
							
							make SMALL_MATRIX_OPT=1 
							
						 
						
							2021-08-02 07:01:47 +00:00  
				
					
						
							
							
								 
						
							
								ef8e7d0279 
								
							 
						 
						
							
							
								
								Add the support for RISC-V Vector.  
							
							
							
							
							Change-Id: Iae7800a32f5af3903c330882cdf6f292d885f266 
							
						 
						
							2020-10-15 16:09:02 +08:00  
				
					
						
							
							
								 
						
							
								756062afa5 
								
							 
						 
						
							
							
								
								Rename "HALF" and "sh" to "BFLOAT16" and "sb"  
							
							
							
						 
						
							2020-10-11 23:56:17 +02:00  
				
					
						
							
							
								 
						
							
								60e6c68e38 
								
							 
						 
						
							
							
								
								Adapt ARM architect  
							
							
							
						 
						
							2020-09-29 16:36:14 +08:00  
				
					
						
							
							
								 
						
							
								1b1a757f5f 
								
							 
						 
						
							
							
								
								Optimize the performance of dot by using universal intrinsics in X86/ARM  
							
							
							
						 
						
							2020-09-28 20:36:53 +08:00  
				
					
						
							
							
								 
						
							
								d23419accc 
								
							 
						 
						
							
							
								
								powerpc: Optimized SHGEMM kernel for POWER10  
							
							
							
							
							This patch introduces a new optimized version of the SHGEMM kernel
using the POWER10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. It makes use of the new POWER10 compute instructions
for matrix multiplication.
Tested on a simulator; there are no new test failures.
							
						 
						
							2020-06-25 22:19:08 -05:00  
				
					
						
							
							
								 
						
							
								a87793e03c 
								
							 
						 
						
							
							
								
								Fix DYNAMIC_ARCH compilation errors  
							
							
							
						 
						
							2020-04-15 09:09:50 -05:00  
				
					
						
							
							
								 
						
							
								7eb55504b1 
								
							 
						 
						
							
							
								
								RFC : Add half precision gemm for bfloat16 in OpenBLAS  
							
							
							
							
							This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.
Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.
This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved. 
							
						 
						
							2020-04-14 14:55:08 -05:00  
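
The commit body above (7eb55504b1) says that bfloat16 is defined as a 2-byte unsigned short on architectures without native support. The following is a minimal sketch of what that storage choice implies, not the code from the commit: the names `bf16_t`, `float_to_bf16`, `bf16_to_float` and `ref_bf16_gemm` are hypothetical, the conversion truncates rather than rounds, and the reference loop assumes column-major storage with float accumulation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical storage type: a bfloat16 value kept in 2 bytes. */
typedef uint16_t bf16_t;

/* Keep the upper 16 bits of the IEEE-754 single (truncation, no rounding). */
static bf16_t float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bf16_t)(bits >> 16);
}

/* Widen back to float by zero-filling the lower 16 mantissa bits. */
static float bf16_to_float(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reference C = alpha*A*B + beta*C, column-major, no transposes.
 * A is m x k and B is k x n in bf16; C is m x n in float. */
static void ref_bf16_gemm(int m, int n, int k, float alpha,
                          const bf16_t *a, int lda,
                          const bf16_t *b, int ldb,
                          float beta, float *c, int ldc) {
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            float acc = 0.0f;  /* accumulate in single precision */
            for (int p = 0; p < k; p++)
                acc += bf16_to_float(a[i + p * lda]) *
                       bf16_to_float(b[p + j * ldb]);
            /* As in BLAS, C is not read when beta is zero. */
            c[i + j * ldc] = (beta == 0.0f)
                                 ? alpha * acc
                                 : alpha * acc + beta * c[i + j * ldc];
        }
    }
}
```

A loop like this can serve as a baseline when comparing an optimized kernel against plain single-precision results, in the spirit of the small comparison test the commit mentions.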
				
					
						
							
							
								 
						
							
								ff42e68652 
								
							 
						 
						
							
							
								
								Optimize general Gemm Beta
							
							
							
						 
						
							2020-01-20 11:49:42 +08:00  
				
					
						
							
							
								 
						
							
								1e531701b7 
								
							 
						 
						
							
							
								
								fix small typo  
							
							
							
						 
						
							2018-09-09 16:52:25 +02:00  
				
					
						
							
							
								 
						
							
								7a7619af6d 
								
							 
						 
						
							
							
								
								Revert changes from PR#1419  
							
							
							
							
							at least one of these changes apparently is an oversimplification, leading to TRMM breakage on some platforms as observed in #1563  
							
						 
						
							2018-05-17 11:40:08 +02:00  
				
					
						
							
							
								 
						
							
								e5cc3d72c0 
								
							 
						 
						
							
							
								
								core.IdenticalExpr clang501 checker  
							
							
							
						 
						
							2018-01-19 23:17:43 +01:00  
				
					
						
							
							
								 
						
							
								9fa986337d 
								
							 
						 
						
							
							
								
								add missing brackets to silence indentation warnings gcc721  
							
							
							
						 
						
							2018-01-19 23:11:12 +01:00  
				
					
						
							
							
								 
						
							
								3eed97f6b9 
								
							 
						 
						
							
							
								
								Initialize values to silence cppcheck  
							
							
							
						 
						
							2018-01-12 22:35:00 +01:00  
				
					
						
							
							
								 
						
							
								d602b99386 
								
							 
						 
						
							
							
								
								LAPACK helpers in C that need care too  
							
							
							
						 
						
							2018-01-02 14:38:50 +01:00  
				
					
						
							
							
								 
						
							
								4d0b005e5b 
								
							 
						 
						
							
							
								
								Eliminate remaining unused results in kernels (clang5 analyzer)  
							
							
							
						 
						
							2018-01-01 20:54:39 +01:00  
				
					
						
							
							
								 
						
							
								03e5ff0687 
								
							 
						 
						
							
							
								
								initialize potentially uninitialized variables (clang5)
							
							
							
						 
						
							2017-12-26 09:24:24 +01:00  
				
					
						
							
							
								 
						
							
								47deec2c1a 
								
							 
						 
						
							
							
								
								fix couple of dead assignment warnings  
							
							
							
						 
						
							2017-12-22 00:56:35 +01:00  
				
					
						
							
							
								 
						
							
								281a2b952f 
								
							 
						 
						
							
							
								
								warning cleanup (#1380)
							
							
							
							
							* dead increments in driver/level2
* dead increments in kernel/generic
* part dead increments in kernel/x86_64 
							
						 
						
							2017-12-05 19:54:10 +01:00  
				
					
						
							
							
								 
						
							
								8213385ab8 
								
							 
						 
						
							
							
								
								Work around compiler warnings for unused variables in the generic zgemm3m_Xcopy kernels  
							
							
							
						 
						
							2017-12-02 22:51:58 +01:00  
				
					
						
							
							
								 
						
							
								441a9c8385 
								
							 
						 
						
							
							
								
								more dead increments clang4 scan-build deadcode.deadstores  
							
							
							
						 
						
							2017-11-26 17:24:08 +01:00  
				
					
						
							
							
								 
						
							
								1236dbe5a6 
								
							 
						 
						
							
							
								
								Eliminate 2-8 dead increments code  
							
							
							
						 
						
							2017-11-26 13:26:11 +01:00  
				
					
						
							
							
								 
						
							
								65bf0a343c 
								
							 
						 
						
							
							
								
								Remove unused variable btpr  
							
							
							
						 
						
							2017-11-14 23:25:50 +01:00  
				
					
						
							
							
								 
						
							
								9d92f526dd 
								
							 
						 
						
							
							
								
								Comment out a code block that performs out-of-bounds memory accesses  
							
							
							
							
							...and does not appear to be needed even when it stays within the bounds of the array 
							
						 
						
							2017-10-06 23:51:32 +02:00  
				
					
						
							
							
								 
						
							
								f96afd94b0 
								
							 
						 
						
							
							
								
								Fix out-of-bounds accesses where the data should be zero anyway  
							
							
							
						 
						
							2017-10-01 01:06:39 +02:00  
				
					
						
							
							
								 
						
							
								becf8bc7a0 
								
							 
						 
						
							
							
								
								remove dead code  
							
							
							
						 
						
							2016-10-31 12:46:56 +01:00  
				
					
						
							
							
								 
						
							
								594b9f4c73 
								
							 
						 
						
							
							
								
								Do not use vsub to clear the register values since it doesn't work with non-normal numbers.  
							
							
							
						 
						
							2016-01-05 16:54:05 +00:00  
				
					
						
							
							
								 
						
							
								45f78963ac 
								
							 
						 
						
							
							
								
								Optimized cgemm kernel for CORTEXA57  
							
							
							
							
							Also, add a generic ztrmm 4x4 kernel 
							
						 
						
							2015-11-09 14:15:53 +05:30  
				
					
						
							
							
								 
						
							
								711ca33bc6 
								
							 
						 
						
							
							
								
								Improved Ximatcopy when lda==ldb.  
							
							
							
							
							The Ximatcopy functions create a copy of the input matrix
although they seem to work inplace. The new routines
XIMATCOPY_K_YY perform the operations inplace if the leading
dimension does not change. 
							
						 
						
							2015-09-07 14:36:16 +02:00  
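
The commit body above (711ca33bc6) describes performing the operation in place when the leading dimension does not change. The sketch below illustrates that idea for two cases only and is not the XIMATCOPY_K_YY code itself: the function names are hypothetical, storage is assumed to be column-major, and the transposed variant is restricted to square matrices so that no scratch buffer is needed.

```c
#include <stddef.h>

/* No-transpose case: A := alpha * A, touching each element once, no copy. */
static void imatcopy_scale_inplace(size_t rows, size_t cols, double alpha,
                                   double *a, size_t lda) {
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            a[i + j * lda] *= alpha;
}

/* Transpose case, square n x n only: A := alpha * A^T.
 * Each off-diagonal pair is swapped and scaled; the diagonal is scaled. */
static void imatcopy_transpose_inplace(size_t n, double alpha,
                                       double *a, size_t lda) {
    for (size_t j = 0; j < n; j++) {
        a[j + j * lda] *= alpha;
        for (size_t i = j + 1; i < n; i++) {
            double lower = a[i + j * lda];         /* element below diagonal */
            a[i + j * lda] = alpha * a[j + i * lda];
            a[j + i * lda] = alpha * lower;
        }
    }
}
```

When the input and output leading dimensions differ, the elements move to different offsets and an approach like this no longer applies directly, which matches the condition stated in the commit subject.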
				
					
						
							
							
								 
						
							
								1cf2b10224 
								
							 
						 
						
							
							
								
								Use pure C generic target on x86 and x86_64.  
							
							
							
							
							make TARGET=GENERIC
?gemm3m is unimplemented on generic target. 
							
						 
						
							2015-08-03 23:55:56 -05:00  
				
					
						
							
							
								 
						
							
								9bd962f655 
								
							 
						 
						
							
							
								
								modified haswell parameter dgemm_unroll_n  
							
							
							
						 
						
							2015-06-13 10:28:27 +02:00  
				
					
						
							
							
								 
						
							
								ea7f9dacf4 
								
							 
						 
						
							
							
								
								Refs #509. Fixed geadd building bug with DYNAMIC_ARCH=1.
							
							
							
						 
						
							2015-02-26 01:47:11 +08:00  
				
					
						
							
							
								 
						
							
								2fb02626da 
								
							 
						 
						
							
							
								
								Update organization info.  
							
							
							
						 
						
							2014-11-25 15:28:58 +08:00  
				
					
						
							
							
								 
						
							
								58c90d5937 
								
							 
						 
						
							
							
								
								# The first commit's message is:  
							
							
							
							
							Optimizations for APM's xgene-1 (aarch64).
1) general system updates to support armv8 better.  "make all" did not work; one needed to supply TARGET=ARMV8.
2) sgemm 4x4 kernel in assembler using SIMD, and configuration changes to use it.
3) strmm 4x4 kernel in C.  Since the sgemm kernel does 4x4, the trmm kernel must also do 4xN.
Added Dave Nuechterlein to the contributors list. 
							
						 
						
							2014-11-11 22:19:23 +08:00  
				
					
						
							
							
								 
						
							
								b079df9ef4 
								
							 
						 
						
							
							
								
								added optimized sdot- and dsdot-kernel, written in C  
							
							
							
						 
						
							2014-06-30 14:46:38 +02:00  
				
					
						
							
							
								 
						
							
								6c2ead30f0 
								
							 
						 
						
							
							
								
								Remove all trailing whitespace except lapack-netlib  
							
							
							
							
							Signed-off-by: Timothy Gu <timothygu99@gmail.com> 
							
						 
						
							2014-06-27 12:05:18 -07:00