bd2498c886 
								
							 
						 
						
							
							
								
								Use POWER6 GEMM parameters on 32bit POWER8  
							
							
							
						 
						
							2020-07-14 18:07:58 +02:00  
				
					
						
							
							
								 
						
							
								d23419accc 
								
							 
						 
						
							
							
								
								powerpc: Optimized SHGEMM kernel for POWER10  
							
							
							
							
							This patch introduces a new optimized version of the SHGEMM kernel
using the POWER10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. It makes use of the new POWER10 compute instructions
for matrix multiplication.
Tested on a simulator; there are no new test failures. 
							
						 
						
							2020-06-25 22:19:08 -05:00  
				
					
						
							
							
								 
						
							
								9fe930f205 
								
							 
						 
						
							
							
								
								powerpc: Add support for future processor  
							
							
							
							
							This is the initial patch to support build infrastructure
for POWER10 architecture. 
							
						 
						
							2020-06-11 15:47:20 -05:00  
				
					
						
							
							
								 
						
							
								f16e39554d 
								
							 
						 
						
							
							
								
								Change PPCG4 CGEMM_M to match kernel change  
							
							
							
						 
						
							2020-06-03 09:15:29 +02:00  
				
					
						
							
							
								 
						
							
								ea5bdc3f72 
								
							 
						 
						
							
							
								
								split cortex-a53 param to match 8x8 kernel  
							
							
							
						 
						
							2020-05-20 22:34:47 +08:00  
				
					
						
							
							
								 
						
							
								1b0b4349a1 
								
							 
						 
						
							
							
								
								s390x/Z14: Change register blocking for SGEMM to 16x4  
							
							
							
							
							Change register blocking for SGEMM (and STRMM) on z14 from 8x4 to 16x4
by adjusting SGEMM_DEFAULT_UNROLL_M and choosing the appropriate copy
implementations. Actually make KERNEL.Z14 more flexible, so that the
change in param.h suffices. As a result, performance for SGEMM improves
by around 30% on z15.
On z14, FP SIMD instructions can operate on float-sized scalars in
vector registers, while z13 could do that for double-sized scalars only.
Thus, we can double the amount of elements of C that are held in
registers in an SGEMM kernel.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com> 
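
The register arithmetic behind this change can be sketched as follows. This is an illustrative calculation, not code from OpenBLAS: it assumes 128-bit vector registers, and the helper name `vregs_for_tile` is hypothetical.

```c
#include <stddef.h>

/* Assumption: vector registers are 128 bits (16 bytes) wide. */
#define VREG_BYTES 16

/* Hypothetical helper: number of vector registers needed to hold an
 * m x n tile of C whose elements are elem_bytes wide, rounding up. */
size_t vregs_for_tile(size_t m, size_t n, size_t elem_bytes)
{
    size_t per_reg = VREG_BYTES / elem_bytes;   /* elements per register */
    return (m * n + per_reg - 1) / per_reg;
}
```

Under these assumptions, `vregs_for_tile(8, 4, sizeof(double))` and `vregs_for_tile(16, 4, sizeof(float))` give the same register count, which is consistent with holding twice as many float-sized elements of C in the same number of registers.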
							
						 
						
							2020-05-12 15:59:51 +02:00  
				
					
						
							
							
								 
						
							
								03ff213c51 
								
							 
						 
						
							
							
								
								Increase POWER8 ZGEMM_R and use same R values for POWER9  
							
							
							
							
							fixes lapack-test zger failures seen in #2299  after application of my PR #2551  
							
						 
						
							2020-04-24 21:46:54 +02:00  
				
					
						
							
							
								 
						
							
								00172d440b 
								
							 
						 
						
							
							
								
								Typo fix in MIPS24K addition  
							
							
							
						 
						
							2020-04-18 21:16:49 +02:00  
				
					
						
							
							
								 
						
							
								61bbae3ac1 
								
							 
						 
						
							
							
								
								Handle  MIPS24K like P5600  
							
							
							
							
							and allow enforcing TARGET=1004K as well (omission from earlier 1004K merge and later introduction of TARGET check) 
							
						 
						
							2020-04-18 21:09:32 +02:00  
				
					
						
							
							
								 
						
							
								a33d177430 
								
							 
						 
						
							
							
								
								Increase default BUFFER_SIZE on ARM, ZARCH and newer x86_64, add GEMM_R for POWER8/9  
							
							
							
							
							As shown in #2538 , default buffer sizes on some platforms were smaller than required in memory.c
and the requirement could never be fulfilled for a calculated GEMM_R on PPC given the formula used 
							
						 
						
							2020-04-12 19:44:48 +02:00  
				
					
						
							
							
								 
						
							
								567d2760e6 
								
							 
						 
						
							
							
								
								Merge pull request  #2520  from wjc404/develop  
							
							
							
							
							Fix avx512 sgemm performance bug when ldc is a multiple of 1024 
							
						 
						
							2020-03-30 20:15:59 +02:00  
				
					
						
							
							
								 
						
							
								64daad4365 
								
							 
						 
						
							
							
								
								Update param.h  
							
							
							
						 
						
							2020-03-20 21:46:18 +00:00  
				
					
						
							
							
								 
						
							
								ea8eec5d17 
								
							 
						 
						
							
							
								
								Merge pull request  #2422  from wjc404/develop  
							
							
							
							
							Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM 
							
						 
						
							2020-02-29 19:07:35 +01:00  
				
					
						
							
							
								 
						
							
								c623a965f9 
								
							 
						 
						
							
							
								
								Add Neoverse-N1 core  
							
							
							
							
							The implementation is a hybrid of the ARMV8 one with some of the
improved TX2 routines along with specifying -march=v8.2-a 
							
						 
						
							2020-02-29 03:22:04 +00:00  
				
					
						
							
							
								 
						
							
								8164fd1328 
								
							 
						 
						
							
							
								
								Always assume server-class cpu count for TSV110 and EMAG8180  
							
							
							
						 
						
							2020-02-26 22:19:57 +01:00  
				
					
						
							
							
								 
						
							
								71e5669c3e 
								
							 
						 
						
							
							
								
								Add preliminary support for EMAG8180 ARMV8 processor  
							
							
							
						 
						
							2020-02-19 18:57:26 +01:00  
				
					
						
							
							
								 
						
							
								b0558c11b9 
								
							 
						 
						
							
							
								
								Update param.h  
							
							
							
						 
						
							2020-02-16 23:01:31 +08:00  
				
					
						
							
							
								 
						
							
								83b6be7976 
								
							 
						 
						
							
							
								
								Update param.h  
							
							
							
						 
						
							2020-02-04 19:55:26 +08:00  
				
					
						
							
							
								 
						
							
								f3f969f681 
								
							 
						 
						
							
							
								
								Update param.h  
							
							
							
						 
						
							2020-02-03 21:34:12 +08:00  
				
					
						
							
							
								 
						
							
								fbf4f48f4a 
								
							 
						 
						
							
							
								
								Fix a few performance drops for some matrix sizes per data type  
							
							
							
							
							Signed-off-by: Wang,Long <long1.wang@intel.com> 
							
						 
						
							2020-01-22 15:15:04 +00:00  
				
					
						
							
							
								 
						
							
								1c67567008 
								
							 
						 
						
							
							
								
								improve skylakex parallel sgemm performance  
							
							
							
						 
						
							2020-01-13 16:26:03 +08:00  
				
					
						
							
							
								 
						
							
								b7b408a120 
								
							 
						 
						
							
							
								
								optimize AVX2 SGEMM  
							
							
							
						 
						
							2020-01-06 12:16:09 +08:00  
				
					
						
							
							
								 
						
							
								6362c34ee6 
								
							 
						 
						
							
							
								
								Update param.h  
							
							
							
						 
						
							2019-12-30 16:08:19 +08:00  
				
					
						
							
							
								 
						
							
								64639f440f 
								
							 
						 
						
							
							
								
								Update param.h  
							
							
							
						 
						
							2019-12-27 18:06:42 +08:00  
				
					
						
							
							
								 
						
							
								611445c7f8 
								
							 
						 
						
							
							
								
								Update param.h  
							
							
							
						 
						
							2019-12-23 23:44:55 +08:00  
				
					
						
							
							
								 
						
							
								105e26e12a 
								
							 
						 
						
							
							
								
								Adjust Haswell ZGEMM blocking parameters  
							
							
							
						 
						
							2019-12-21 14:38:51 +08:00  
				
					
						
							
							
								 
						
							
								e20709e976 
								
							 
						 
						
							
							
								
								Update param.h  
							
							
							
						 
						
							2019-11-28 19:57:50 +08:00  
				
					
						
							
							
								 
						
							
								6082e556cd 
								
							 
						 
						
							
							
								
								Use "generic" S/CGEMM unroll M on big-endian PPC970  
							
							
							
							
							as the respective PPC970 "altivec" kernels give wrong results when compiled for big endian 
							
						 
						
							2019-11-17 15:10:26 +01:00  
				
					
						
							
							
								 
						
							
								4c6a457358 
								
							 
						 
						
							
							
								
								Merge pull request  #2300  from wjc404/develop  
							
							
							
							
							Optimize SGEMM on SKYLAKEX CPUs 
							
						 
						
							2019-11-06 07:27:33 +01:00  
				
					
						
							
							
								 
						
							
								ae43b75a6a 
								
							 
						 
						
							
							
								
								Add files via upload  
							
							
							
						 
						
							2019-11-02 10:09:19 +08:00  
				
					
						
							
							
								 
						
							
								274ff5cdb8 
								
							 
						 
						
							
							
								
								update sgemm_q on skylakex cpus  
							
							
							
						 
						
							2019-11-01 23:59:18 +08:00  
				
					
						
							
							
								 
						
							
								df857551c0 
								
							 
						 
						
							
							
								
								Remove special parameter set for obsolete IOS/ARMV8 workaround  
							
							
							
						 
						
							2019-10-25 23:07:00 +02:00  
				
					
						
							
							
								 
						
							
								5da9484d93 
								
							 
						 
						
							
							
								
								Add files via upload  
							
							
							
						 
						
							2019-10-16 02:01:13 +08:00  
				
					
						
							
							
								 
						
							
								6b83079368 
								
							 
						 
						
							
							
								
								Count cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters ( #2267 )  
							
							
							
							
							There is currently no simple way to query cache sizes on ARMV8, so this takes the number of cores as a trivial indication if the target is a server-class device with a big cache, or just a single-board toy or smartphone. 
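
The heuristic described here can be sketched as follows. This is a minimal illustration, not the code from the commit: the threshold, the parameter values, and the names `pick_gemm_pq` and `online_cores` are all hypothetical, and `_SC_NPROCESSORS_ONLN` is a widely available `sysconf` extension rather than a guaranteed one.

```c
#include <unistd.h>

/* Hypothetical pair of blocking parameters. */
struct gemm_pq { int p; int q; };

/* Assumption: a core count above this threshold suggests a server-class
 * part with a large cache. The threshold is illustrative only. */
#define SERVER_CORE_THRESHOLD 8

/* Pick blocking parameters from the core count alone. The numbers are
 * placeholders, not the values used in param.h. */
struct gemm_pq pick_gemm_pq(long ncores)
{
    struct gemm_pq small = { 128, 256 };   /* placeholder: small cache */
    struct gemm_pq large = { 512, 512 };   /* placeholder: large cache */
    return (ncores > SERVER_CORE_THRESHOLD) ? large : small;
}

/* Number of online cores via sysconf; falls back to 1 on error. */
long online_cores(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n > 0) ? n : 1;
}
```

A caller would combine the two as `pick_gemm_pq(online_cores())`. The core count stands in for a cache size that cannot be queried directly, so it is only a rough proxy.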
							
						 
						
							2019-09-25 23:13:24 +02:00  
				
					
						
							
							
								 
						
							
								6b6c9b1441 
								
							 
						 
						
							
							
								
								Merge pull request  #2172  from quickwritereader/develop  
							
							
							
							
							power9 cgemm/ctrmm. new sgemm 8x16 
							
						 
						
							2019-07-01 21:06:02 +02:00  
				
					
						
							
							
								 
						
							
								a97b301aaa 
								
							 
						 
						
							
							
								
								cgemm/ctrmm power9  
							
							
							
						 
						
							2019-07-01 14:07:54 +00:00  
				
					
						
							
							
								 
						
							
								7c7505a778 
								
							 
						 
						
							
							
								
								Fix build for PPC970 on FreeBSD pt.2  
							
							
							
							
							FreeBSD needs those macros too. 
							
						 
						
							2019-06-28 10:31:45 +00:00  
				
					
						
							
							
								 
						
							
								cdbfb891da 
								
							 
						 
						
							
							
								
								new sgemm 8x16  
							
							
							
						 
						
							2019-06-17 15:33:38 +00:00  
				
					
						
							
							
								 
						
							
								d0c3543c3f 
								
							 
						 
						
							
							
								
								power9 zgemm ztrmm optimized  
							
							
							
						 
						
							2019-06-05 20:07:16 +00:00  
				
					
						
							
							
								 
						
							
								a469b32cf4 
								
							 
						 
						
							
							
								
								sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52  
							
							
							
						 
						
							2019-06-04 07:11:30 +00:00  
				
					
						
							
							
								 
						
							
								8fe794f059 
								
							 
						 
						
							
							
								
								improved zgemm power9 based on power8  
							
							
							
						 
						
							2019-05-30 15:31:25 +00:00  
				
					
						
							
							
								 
						
							
								628b335e83 
								
							 
						 
						
							
							
								
								Merge branch 'develop' of  https://github.com/quickwritereader/OpenBLAS  into develop  
							
							
							
						 
						
							2019-04-29 08:57:44 +00:00  
				
					
						
							
							
								 
						
							
								0f105dd8a5 
								
							 
						 
						
							
							
								
								sgemm/strmm  
							
							
							
						 
						
							2019-04-29 08:49:50 +00:00  
				
					
						
							
							
								 
						
							
								7c51cc8527 
								
							 
						 
						
							
							
								
								Merge branch 'develop' into develop  
							
							
							
						 
						
							2019-03-29 19:36:29 +01:00  
				
					
						
							
							
								 
						
							
								853a18bc17 
								
							 
						 
						
							
							
								
								power9 makefile. dgemm based on power8 kernel with the following changes: 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). Improvement from 17 to 22-23 GFLOPS. dtrmm cases were added into dgemm itself  
							
							
							
						 
						
							2019-03-29 15:49:40 +00:00  
				
					
						
							
							
								 
						
							
								03d7110900 
								
							 
						 
						
							
							
								
								Merge pull request  #2042  from maomao194313/develop  
							
							
							
							
							add TARGET support for HiSilicon tsv110 CPUs 
							
						 
						
							2019-03-12 22:57:39 +01:00  
				
					
						
							
							
								 
						
							
								7e3eb9b25d 
								
							 
						 
						
							
							
								
								make DYNAMIC_ARCH=1 package work on TSV110  
							
							
							
						 
						
							2019-03-12 16:11:01 +08:00  
				
					
						
							
							
								 
						
							
								b0c714ef60 
								
							 
						 
						
							
							
								
								param.h : enable defines for PPC970 on DarwinOS  
							
							
							
							
							fixes:
gemm.c: In function 'sgemm_':
../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function)
 #define SGEMM_P  SGEMM_DEFAULT_P
                  ^ 
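
The failure mode in this error message can be sketched as follows, assuming the per-target defaults sit behind an operating-system guard. The `DEMO_*` macros, the value 128, and the function `sgemm_p_value` are hypothetical stand-ins; only `SGEMM_P` and `SGEMM_DEFAULT_P` come from the message above.

```c
/* Hypothetical stand-ins for the build's target and OS macros. */
#define DEMO_TARGET_PPC970 1
#define DEMO_OS_DARWIN 1

/* param.h-style block: if the OS guard omits the current system, the
 * defaults below are never defined for it. */
#if defined(DEMO_TARGET_PPC970)
#if defined(DEMO_OS_LINUX) || defined(DEMO_OS_DARWIN)
#define SGEMM_DEFAULT_P 128   /* placeholder value */
#endif
#endif

/* common_param.h-style mapping, as quoted in the error message. */
#define SGEMM_P SGEMM_DEFAULT_P

/* Any use of SGEMM_P fails to compile if SGEMM_DEFAULT_P is missing. */
int sgemm_p_value(void)
{
    return SGEMM_P;
}
```

Dropping `DEMO_OS_DARWIN` from the guard in this sketch reproduces an "undeclared" error of the same shape, because `SGEMM_P` then expands to an identifier nothing defines.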
							
						 
						
							2019-03-07 12:03:25 -08:00  
				
					
						
							
							
								 
						
							
								bdc73a49e0 
								
							 
						 
						
							
							
								
								Add parameters for Z14  
							
							
							
							
							from patch provided by aarnez in #991  
							
						 
						
							2019-01-31 21:14:37 +01:00  
				
					
						
							
							
								 
						
							
								bbfdd6c0fe 
								
							 
						 
						
							
							
								
								Increase Zen SWITCH_RATIO to 16  
							
							
							
							
							following GEMM benchmarks on Ryzen2700X. For #1464  
							
						 
						
							2019-01-19 23:01:31 +01:00