5087096711 
								
							 
						 
						
							
							
								
								optimization of sandybridge cgemm-kernel  
							
							
							
						 
						
							2014-07-29 19:07:21 +02:00  
				
					
						
							
							
								 
						
							
								1cc02b4337 
								
							 
						 
						
							
							
								
								optimized sgemm kernel for haswell  
							
							
							
						 
						
							2014-07-28 11:50:01 +02:00  
				
					
						
							
							
								 
						
							
								125610d23b 
								
							 
						 
						
							
							
								
								allow setting a custom value for ?GEMM_DEFAULT_UNROLL_MN, optimizations for syrk  
							
							
							
						 
						
							2014-07-24 18:43:31 +02:00  
				
					
						
							
							
								 
						
							
								99efbbbad5 
								
							 
						 
						
							
							
								
								Fixed #395. Enable optimized cgemm for Sandybridge. Added optimized sdot kernel.  
							
							
							
							
							Fixed c/zgemm, zgemv computational errors on haswell, piledriver, bulldozer, and
barcelona on Windows.
Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS  into wernsaar-develop
Conflicts:
	kernel/Makefile.L1
	kernel/x86_64/KERNEL
	param.h 
							
						 
						
							2014-06-29 10:34:51 +08:00  
				
					
						
							
							
								 
						
							
								6c2ead30f0 
								
							 
						 
						
							
							
								
								Remove all trailing whitespace except lapack-netlib  
							
							
							
							
							Signed-off-by: Timothy Gu <timothygu99@gmail.com> 
							
						 
						
							2014-06-27 12:05:18 -07:00  
				
					
						
							
							
								 
						
							
								365e8de346 
								
							 
						 
						
							
							
								
								added optimized cgemm-kernel for SANDYBRIDGE  
							
							
							
						 
						
							2014-06-27 13:40:29 +02:00  
				
					
						
							
							
								 
						
							
								dabab2b5f4 
								
							 
						 
						
							
							
								
								added new optimized sgemm kernel for SANDYBRIDGE  
							
							
							
						 
						
							2014-06-26 21:42:08 +02:00  
				
					
						
							
							
								 
						
							
								aa2709c4e0 
								
							 
						 
						
							
							
								
								enabled optimized dgemm kernel for NEHALEM  
							
							
							
						 
						
							2014-06-26 12:22:29 +02:00  
				
					
						
							
							
								 
						
							
								d83373db61 
								
							 
						 
						
							
							
								
								added parameter for gemm3m kernels  
							
							
							
						 
						
							2014-06-25 10:40:25 +02:00  
				
					
						
							
							
								 
						
							
								43fbdb7a5a 
								
							 
						 
						
							
							
								
								added ARMV5 as reference platform  
							
							
							
						 
						
							2014-05-13 17:25:19 +02:00  
				
					
						
							
							
								 
						
							
								5f3b68b4d4 
								
							 
						 
						
							
							
								
								replaced sgemm and cgemm kernels because of lapack bugs  
							
							
							
						 
						
							2014-05-10 11:24:07 +02:00  
				
					
						
							
							
								 
						
							
								2424af62fd 
								
							 
						 
						
							
							
								
								replaced dgemm-kernel because of a bug in lapack  
							
							
							
						 
						
							2014-05-10 10:52:37 +02:00  
				
					
						
							
							
								 
						
							
								47b22763f8 
								
							 
						 
						
							
							
								
								reduced stack usage on windows to 16K  
							
							
							
						 
						
							2014-04-24 14:09:26 +02:00  
				
					
						
							
							
								 
						
							
								aae75b2461 
								
							 
						 
						
							
							
								
								modified param.h  
							
							
							
						 
						
							2013-12-01 18:43:24 +01:00  
				
					
						
							
							
								 
						
							
								b3254eecaf 
								
							 
						 
						
							
							
								
								Merge remote branch 'origin/haswell' into develop  
							
							
							
						 
						
							2013-12-01 18:09:12 +01:00  
				
					
						
							
							
								 
						
							
								ecbc85b954 
								
							 
						 
						
							
							
								
								modified param.h  
							
							
							
						 
						
							2013-12-01 17:54:53 +01:00  
				
					
						
							
							
								 
						
							
								afe44b0241 
								
							 
						 
						
							
							
								
								tests and code cleanup of gemm_kernels for HASWELL  
							
							
							
						 
						
							2013-10-28 14:23:48 +01:00  
				
					
						
							
							
								 
						
							
								a77c71eaf5 
								
							 
						 
						
							
							
								
								added highly optimized dgemm_kernel for HASWELL  
							
							
							
						 
						
							2013-10-28 10:23:47 +01:00  
				
					
						
							
							
								 
						
							
								fe8c5666f9 
								
							 
						 
						
							
							
								
								optimized dgemm_kernel for HASWELL  
							
							
							
						 
						
							2013-10-20 16:52:26 +02:00  
				
					
						
							
							
								 
						
							
								2638370844 
								
							 
						 
						
							
							
								
								Init code base for Intel Haswell.  
							
							
							
						 
						
							2013-08-13 00:54:59 +08:00  
				
					
						
							
							
								 
						
							
								886cbaf4e4 
								
							 
						 
						
							
							
								
								Support AMD Piledriver by bulldozer kernels.  
							
							
							
						 
						
							2013-07-06 12:06:43 -03:00  
				
					
						
							
							
								 
						
							
								6e8501c8a1 
								
							 
						 
						
							
							
								
								Fixed #239: bug in param.h about BARCELONA and BULLDOZER.  
							
							
							
						 
						
							2013-06-29 10:36:01 +08:00  
				
					
						
							
							
								 
						
							
								f67fa62851 
								
							 
						 
						
							
							
								
								added dgemv_n_bulldozer.S  
							
							
							
						 
						
							2013-06-15 16:42:37 +02:00  
				
					
						
							
							
								 
						
							
								d65bbec99b 
								
							 
						 
						
							
							
								
								added new sgemm kernel for BULLDOZER  
							
							
							
						 
						
							2013-06-09 15:57:42 +02:00  
				
					
						
							
							
								 
						
							
								ba800f0883 
								
							 
						 
						
							
							
								
								correct GEMM_THREAD in param.h  
							
							
							
						 
						
							2013-06-08 10:03:59 +02:00  
				
					
						
							
							
								 
						
							
								25491e42f9 
								
							 
						 
						
							
							
								
								New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S  
							
							
							
						 
						
							2013-06-08 09:40:17 +02:00  
				
					
						
							
							
								 
						
							
								731220f870 
								
							 
						 
						
							
							
								
								changed DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q to 248 for BULLDOZER 64bit  
							
							
							
						 
						
							2013-04-30 10:07:17 +02:00  
				
					
						
							
							
								 
						
							
								b7c0fa6bd2 
								
							 
						 
						
							
							
								
								Init AMD Bulldozer codebase.  
							
							
							
						 
						
							2012-12-06 07:29:54 -05:00  
				
					
						
							
							
								 
						
							
								01e3c984ce 
								
							 
						 
						
							
							
								
								Fix compilation with TARGET=GENERIC  
							
							
							
							
							Patch applied to Debian package 
							
						 
						
							2012-11-14 21:04:05 +01:00  
				
					
						
							
							
								 
						
							
								3692b4d631 
								
							 
						 
						
							
							
								
								Improve the detection of sparc  
							
							
							
						 
						
							2012-07-02 02:51:38 +02:00  
				
					
						
							
							
								 
						
							
								b39c51195b 
								
							 
						 
						
							
							
								
								Fixed the build bug about Sandy Bridge on 32-bit.  
							
							
							
							
							We used Nehalem/Penryn codes on Sandy Bridge 32-bit. 
							
						 
						
							2012-06-25 14:29:17 +08:00  
				
					
						
							
							
								 
						
							
								996dc6d1c8 
								
							 
						 
						
							
							
								
								Fixed dynamic_arch building bug.  
							
							
							
						 
						
							2012-06-19 17:29:06 +08:00  
				
					
						
							
							
								 
						
							
								f76f952547 
								
							 
						 
						
							
							
								
								Refs #83, #53. Adding Intel Sandy Bridge (AVX supported) kernel codes for BLAS level 3 functions.  
							
							
							
						 
						
							2012-06-19 16:37:12 +08:00  
				
					
						
							
							
								 
						
							
								d3b67d0bd8 
								
							 
						 
						
							
							
								
								Refs #113. Fixed the typo BOBCATE -> BOBCAT  
							
							
							
						 
						
							2012-05-31 22:40:15 +08:00  
				
					
						
							
							
								 
						
							
								d6cab3f37e 
								
							 
						 
						
							
							
								
								Refs #113. Support AMD Bobcat using Barcelona kernel codes. Replace 3DNow! with MMX.  
							
							
							
						 
						
							2012-05-31 18:17:45 +08:00  
				
					
						
							
							
								 
						
							
								19a48b82cf 
								
							 
						 
						
							
							
								
								Init Sandybridge codes based on Nehalem.  
							
							
							
						 
						
							2012-03-30 20:01:03 +08:00  
				
					
						
							
							
								 
						
							
								7af0139a09 
								
							 
						 
						
							
							
								
								Modify P Q R size of Loongson3b.  
							
							
							
						 
						
							2012-01-11 16:05:39 +00:00  
				
					
						
							
							
								 
						
							
								66904fc4e8 
								
							 
						 
						
							
							
								
								BLAS3 used standard MIPS instructions without extensions on Loongson 3B.  
							
							
							
						 
						
							2011-11-25 11:20:25 +00:00  
				
					
						
							
							
								 
						
							
								8163ab7e55 
								
							 
						 
						
							
							
								
								Change the block size on Loongson 3B.  
							
							
							
						 
						
							2011-11-23 18:41:49 +00:00  
				
					
						
							
							
								 
						
							
								b95ad4cfaf 
								
							 
						 
						
							
							
								
								Support detecting ICT Loongson-3B CPU.  
							
							
							
						 
						
							2011-11-09 19:29:50 +00:00  
				
					
						
							
							
								 
						
							
								831858b883 
								
							 
						 
						
							
							
								
								Modify aligned address of sa and sb to improve the performance of multi-threads.  
							
							
							
						 
						
							2011-09-23 20:59:48 +00:00  
				
					
						
							
							
								 
						
							
								d238a768ab 
								
							 
						 
						
							
							
								
								Use ps instructions in cgemm.  
							
							
							
						 
						
							2011-09-14 15:32:25 +00:00  
				
					
						
							
							
								 
						
							
								4727fe8abf 
								
							 
						 
						
							
							
								
								Refs #47. On Loongson 3A, set the DGEMM_R parameter depending on the number of threads. This should improve multi-threaded double precision BLAS3 performance.  
							
							
							
						 
						
							2011-09-05 15:13:52 +00:00  
				
					
						
							
							
								 
						
							
								74a3f63489 
								
							 
						 
						
							
							
								
								Tuning mb, kb, nb size to get the best performance.  
							
							
							
						 
						
							2011-09-01 17:15:28 +00:00  
				
					
						
							
							
								 
						
							
								cb0214787b 
								
							 
						 
						
							
							
								
								Modify compile options.  
							
							
							
						 
						
							2011-08-30 20:57:00 +00:00  
				
					
						
							
							
								 
						
							
								c8360e3ae5 
								
							 
						 
						
							
							
								
								Complete all the complex single precision functions of level3 on Loongson3a; the performance is 2.3 GFlops.  
							
							
							
						 
						
							2011-07-18 17:03:38 +00:00  
				
					
						
							
							
								 
						
							
								e72113f06a 
								
							 
						 
						
							
							
								
								Add ztrmm and ztrsm part on loongson3a. The average performance is 2.2G.  
							
							
							
						 
						
							2011-06-23 21:11:00 +00:00  
				
					
						
							
							
								 
						
							
								1c96d345e2 
								
							 
						 
						
							
							
								
								Improve zgemm performance from 1G to 1.8G, change block size in param.h.  
							
							
							
						 
						
							2011-06-21 22:16:23 +00:00  
				
					
						
							
							
								 
						
							
								88d94d0ec8 
								
							 
						 
						
							
							
								
								Fixed #30: strmm computational error on Loongson3A.  
							
							
							
						 
						
							2011-05-28 09:48:34 +00:00  
				
					
						
							
							
								 
						
							
								ab9e4ce351 
								
							 
						 
						
							
							
								
								Adjust kc size from 112 to 116.  
							
							
							
						 
						
							2011-04-11 22:17:57 +00:00