| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| wernsaar | d83373db61 | added parameter for gemm3m kernels | 2014-06-25 10:40:25 +02:00 |
| wernsaar | 43fbdb7a5a | added ARMV5 as reference platform | 2014-05-13 17:25:19 +02:00 |
| wernsaar | 5f3b68b4d4 | replaced sgemm and cgemm kernels because lapack bugs | 2014-05-10 11:24:07 +02:00 |
| wernsaar | 2424af62fd | replaced dgemm-kernel because bug in lapack | 2014-05-10 10:52:37 +02:00 |
| wernsaar | 47b22763f8 | reduced stack usage on windows to 16K | 2014-04-24 14:09:26 +02:00 |
| wernsaar | aae75b2461 | modified param.h | 2013-12-01 18:43:24 +01:00 |
| wernsaar | b3254eecaf | Merge remote branch 'origin/haswell' into develop | 2013-12-01 18:09:12 +01:00 |
| wernsaar | ecbc85b954 | modified param.h | 2013-12-01 17:54:53 +01:00 |
| wernsaar | afe44b0241 | tests and code cleanup of gemm_kernels for HASWELL | 2013-10-28 14:23:48 +01:00 |
| wernsaar | a77c71eaf5 | added highly optimized dgemm_kernel for HASWELL | 2013-10-28 10:23:47 +01:00 |
| wernsaar | fe8c5666f9 | optimized dgemm_kernel for HASWELL | 2013-10-20 16:52:26 +02:00 |
| Zhang Xianyi | 2638370844 | Init code base for Intel Haswell. | 2013-08-13 00:54:59 +08:00 |
| Zhang Xianyi | 886cbaf4e4 | Support AMD Piledriver by bulldozer kernels. | 2013-07-06 12:06:43 -03:00 |
| Zhang Xianyi | 6e8501c8a1 | Fixed #239 bug in param.h about BARCELONA and BULLDOZER. | 2013-06-29 10:36:01 +08:00 |
| wernsaar | f67fa62851 | added dgemv_n_bulldozer.S | 2013-06-15 16:42:37 +02:00 |
| wernsaar | d65bbec99b | added new sgemm kernel for BULLDOZER | 2013-06-09 15:57:42 +02:00 |
| wernsaar | ba800f0883 | correct GEMM_THREAD in param.h | 2013-06-08 10:03:59 +02:00 |
| wernsaar | 25491e42f9 | New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S | 2013-06-08 09:40:17 +02:00 |
| wernsaar | 731220f870 | changed DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q to 248 for BULLDOZER 64bit | 2013-04-30 10:07:17 +02:00 |
| Zhang Xianyi | b7c0fa6bd2 | Init AMD Bulldozer codebase. | 2012-12-06 07:29:54 -05:00 |
| Sébastien Villemot | 01e3c984ce | Fix compilation with TARGET=GENERIC Patch applied to Debian package | 2012-11-14 21:04:05 +01:00 |
| Sylvestre Ledru | 3692b4d631 | Improve the detection of sparc | 2012-07-02 02:51:38 +02:00 |
| Xianyi Zhang | b39c51195b | Fixed the build bug about Sandy Bridge on 32-bit. We used Nehalem/Penryn codes on Sandy Bridge 32-bit. | 2012-06-25 14:29:17 +08:00 |
| Xianyi Zhang | 996dc6d1c8 | Fixed dynamic_arch building bug. | 2012-06-19 17:29:06 +08:00 |
| wangqian | f76f952547 | Refs #83 #53. Adding Intel Sandy Bridge (AVX supported) kernel codes for BLAS level 3 functions. | 2012-06-19 16:37:12 +08:00 |
| Zhang Xianyi | d3b67d0bd8 | Refs #113. Fixed the typo BOBCATE -> BOBCAT | 2012-05-31 22:40:15 +08:00 |
| Zhang Xianyi | d6cab3f37e | Refs #113. Support AMD Bobcate using Barcelona kernel codes. Replace 3DNow! with MMX. | 2012-05-31 18:17:45 +08:00 |
| Xianyi Zhang | 19a48b82cf | Init Sandybridge codes based on Nehalem. | 2012-03-30 20:01:03 +08:00 |
| traz | 7af0139a09 | Modify P Q R size of Loongson3b. | 2012-01-11 16:05:39 +00:00 |
| Wang Qian | 66904fc4e8 | BLAS3 used standard MIPS instructions without extensions on Loongson 3B. | 2011-11-25 11:20:25 +00:00 |
| Wang Qian | 8163ab7e55 | Change the block size on Loongson 3B. | 2011-11-23 18:41:49 +00:00 |
| Xianyi Zhang | b95ad4cfaf | Support detecting ICT Loongson-3B CPU. | 2011-11-09 19:29:50 +00:00 |
| traz | 831858b883 | Modify aligned address of sa and sb to improve the performance of multi-threads. | 2011-09-23 20:59:48 +00:00 |
| traz | d238a768ab | Use ps instructions in cgemm. | 2011-09-14 15:32:25 +00:00 |
| Xianyi Zhang | 4727fe8abf | Refs #47. On Loongson 3A, set DGEMM_R parameter depending on different number of threads. It would improve double precision BLAS3 on multi-threads. | 2011-09-05 15:13:52 +00:00 |
| traz | 74a3f63489 | Tuning mb, kb, nb size to get the best performance. | 2011-09-01 17:15:28 +00:00 |
| traz | cb0214787b | Modify compile options. | 2011-08-30 20:57:00 +00:00 |
| traz | c8360e3ae5 | Complete all the plura single precision functions of level3 on Loongson3a, the performance is 2.3GFlops. | 2011-07-18 17:03:38 +00:00 |
| traz | e72113f06a | Add ztrmm and ztrsm part on loongson3a. The average performance is 2.2G. | 2011-06-23 21:11:00 +00:00 |
| traz | 1c96d345e2 | Improve zgemm performance from 1G to 1.8G, change block size in param.h. | 2011-06-21 22:16:23 +00:00 |
| traz | 88d94d0ec8 | Fixed #30 strmm computational error on Loongson3A. | 2011-05-28 09:48:34 +00:00 |
| traz | ab9e4ce351 | Adjust kc size from 112 to 116 . | 2011-04-11 22:17:57 +00:00 |
| traz | 1aa9a298e1 | Change BLOCK SIZE of LOONGSON3A TARGET. | 2011-04-06 10:39:31 +00:00 |
| Xianyi Zhang | 0597c1076f | Added the configures of loongson 3a. refs #1 | 2011-01-24 22:45:35 +00:00 |
| Xianyi Zhang | 342bbc3871 | Import GotoBLAS2 1.13 BSD version codes. | 2011-01-24 14:54:24 +00:00 |