Commit Graph

53 Commits

Author SHA1 Message Date
wernsaar
5087096711 optimization of sandybridge cgemm-kernel 2014-07-29 19:07:21 +02:00
wernsaar
1cc02b4337 optimized sgemm kernel for haswell 2014-07-28 11:50:01 +02:00
wernsaar
125610d23b allow to set custom value for ?GEMM_DEFAULT_UNROLL_MN, optimizations for syrk 2014-07-24 18:43:31 +02:00
Zhang Xianyi
99efbbbad5 Fixed #395. Enable optimized cgemm for Sandybridge. Added optimized sdot kernel.
Fixed c/zgemm, zgemv computational error of haswell, piledriver, bulldozer, and
barcelona on Windows.

Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop

Conflicts:
	kernel/Makefile.L1
	kernel/x86_64/KERNEL
	param.h
2014-06-29 10:34:51 +08:00
Timothy Gu
6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar
365e8de346 added optimized cgemm-kernel for SANDYBRIDGE 2014-06-27 13:40:29 +02:00
wernsaar
dabab2b5f4 added new optimized sgemm kernel for SANDYBRIDGE 2014-06-26 21:42:08 +02:00
wernsaar
aa2709c4e0 enabled optimized dgemm kernel for NEHALEM 2014-06-26 12:22:29 +02:00
wernsaar
d83373db61 added parameter for gemm3m kernels 2014-06-25 10:40:25 +02:00
wernsaar
43fbdb7a5a added ARMV5 as reference platform 2014-05-13 17:25:19 +02:00
wernsaar
5f3b68b4d4 replaced sgemm and cgemm kernels because lapack bugs 2014-05-10 11:24:07 +02:00
wernsaar
2424af62fd replaced dgemm-kernel because bug in lapack 2014-05-10 10:52:37 +02:00
wernsaar
47b22763f8 reduced stack usage on windows to 16K 2014-04-24 14:09:26 +02:00
wernsaar
aae75b2461 modified param.h 2013-12-01 18:43:24 +01:00
wernsaar
b3254eecaf Merge remote branch 'origin/haswell' into develop 2013-12-01 18:09:12 +01:00
wernsaar
ecbc85b954 modified param.h 2013-12-01 17:54:53 +01:00
wernsaar
afe44b0241 tests and code cleanup of gemm_kernels for HASWELL 2013-10-28 14:23:48 +01:00
wernsaar
a77c71eaf5 added highly optimized dgemm_kernel for HASWELL 2013-10-28 10:23:47 +01:00
wernsaar
fe8c5666f9 optimized dgemm_kernel for HASWELL 2013-10-20 16:52:26 +02:00
Zhang Xianyi
2638370844 Init code base for Intel Haswell. 2013-08-13 00:54:59 +08:00
Zhang Xianyi
886cbaf4e4 Support AMD Piledriver by bulldozer kernels. 2013-07-06 12:06:43 -03:00
Zhang Xianyi
6e8501c8a1 Fixed #239 bug in param.h about BARCELONA and BULLDOZER. 2013-06-29 10:36:01 +08:00
wernsaar
f67fa62851 added dgemv_n_bulldozer.S 2013-06-15 16:42:37 +02:00
wernsaar
d65bbec99b added new sgemm kernel for BULLDOZER 2013-06-09 15:57:42 +02:00
wernsaar
ba800f0883 correct GEMM_THREAD in param.h 2013-06-08 10:03:59 +02:00
wernsaar
25491e42f9 New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S 2013-06-08 09:40:17 +02:00
wernsaar
731220f870 changed DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q to 248 for BULLDOZER 64bit 2013-04-30 10:07:17 +02:00
Zhang Xianyi
b7c0fa6bd2 Init AMD Bulldozer codebase. 2012-12-06 07:29:54 -05:00
Sébastien Villemot
01e3c984ce Fix compilation with TARGET=GENERIC
Patch applied to Debian package
2012-11-14 21:04:05 +01:00
Sylvestre Ledru
3692b4d631 Improve the detection of sparc 2012-07-02 02:51:38 +02:00
Xianyi Zhang
b39c51195b Fixed the build bug about Sandy Bridge on 32-bit.
We used Nehalem/Penryn codes on Sandy Bridge 32-bit.
2012-06-25 14:29:17 +08:00
Xianyi Zhang
996dc6d1c8 Fixed dynamic_arch building bug. 2012-06-19 17:29:06 +08:00
wangqian
f76f952547 Refs #83 #53. Adding Intel Sandy Bridge (AVX supported) kernel codes for BLAS level 3 functions. 2012-06-19 16:37:12 +08:00
Zhang Xianyi
d3b67d0bd8 Refs #113. Fixed the typo BOBCATE -> BOBCAT 2012-05-31 22:40:15 +08:00
Zhang Xianyi
d6cab3f37e Refs #113. Support AMD Bobcate using Barcelona kernel codes. Replace 3DNow! with MMX. 2012-05-31 18:17:45 +08:00
Xianyi Zhang
19a48b82cf Init Sandybridge codes based on Nehalem. 2012-03-30 20:01:03 +08:00
traz
7af0139a09 Modify P Q R size of Loongson3b. 2012-01-11 16:05:39 +00:00
Wang Qian
66904fc4e8 BLAS3 used standard MIPS instructions without extensions on Loongson 3B. 2011-11-25 11:20:25 +00:00
Wang Qian
8163ab7e55 Change the block size on Loongson 3B. 2011-11-23 18:41:49 +00:00
Xianyi Zhang
b95ad4cfaf Support detecting ICT Loongson-3B CPU. 2011-11-09 19:29:50 +00:00
traz
831858b883 Modify aligned address of sa and sb to improve the performance of multi-threads. 2011-09-23 20:59:48 +00:00
traz
d238a768ab Use ps instructions in cgemm. 2011-09-14 15:32:25 +00:00
Xianyi Zhang
4727fe8abf Refs #47. On Loongson 3A, set DGEMM_R parameter depending on different number of threads. It would improve double precision BLAS3 on multi-threads. 2011-09-05 15:13:52 +00:00
traz
74a3f63489 Tuning mb, kb, nb size to get the best performance. 2011-09-01 17:15:28 +00:00
traz
cb0214787b Modify compile options. 2011-08-30 20:57:00 +00:00
traz
c8360e3ae5 Complete all the plura single precision functions of level3 on Loongson3a, the performance is 2.3GFlops. 2011-07-18 17:03:38 +00:00
traz
e72113f06a Add ztrmm and ztrsm part on loongson3a. The average performance is 2.2G. 2011-06-23 21:11:00 +00:00
traz
1c96d345e2 Improve zgemm performance from 1G to 1.8G, change block size in param.h. 2011-06-21 22:16:23 +00:00
traz
88d94d0ec8 Fixed #30 strmm computational error on Loongson3A. 2011-05-28 09:48:34 +00:00
traz
ab9e4ce351 Adjust kc size from 112 to 116. 2011-04-11 22:17:57 +00:00