Zhang Xianyi
|
886cbaf4e4
|
Support AMD Piledriver by bulldozer kernels.
|
2013-07-06 12:06:43 -03:00 |
Zhang Xianyi
|
6e8501c8a1
|
Fixed #239 bug in param.h about BARCELONA and BULLDOZER.
|
2013-06-29 10:36:01 +08:00 |
wernsaar
|
f67fa62851
|
added dgemv_n_bulldozer.S
|
2013-06-15 16:42:37 +02:00 |
wernsaar
|
d65bbec99b
|
added new sgemm kernel for BULLDOZER
|
2013-06-09 15:57:42 +02:00 |
wernsaar
|
ba800f0883
|
correct GEMM_THREAD in param.h
|
2013-06-08 10:03:59 +02:00 |
wernsaar
|
25491e42f9
|
New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S
|
2013-06-08 09:40:17 +02:00 |
wernsaar
|
731220f870
|
changed DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q to 248 for BULLDOZER 64bit
|
2013-04-30 10:07:17 +02:00 |
Zhang Xianyi
|
b7c0fa6bd2
|
Init AMD Bulldozer codebase.
|
2012-12-06 07:29:54 -05:00 |
Sébastien Villemot
|
01e3c984ce
|
Fix compilation with TARGET=GENERIC
Patch applied to Debian package
|
2012-11-14 21:04:05 +01:00 |
Sylvestre Ledru
|
3692b4d631
|
Improve the detection of sparc
|
2012-07-02 02:51:38 +02:00 |
Xianyi Zhang
|
b39c51195b
|
Fixed the build bug about Sandy Bridge on 32-bit.
We used Nehalem/Penryn codes on Sandy Bridge 32-bit.
|
2012-06-25 14:29:17 +08:00 |
Xianyi Zhang
|
996dc6d1c8
|
Fixed dynamic_arch building bug.
|
2012-06-19 17:29:06 +08:00 |
wangqian
|
f76f952547
|
Refs #83 #53. Adding Intel Sandy Bridge (AVX supported) kernel codes for BLAS level 3 functions.
|
2012-06-19 16:37:12 +08:00 |
Zhang Xianyi
|
d3b67d0bd8
|
Refs #113. Fixed the typo BOBCATE -> BOBCAT
|
2012-05-31 22:40:15 +08:00 |
Zhang Xianyi
|
d6cab3f37e
|
Refs #113. Support AMD Bobcate using Barcelona kernel codes. Replace 3DNow! with MMX.
|
2012-05-31 18:17:45 +08:00 |
Xianyi Zhang
|
19a48b82cf
|
Init Sandybridge codes based on Nehalem.
|
2012-03-30 20:01:03 +08:00 |
traz
|
7af0139a09
|
Modify P Q R size of Loongson3b.
|
2012-01-11 16:05:39 +00:00 |
Wang Qian
|
66904fc4e8
|
BLAS3 used standard MIPS instructions without extensions on Loongson 3B.
|
2011-11-25 11:20:25 +00:00 |
Wang Qian
|
8163ab7e55
|
Change the block size on Loongson 3B.
|
2011-11-23 18:41:49 +00:00 |
Xianyi Zhang
|
b95ad4cfaf
|
Support detecting ICT Loongson-3B CPU.
|
2011-11-09 19:29:50 +00:00 |
traz
|
831858b883
|
Modify aligned address of sa and sb to improve the performance of multi-threads.
|
2011-09-23 20:59:48 +00:00 |
traz
|
d238a768ab
|
Use ps instructions in cgemm.
|
2011-09-14 15:32:25 +00:00 |
Xianyi Zhang
|
4727fe8abf
|
Refs #47. On Loongson 3A, set DGEMM_R parameter depending on different number of threads. It would improve double precision BLAS3 on multi-threads.
|
2011-09-05 15:13:52 +00:00 |
traz
|
74a3f63489
|
Tuning mb, kb, nb size to get the best performance.
|
2011-09-01 17:15:28 +00:00 |
traz
|
cb0214787b
|
Modify compile options.
|
2011-08-30 20:57:00 +00:00 |
traz
|
c8360e3ae5
|
Complete all the plura single precision functions of level3 on Loongson3a, the performance is 2.3GFlops.
|
2011-07-18 17:03:38 +00:00 |
traz
|
e72113f06a
|
Add ztrmm and ztrsm part on loongson3a. The average performance is 2.2G.
|
2011-06-23 21:11:00 +00:00 |
traz
|
1c96d345e2
|
Improve zgemm performance from 1G to 1.8G, change block size in param.h.
|
2011-06-21 22:16:23 +00:00 |
traz
|
88d94d0ec8
|
Fixed #30 strmm computational error on Loongson3A.
|
2011-05-28 09:48:34 +00:00 |
traz
|
ab9e4ce351
|
Adjust kc size from 112 to 116 .
|
2011-04-11 22:17:57 +00:00 |
traz
|
1aa9a298e1
|
Change BLOCK SIZE of LOONGSON3A TARGET.
|
2011-04-06 10:39:31 +00:00 |
Xianyi Zhang
|
0597c1076f
|
Added the configures of loongson 3a. refs #1
|
2011-01-24 22:45:35 +00:00 |
Xianyi Zhang
|
342bbc3871
|
Import GotoBLAS2 1.13 BSD version codes.
|
2011-01-24 14:54:24 +00:00 |