traz | 831858b883 | Modify the aligned addresses of sa and sb to improve multi-threaded performance. | 2011-09-23 20:59:48 +00:00
traz | d238a768ab | Use paired-single (ps) instructions in cgemm. | 2011-09-14 15:32:25 +00:00
Xianyi Zhang | 4727fe8abf | Refs #47. On Loongson 3A, set the DGEMM_R parameter according to the number of threads. This improves double-precision BLAS3 performance with multiple threads. | 2011-09-05 15:13:52 +00:00
traz | 74a3f63489 | Tune the mb, kb, and nb block sizes for best performance. | 2011-09-01 17:15:28 +00:00
traz | cb0214787b | Modify compile options. | 2011-08-30 20:57:00 +00:00
traz | c8360e3ae5 | Complete all complex single-precision Level 3 functions on Loongson 3A; performance is 2.3 GFlops. | 2011-07-18 17:03:38 +00:00
traz | e72113f06a | Add the ztrmm and ztrsm parts for Loongson 3A. Average performance is 2.2 GFlops. | 2011-06-23 21:11:00 +00:00
traz | 1c96d345e2 | Improve zgemm performance from 1 GFlops to 1.8 GFlops; change the block size in param.h. | 2011-06-21 22:16:23 +00:00
traz | 88d94d0ec8 | Fixed #30: strmm computational error on Loongson 3A. | 2011-05-28 09:48:34 +00:00
traz | ab9e4ce351 | Adjust kc size from 112 to 116. | 2011-04-11 22:17:57 +00:00
traz | 1aa9a298e1 | Change the block size for the LOONGSON3A target. | 2011-04-06 10:39:31 +00:00
Xianyi Zhang | 0597c1076f | Added the configuration for Loongson 3A. Refs #1. | 2011-01-24 22:45:35 +00:00
Xianyi Zhang | 342bbc3871 | Import GotoBLAS2 1.13 BSD version code. | 2011-01-24 14:54:24 +00:00