traz | c1e618ea2d | Add complete gemv function on Loongson3a platform. | 2011-11-03 13:53:48 +00:00 |
traz | d238a768ab | Use ps instructions in cgemm. | 2011-09-14 15:32:25 +00:00 |
traz | cb0214787b | Modify compile options. | 2011-08-30 20:57:00 +00:00 |
traz | c8360e3ae5 | Complete all the complex single-precision level 3 functions on Loongson3a; the performance is 2.3 GFLOPS. | 2011-07-18 17:03:38 +00:00 |
traz | e72113f06a | Add ztrmm and ztrsm parts on Loongson3a. The average performance is 2.2 GFLOPS. | 2011-06-23 21:11:00 +00:00 |
traz | 1c96d345e2 | Improve zgemm performance from 1 GFLOPS to 1.8 GFLOPS; change block size in param.h. | 2011-06-21 22:16:23 +00:00 |
traz | fc84909115 | Modify single-precision compile conditions and add more single-precision kernel code on Loongson3a. | 2011-05-27 09:47:17 +00:00 |
traz | d2f351d819 | Modify dtrsm compiler options. | 2011-05-09 17:31:58 +00:00 |
traz | 782205a693 | Add dgemm compiler options in KERNEL.LOONGSON3A. | 2011-04-06 10:38:34 +00:00 |
Xianyi Zhang | 1e671b49f3 | Experimented with Loongson 3A 128-bit load and store instructions. | 2011-01-29 03:05:27 +08:00 |
Xianyi Zhang | c0b5992fab | Added axpy kernel with prefetch for Loongson3A. To do: tune prefetch distance and instruction order. | 2011-01-26 22:34:33 +08:00 |