traz
|
e72113f06a
|
Add ztrmm and ztrsm part on loongson3a. The average performance is 2.2G.
|
2011-06-23 21:11:00 +00:00 |
traz
|
14f81da375
|
Change prefetch length of A and B; the performance is now 2.1G.
|
2011-06-23 10:46:58 +00:00 |
Xianyi Zhang
|
fc21f7ad28
|
Merge branch 'release-v0.1alpha2' into loongson3a
|
2011-06-23 16:08:23 +08:00 |
traz
|
1c96d345e2
|
Improve zgemm performance from 1G to 1.8G, change block size in param.h.
|
2011-06-21 22:16:23 +00:00 |
Xianyi Zhang
|
c4efde7713
|
Merge branch 'loongson3a' into release-v0.1alpha2
|
2011-06-21 17:50:00 +08:00 |
traz
|
88d94d0ec8
|
Fixed #30 strmm computational error on Loongson3A.
|
2011-05-28 09:48:34 +00:00 |
traz
|
fc84909115
|
Modify single-precision compile conditions, adding single-precision kernel code on Loongson3A.
|
2011-05-27 09:47:17 +00:00 |
traz
|
5ca4e51df0
|
Remove the useless code, modify code comments and format.
|
2011-05-18 10:54:51 +00:00 |
Xianyi Zhang
|
fcb5ce011b
|
Fixed #28. Convert the result to double precision in MIPS64 dsdot_k kernel.
|
2011-05-17 21:24:00 +00:00 |
traz
|
a9320f896e
|
Fixed #25 dtrmm and dtrsm computational error on Loongson3A.
|
2011-05-14 22:00:57 +00:00 |
traz
|
29dce62b8f
|
Finish dtrsm_kernel_Rx.S on Loongson3A.
|
2011-05-11 10:44:23 +00:00 |
traz
|
432c309f63
|
Finish dtrsm_kernel_Lx.S on Loongson3A.
|
2011-05-10 12:48:43 +00:00 |
traz
|
d2f351d819
|
Modify dtrsm compiler options
|
2011-05-09 17:31:58 +00:00 |
traz
|
5a991b7149
|
Fixed #24 dtrmm error on Loongson3A
|
2011-05-09 17:28:20 +00:00 |
traz
|
9320933520
|
Complete the dtrmm function.
|
2011-04-17 20:26:49 +00:00 |
traz
|
921caefa56
|
Add the trmm handling part, without edge handling. Test sizes (M and N) must be multiples of 4.
|
2011-04-15 21:56:25 +00:00 |
traz
|
ecd4c1f3d9
|
Modify prefetching C.
|
2011-04-11 22:46:36 +00:00 |
traz
|
ab9e4ce351
|
Adjust kc size from 112 to 116.
|
2011-04-11 22:17:57 +00:00 |
traz
|
782205a693
|
Add dgemm compiler options in KERNEL.LOONGSON3A.
|
2011-04-06 10:38:34 +00:00 |
traz
|
ac494c0d04
|
New kernel in LOONGSON3A.
|
2011-04-06 10:36:44 +00:00 |
Xianyi Zhang
|
f405b5bcc5
|
Fixed the bug about Loongson3A gsLQC1 & gsSQC1 instructions in daxpy kernel. Now daxpy is correct.
|
2011-03-18 23:05:56 +00:00 |
Wang Qian
|
d5cffd506a
|
Modified the default kernel makefile in MIPS64 arch.
|
2011-03-07 11:23:12 +00:00 |
Xianyi Zhang
|
5838f12995
|
Support unaligned addresses in daxpy with Loongson3A SIMD.
|
2011-03-05 10:17:10 +08:00 |
Xianyi Zhang
|
5444a3f8f7
|
Unroll to 16 in daxpy on loongson3a.
|
2011-03-04 17:50:17 +08:00 |
Xianyi Zhang
|
1e671b49f3
|
Experimented with Loongson 3A 128-bit load & store instructions.
|
2011-01-29 03:05:27 +08:00 |
Xianyi Zhang
|
77b7020d69
|
changed prefetch order.
|
2011-01-29 03:03:34 +08:00 |
Xianyi Zhang
|
e003b811ab
|
load x & y contiguously in axpy.
|
2011-01-28 11:18:50 +08:00 |
Xianyi Zhang
|
ebe2da8474
|
Modified aligned size. Added an additional prefetch instruction because the cache line is 32 bytes in Loongson 3A.
|
2011-01-27 23:07:06 +08:00 |
Xianyi Zhang
|
c0b5992fab
|
added axpy kernel with prefetch for Loongson3A. To-Do: tuning prefetch distance & instruction order.
|
2011-01-26 22:34:33 +08:00 |
Xianyi Zhang
|
342bbc3871
|
Import GotoBLAS2 1.13 BSD version code.
|
2011-01-24 14:54:24 +00:00 |