Xianyi Zhang
|
6eb02bbb9c
|
Merge remote branch 'origin/x86' into loongson3a
|
2011-03-02 13:52:05 +08:00 |
Xianyi
|
12214e1d0f
|
Fixed #7. Modified the axpy kernel code to avoid loop unrolling when incx==0 or incy==0 on the 32-bit x86 architecture.
|
2011-02-23 20:08:34 +08:00 |
Xianyi Zhang
|
0cfd29a819
|
Fixed #7. 1) Disabled multi-threading and 2) modified the kernel code to avoid loop unrolling in the axpy function when incx==0 or incy==0.
|
2011-02-21 00:24:21 +08:00 |
Xianyi
|
bfaa80c316
|
Fixed #4: csrot and drot returned the wrong result when incx==incy==0 on the i686 architecture.
|
2011-02-18 03:00:58 +08:00 |
Xianyi Zhang
|
c5852d4e30
|
Fixed #4: csrot returned the wrong result when incx==incy==0.
|
2011-02-16 23:39:43 +08:00 |
Xianyi Zhang
|
84ba64e65b
|
Fixed a bug in drot when incx or incy equals zero.
|
2011-02-16 23:35:41 +08:00 |
Xianyi Zhang
|
1e671b49f3
|
Experimented with the Loongson 3A 128-bit load and store instructions.
|
2011-01-29 03:05:27 +08:00 |
Xianyi Zhang
|
77b7020d69
|
changed prefetch order.
|
2011-01-29 03:03:34 +08:00 |
Xianyi Zhang
|
e003b811ab
|
load x & y contiguously in axpy.
|
2011-01-28 11:18:50 +08:00 |
Xianyi Zhang
|
ebe2da8474
|
Modified the alignment size. Added an additional prefetch instruction because the cache line is 32 bytes on the Loongson 3A.
|
2011-01-27 23:07:06 +08:00 |
Xianyi Zhang
|
c0b5992fab
|
Added an axpy kernel with prefetch for Loongson 3A. To do: tune the prefetch distance and instruction order.
|
2011-01-26 22:34:33 +08:00 |
Xianyi Zhang
|
342bbc3871
|
Imported the GotoBLAS2 1.13 BSD version code.
|
2011-01-24 14:54:24 +00:00 |