Zhang Xianyi
886cbaf4e4
Support AMD Piledriver using Bulldozer kernels.
2013-07-06 12:06:43 -03:00
Zhang Xianyi
fa916a0fac
Fixed #238 bug in lsame on x86.
2013-06-28 22:43:41 +08:00
wangqian
6a72840945
Fixed internal buffer overflow bug in (s/d/c/z)gemv on x86.
2013-05-29 13:23:12 +08:00
Zhang Xianyi
5c8bf6ae0e
Merge branch 'bulldozer' into develop
2013-02-10 01:19:42 +08:00
Zhang Xianyi
0b08f7479e
Refs #154 . Fixed gemv_t bug that overflowed the 16MB buffer on x86.
2013-01-20 21:22:12 +08:00
Zhang Xianyi
69200884e1
Refs #173 . Fixed internal buffer overflow bug in gemv_n on x86.
2012-12-25 09:27:49 +08:00
Zhang Xianyi
0d1518add9
Refs #173 . Fixed internal buffer overflow bug in sgemv_t on x86.
2012-12-25 09:10:17 +08:00
Zhang Xianyi
91ed4e4450
Refs #171 . Prevent loading dirty values from the buffer in the sgemv_t x86 kernel.
2012-12-23 23:14:17 +08:00
Zhang Xianyi
fd3046b32a
Refs #173 . Fixed internal buffer overflow bug in gemv_t on x86.
2012-12-23 21:47:22 +08:00
Zhang Xianyi
bfaaa975e6
Added BULLDOZER target. So far it uses Barcelona kernels.
2012-12-07 00:53:31 +08:00
Zhang Xianyi
b7c0fa6bd2
Init AMD Bulldozer codebase.
2012-12-06 07:29:54 -05:00
Zhang Xianyi
2573311308
Refs #140 . Fixed zdot ABI incompatibility issue with GCC 4.7 on Win32.
GCC 4.7 uses the MSVC ABI on Win32. This means the caller pops the hidden pointer used for returning
aggregate structures larger than 8 bytes.
2012-09-24 20:34:33 +08:00
Zhang Xianyi
d3b67d0bd8
Refs #113 . Fixed the typo BOBCATE -> BOBCAT
2012-05-31 22:40:15 +08:00
Zhang Xianyi
d6cab3f37e
Refs #113 . Support AMD Bobcat using Barcelona kernel code. Replace 3DNow! with MMX.
2012-05-31 18:17:45 +08:00
Xianyi Zhang
a53c6e2440
Merge branch 'develop' into sandybridge
2012-05-25 23:16:44 +08:00
Xianyi Zhang
5d657c6e67
Fixed #96, a SEGFAULT bug in samax on x86.
2012-04-26 16:50:57 +08:00
Xianyi Zhang
03b0eb19f7
Refs #86 . Test alpha=NaN in x86/x86_64 dscale.
2012-04-05 18:16:18 +08:00
Xianyi Zhang
19a48b82cf
Init Sandy Bridge code based on Nehalem.
2012-03-30 20:01:03 +08:00
unknown
dff146e306
refs #80 . Used GEMV SSE2 kernels on x86.
2012-03-19 17:56:22 +08:00
Zhang Xiianyi
7b410b7f0e
Fixed #58, a zdot SEGFAULT bug with GCC-4.6. Thanks to Mr. John for this patch.
In the i386 calling convention, the caller puts the address of the zdot return value into the first hidden parameter.
Thus, the callee should remove this address from the stack before returning.
I had already fixed the same bug in x86/zdot_sse2.S (issue #32 ), but that implementation was not a good one because it used 3 instructions. Mr. John suggested using "ret $0x4" to skip the first hidden address (4 bytes).
2011-09-14 23:52:51 +08:00
traits
b1fe26c45a
Refs #55 . Changed DTB_ENTRIES to DTB_DEFAULT_ENTRIES in x86 gemv_n kernel code.
2011-09-06 14:14:07 +08:00
Xianyi
31040e4d80
Fixed #32, a SEGFAULT bug with gcc-4.6. According to the i386 calling convention, the called function should remove the hidden return value address from the stack.
2011-06-03 13:19:54 +08:00
Xianyi
272f62a2b6
Changed the movlps macro name to capitals in the x86/zdot_sse2.S file.
2011-03-03 00:46:39 +08:00
Xianyi
36016fe349
On 32-bit x86, gcc 4.4.3 generated wrong code (movsd) from movlps at zdot_sse2.S line 191.
This would cause zdotu & zdotc failures. Instead, use movlpd to work around it. Fixed #8 . Fixed #9 .
2011-03-02 18:45:43 +08:00
Xianyi
12214e1d0f
Fixed #7 . Modified axpy kernel code to avoid the unrolled loop when incx==0 or incy==0 on 32-bit x86.
2011-02-23 20:08:34 +08:00
Xianyi
bfaa80c316
Fixed #4: csrot & drot returned the wrong result when incx==incy==0 on i686.
2011-02-18 03:00:58 +08:00
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version code.
2011-01-24 14:54:24 +00:00