Commit Graph

18 Commits

Author SHA1 Message Date
Zhang Xianyi bfaaa975e6 Added BULLDOZER target. So far it uses barcelona kernels. 2012-12-07 00:53:31 +08:00
Zhang Xianyi b7c0fa6bd2 Init AMD Bulldozer codebase. 2012-12-06 07:29:54 -05:00
Zhang Xianyi 2573311308 refs #140. Fixed zdot incompatibility ABI issue with GCC 4.7 on Win 32.
GCC 4.7 uses MSVC ABI on Win 32. This means the caller pops the hidden pointer for returning
aggregate structures larger than 8 bytes.
2012-09-24 20:34:33 +08:00
Zhang Xianyi d3b67d0bd8 Refs #113. Fixed the typo BOBCATE -> BOBCAT 2012-05-31 22:40:15 +08:00
Zhang Xianyi d6cab3f37e Refs #113. Support AMD Bobcate using Barcelona kernel codes. Replace 3DNow! with MMX. 2012-05-31 18:17:45 +08:00
Xianyi Zhang a53c6e2440 Merge branch 'develop' into sandybridge 2012-05-25 23:16:44 +08:00
Xianyi Zhang 5d657c6e67 Fixed #96 a SEGFAULT bug in samax on x86. 2012-04-26 16:50:57 +08:00
Xianyi Zhang 03b0eb19f7 Refs #86. Test alpha=Nan in x86/x86_64 dscale. 2012-04-05 18:16:18 +08:00
Xianyi Zhang 19a48b82cf Init Sandybridge codes based on Nehalem. 2012-03-30 20:01:03 +08:00
unknown dff146e306 refs #80. Used GEMV SSE2 kernels on x86. 2012-03-19 17:56:22 +08:00
Zhang Xiianyi 7b410b7f0e Fixed #58 zdot SEGFAULT bug with GCC-4.6. Thank Mr. John for this patch.
In i386 calling convention, the caller put the address of return value of zdot into the first hidden parameter.
Thus, the callee should delete this address before return.
Actually, I have fixed the same bug on x86/zdot_sse2.S (issue #32). However, that is not a good implementation which uses 3 instructions. Mr. John told me used "ret $0x4" to skip the first hidden address (4 bytes).
2011-09-14 23:52:51 +08:00
traits b1fe26c45a refs #55. Changed DTB_ENTRIES to DTB_DEFAULT_ENTRIES in x86 gemv_n kernel codes. 2011-09-06 14:14:07 +08:00
Xianyi 31040e4d80 Fixed #32 a SEGFAULT bug with gcc-4.6. According to i386 calling convention, The called funtion should remove the hidden return value address from the stack. 2011-06-03 13:19:54 +08:00
Xianyi 272f62a2b6 Changed movlps macro name in capital in x86/zdot_sse2.S file. 2011-03-03 00:46:39 +08:00
Xianyi 36016fe349 On x86 32bits, gcc 4.4.3 generated wrong codes (movsd) from movlps in zdot_sse2.S line 191.
This would casue zdotu & zdotc failures. Instead, use movlpd to walk around it. Fixed #8. Fixed #9.
2011-03-02 18:45:43 +08:00
Xianyi 12214e1d0f Fixed #7. Modified axpy kernel codes to avoid unloop with incx==0 or incy==0 in x86 32bits arch. 2011-02-23 20:08:34 +08:00
Xianyi bfaa80c316 fixed #4 csrot & drot returned the wrong result when incx==incy==0 on i686 arch. 2011-02-18 03:00:58 +08:00
Xianyi Zhang 342bbc3871 Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00