Commit Graph

153 Commits

Author SHA1 Message Date
wangqian 1b3b9e841d Fixed a computational error in zgemm_kernel_4x4_sandy.S file. 2013-07-18 20:23:21 +08:00
Zhang Xianyi 2ed0f6ab60 Fixed the typo. 2013-07-11 23:47:07 +08:00
Zhang Xianyi 886cbaf4e4 Support AMD Piledriver by bulldozer kernels. 2013-07-06 12:06:43 -03:00
Zhang Xianyi 57944538b6 Use ALIGN_5 instead of .algin 32 in assembly kernel. Added ALIGN_5 for 32-bit OSX. 2013-07-01 16:09:05 +08:00
Zhang Xianyi fa916a0fac Fixed #238 bug in lsame on x86. 2013-06-28 22:43:41 +08:00
Zhang Xianyi fb298b34ae Merge pull request #235 from wernsaar/develop
Added ddot, daxpy, dcopy kernels for AMD bulldozer.
2013-06-21 17:59:26 -07:00
wernsaar 16012767f4 added dcopy_bulldozer.S 2013-06-21 16:06:51 +02:00
wernsaar bcbac31b47 added ddot_bulldozer.S 2013-06-20 16:15:09 +02:00
wernsaar 8dc0c72583 added daxpy_bulldozer.S 2013-06-20 14:07:54 +02:00
wernsaar 89405a1a0b cleanup of dgemm_ncopy_8_bulldozer.S 2013-06-19 19:31:38 +02:00
wernsaar 4f2b12b8a8 added dgemv_t_bulldozer.S 2013-06-19 17:32:42 +02:00
Zhang Xianyi 646e168d26 Merge pull request #233 from wernsaar/develop
added dgemv_n and some faster gemm_copy routines to BULLDOZER.
2013-06-18 20:02:36 -07:00
wernsaar 93dbbe1fb8 added dgemm_ncopy_8_bulldozer.S 2013-06-18 13:29:23 +02:00
wernsaar a135f5d9ed added gemm_tcopy_2_bulldozer.S 2013-06-18 11:01:33 +02:00
wernsaar d0b6299b13 added dgemm_tcopy_8_bulldozer.S 2013-06-17 14:19:09 +02:00
wernsaar 9e58dd509e added gemm_ncopy_2_bulldozer.S 2013-06-17 12:55:12 +02:00
wernsaar 7c8227101b cleanup of dgemv_n_bulldozer.S and optimization of inner loop 2013-06-16 12:50:45 +02:00
wernsaar f67fa62851 added dgemv_n_bulldozer.S 2013-06-15 16:42:37 +02:00
Zhang Xianyi cd1d473ba0 Merge pull request #230 from wernsaar/develop
Refs #230. New dgemm and sgemm Kernel for BULLDOZER
2013-06-13 07:29:27 -07:00
wernsaar 0ded1fcc1c performance optimizations in sgemm_kernel_16x2_bulldozer.S 2013-06-13 11:35:15 +02:00
wernsaar a789b588cd added cgemm_kernel_4x2_bulldozer.S 2013-06-12 15:55:27 +02:00
wernsaar 8eaa04acbb added zgemm_kernel_2x2_bulldozer.S 2013-06-11 12:00:49 +02:00
wernsaar d854b30ae6 Added UNROLL values for 3M to getarch_2nd.c, Makefile.system and Makefile.L3 2013-06-09 17:26:42 +02:00
wernsaar d65bbec99b added new sgemm kernel for BULLDOZER 2013-06-09 15:57:42 +02:00
wernsaar e4c39c7c26 changed stack touching 2013-06-08 10:43:08 +02:00
wernsaar 25491e42f9 New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S 2013-06-08 09:40:17 +02:00
Zhang Xianyi 9f59f384d8 Refs #223. Fixed s/dgemv bug on windows. 2013-06-04 16:01:05 +08:00
wangqian 23965f164c Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86_64. 2013-05-29 19:48:31 +08:00
wangqian 6a72840945 Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86. 2013-05-29 13:23:12 +08:00
wernsaar 69aa6c8fb1 bad performance with some data 2013-04-28 11:14:23 +02:00
wernsaar 60b263f3d2 removed trsm_kernel_RT_4x4_bulldozer.S. wrong results 2013-04-27 17:23:08 +02:00
wernsaar 7ac306e0da added trsm_kernel_RT_4x4_bulldozer.S 2013-04-27 16:48:48 +02:00
wernsaar 4cb454cdf2 added trsm_kernel_LT_4x4_bulldozer.S 2013-04-27 14:30:00 +02:00
wernsaar 19ad2fb128 prefetch improved. Defined 2 different kernels for inner loop 2013-04-27 13:40:49 +02:00
wernsaar 6821677489 minor improvements and code cleanup 2013-04-26 20:05:42 +02:00
Zhang Xianyi 3326f3152c Merge pull request #213 from wernsaar/develop
Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
2013-04-17 23:56:09 -07:00
wernsaar 7641f6e253 Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
Changed the copy functions to generic to solve prefetch conflicts
2013-04-16 19:05:06 +02:00
Zhang Xianyi 3ad29452d1 Merge pull request #211 from wernsaar/develop
New version of dgemm_kernel_4x4_bulldozer.S
2013-04-15 00:20:55 -07:00
wernsaar 6e3f6f25a5 New version of dgemm_kernel_4x4_bulldozer.S
The peak performance with 8 cores is now 90 GFlops
2013-04-12 17:55:51 +02:00
Zhang Xianyi 724ae159ce Fixed the Windows x86_64 ABI bug in s/daxpy kernels. 2013-03-08 22:28:34 +08:00
wernsaar f300ce3df5 new optimization of dgemm kernel for bulldozer: 10% performance increase 2013-03-06 17:26:03 +01:00
wernsaar 66e64131ed optimized again bulldozer dgemm kernel 2013-03-05 19:51:37 +01:00
wernsaar 9405f26f4b new dgemm_kernel for bulldozer 2013-03-04 17:37:38 +01:00
Zhang Xianyi 5c8bf6ae0e Merge branch 'bulldozer' into develop 2013-02-10 01:19:42 +08:00
Zhang Xianyi d311236dfd Refs #189. Fixed the bug of s/cdot about invalid reading NAN on x86_64. 2013-01-25 20:56:14 +08:00
Zhang Xianyi 0b08f7479e Refs #154. Fixed gemv_t bug about overflow 16MB buffer on x86. 2013-01-20 21:22:12 +08:00
Zhang Xianyi 99d1978df7 Fixed #180. the typos in kernel/x86_64/sgemv_t.S 2013-01-12 12:31:14 +08:00
Zhang Xianyi 08bf6674d5 Refs #177. Fixed sgemv_t compiling bug on Win64. 2013-01-05 11:36:39 +08:00
Zhang Xianyi 69200884e1 Refs #173. Fixed overflow internal buffer bug of gemv_n on x86 2012-12-25 09:27:49 +08:00
Zhang Xianyi 0d1518add9 Refs #173. Fixed overflow internal buffer bug of sgemv_t on x86 2012-12-25 09:10:17 +08:00