wernsaar
|
bcbac31b47
|
added ddot_bulldozer.S
|
2013-06-20 16:15:09 +02:00 |
wernsaar
|
8dc0c72583
|
added daxpy_bulldozer.S
|
2013-06-20 14:07:54 +02:00 |
wernsaar
|
89405a1a0b
|
cleanup of dgemm_ncopy_8_bulldozer.S
|
2013-06-19 19:31:38 +02:00 |
wernsaar
|
4f2b12b8a8
|
added dgemv_t_bulldozer.S
|
2013-06-19 17:32:42 +02:00 |
wernsaar
|
93dbbe1fb8
|
added dgemm_ncopy_8_bulldozer.S
|
2013-06-18 13:29:23 +02:00 |
wernsaar
|
a135f5d9ed
|
added gemm_tcopy_2_bulldozer.S
|
2013-06-18 11:01:33 +02:00 |
wernsaar
|
d0b6299b13
|
added dgemm_tcopy_8_bulldozer.S
|
2013-06-17 14:19:09 +02:00 |
wernsaar
|
9e58dd509e
|
added gemm_ncopy_2_bulldozer.S
|
2013-06-17 12:55:12 +02:00 |
wernsaar
|
7c8227101b
|
cleanup of dgemv_n_bulldozer.S and optimization of inner loop
|
2013-06-16 12:50:45 +02:00 |
wernsaar
|
f67fa62851
|
added dgemv_n_bulldozer.S
|
2013-06-15 16:42:37 +02:00 |
wernsaar
|
0ded1fcc1c
|
performance optimizations in sgemm_kernel_16x2_bulldozer.S
|
2013-06-13 11:35:15 +02:00 |
wernsaar
|
a789b588cd
|
added cgemm_kernel_4x2_bulldozer.S
|
2013-06-12 15:55:27 +02:00 |
wernsaar
|
8eaa04acbb
|
added zgemm_kernel_2x2_bulldozer.S
|
2013-06-11 12:00:49 +02:00 |
wernsaar
|
d854b30ae6
|
Added UNROLL values for 3M to getarch_2nd.c, Makefile.system and Makefile.L3
|
2013-06-09 17:26:42 +02:00 |
wernsaar
|
d65bbec99b
|
added new sgemm kernel for BULLDOZER
|
2013-06-09 15:57:42 +02:00 |
wernsaar
|
e4c39c7c26
|
changed stack touching
|
2013-06-08 10:43:08 +02:00 |
wernsaar
|
25491e42f9
|
New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S
|
2013-06-08 09:40:17 +02:00 |
wernsaar
|
69aa6c8fb1
|
bad performance with some data
|
2013-04-28 11:14:23 +02:00 |
wernsaar
|
60b263f3d2
|
removed trsm_kernel_RT_4x4_bulldozer.S: it produced wrong results
|
2013-04-27 17:23:08 +02:00 |
wernsaar
|
7ac306e0da
|
added trsm_kernel_RT_4x4_bulldozer.S
|
2013-04-27 16:48:48 +02:00 |
wernsaar
|
4cb454cdf2
|
added trsm_kernel_LT_4x4_bulldozer.S
|
2013-04-27 14:30:00 +02:00 |
wernsaar
|
19ad2fb128
|
prefetch improved. Defined 2 different kernels for inner loop
|
2013-04-27 13:40:49 +02:00 |
wernsaar
|
6821677489
|
minor improvements and code cleanup
|
2013-04-26 20:05:42 +02:00 |
wernsaar
|
7641f6e253
|
Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
Changed the copy functions to generic to solve prefetch conflicts
|
2013-04-16 19:05:06 +02:00 |
wernsaar
|
6e3f6f25a5
|
New version of dgemm_kernel_4x4_bulldozer.S
The peak performance with 8 cores is now 90 GFlops
|
2013-04-12 17:55:51 +02:00 |
wernsaar
|
f300ce3df5
|
new optimization of dgemm kernel for bulldozer: 10% performance increase
|
2013-03-06 17:26:03 +01:00 |
wernsaar
|
66e64131ed
|
optimized again bulldozer dgemm kernel
|
2013-03-05 19:51:37 +01:00 |
wernsaar
|
9405f26f4b
|
new dgemm_kernel for bulldozer
|
2013-03-04 17:37:38 +01:00 |
Zhang Xianyi
|
5c8bf6ae0e
|
Merge branch 'bulldozer' into develop
|
2013-02-10 01:19:42 +08:00 |
Zhang Xianyi
|
d311236dfd
|
Refs #189. Fixed the s/cdot bug of invalidly reading NaN on x86_64.
|
2013-01-25 20:56:14 +08:00 |
Zhang Xianyi
|
0b08f7479e
|
Refs #154. Fixed the gemv_t bug that overflowed the 16MB buffer on x86.
|
2013-01-20 21:22:12 +08:00 |
Zhang Xianyi
|
99d1978df7
|
Fixed #180: typos in kernel/x86_64/sgemv_t.S
|
2013-01-12 12:31:14 +08:00 |
Zhang Xianyi
|
08bf6674d5
|
Refs #177. Fixed the sgemv_t compilation bug on Win64.
|
2013-01-05 11:36:39 +08:00 |
Zhang Xianyi
|
69200884e1
|
Refs #173. Fixed the internal buffer overflow bug in gemv_n on x86.
|
2012-12-25 09:27:49 +08:00 |
Zhang Xianyi
|
0d1518add9
|
Refs #173. Fixed the internal buffer overflow bug in sgemv_t on x86.
|
2012-12-25 09:10:17 +08:00 |
Zhang Xianyi
|
91ed4e4450
|
Refs #171. Prevent loading dirty values from the buffer in the sgemv_t x86 kernel.
|
2012-12-23 23:14:17 +08:00 |
Zhang Xianyi
|
fd3046b32a
|
Refs #173. Fixed the internal buffer overflow bug in gemv_t on x86.
|
2012-12-23 21:47:22 +08:00 |
Julian Taylor
|
9fb341a9f8
|
Set parameters for CORE_ATHLON.
Otherwise dgemm_p is set to zero, leading to a segfault in alloc_mmap because
allocsize is zero.
|
2012-12-15 16:05:33 +01:00 |
Zhang Xianyi
|
f19af5ecc0
|
Refs #54. Added the AMD Bulldozer x86_64 dgemm kernel developed by Werner Saar <wernsaar at googlemail.com>.
It is based on the dgemm kernel for AMD Barcelona and uses AVX and FMA4 instructions.
Thanks, Werner Saar!
|
2012-12-07 01:05:11 +08:00 |
Zhang Xianyi
|
bfaaa975e6
|
Added BULLDOZER target. So far it uses barcelona kernels.
|
2012-12-07 00:53:31 +08:00 |
Zhang Xianyi
|
b7c0fa6bd2
|
Init AMD Bulldozer codebase.
|
2012-12-06 07:29:54 -05:00 |
Zhang Xianyi
|
cea1a885b5
|
Refs #154. Fixed the build bug of dgemv_t on MinGW64.
|
2012-11-27 07:24:04 +08:00 |
Zhang Xianyi
|
5f0117385e
|
Refs #154. Fixed a SEGFAULT bug in dgemv_t when m is very large.
It overflowed the internal buffer, so vector x is now split into blocks when m is very large.
Thanks to @wangqian for this patch.
|
2012-11-19 22:32:27 +08:00 |
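The blocking idea described in the dgemv_t fix above can be illustrated with a minimal sketch. This is not the actual patch: the function names `gemv_t_block` and `gemv_t_blocked`, the tiny `BUF_ELEMS` capacity, and the column-major layout are assumptions chosen for illustration. It computes y += alpha * A^T * x while never copying more than `BUF_ELEMS` elements of x into the fixed-size buffer at a time.

```c
#include <stddef.h>

/* Hypothetical capacity of the internal buffer, in elements.
   Kept tiny here so the blocking path is easy to exercise. */
#define BUF_ELEMS 4

/* Unblocked kernel: y[j] += alpha * dot(column j of A, xbuf), for m rows.
   A is column-major with leading dimension lda (assumption). */
static void gemv_t_block(size_t m, size_t n, double alpha,
                         const double *a, size_t lda,
                         const double *xbuf, double *y)
{
    for (size_t j = 0; j < n; j++) {
        double sum = 0.0;
        for (size_t i = 0; i < m; i++)
            sum += a[j * lda + i] * xbuf[i];
        y[j] += alpha * sum;
    }
}

/* Blocked driver: splits x into chunks of at most BUF_ELEMS elements so the
   packed copy of x can never overflow the fixed-size buffer, however large m is. */
void gemv_t_blocked(size_t m, size_t n, double alpha,
                    const double *a, size_t lda,
                    const double *x, size_t incx, double *y)
{
    double buffer[BUF_ELEMS];

    for (size_t start = 0; start < m; start += BUF_ELEMS) {
        size_t len = (m - start < BUF_ELEMS) ? (m - start) : BUF_ELEMS;

        /* Pack the current block of the strided x into the contiguous buffer. */
        for (size_t i = 0; i < len; i++)
            buffer[i] = x[(start + i) * incx];

        /* Rows start..start+len-1 of A begin at offset `start` in each column. */
        gemv_t_block(len, n, alpha, a + start, lda, buffer, y);
    }
}
```

Because each block only adds its partial dot products into y, the result equals the unblocked computation up to floating-point summation order.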
Zhang Xianyi
|
2573311308
|
Refs #140. Fixed the zdot ABI incompatibility with GCC 4.7 on Win32.
GCC 4.7 uses the MSVC ABI on Win32. This means the caller pops the hidden pointer for returning
aggregate structures larger than 8 bytes.
|
2012-09-24 20:34:33 +08:00 |
Jameson Nash
|
d0e731e8b8
|
Provide support for passing CFLAGS, FFLAGS, PFLAGS, and FPFLAGS to make on the command line
|
2012-08-21 00:31:12 -04:00 |
Xianyi Zhang
|
25f1a573fd
|
Fixed the build bug when DYNAMIC_ARCH=0.
|
2012-07-07 12:12:24 +08:00 |
wangqian
|
857a0fa0df
|
Fixed the issue of mixing AVX and SSE code in S/D/C/ZGEMM.
|
2012-06-25 19:00:37 +08:00 |
wangqian
|
d34fce56e4
|
Refs #83 Fixed S/DGEMM calling conventions bug on windows.
|
2012-06-20 19:53:18 +08:00 |
wangqian
|
6cfcb54a28
|
Fixed the alignment problem in S and C precision GEMM kernels.
|
2012-06-20 07:38:39 +08:00 |
wangqian
|
3ef96aa567
|
Fixed a bug in the MOVQ redefinition and the ALIGN SIZE problem.
|
2012-06-19 20:37:22 +08:00 |