wernsaar
|
067e8417fd
|
removed unnecessary instructions from zgemm_kernel_2x2_bulldozer.S
|
2013-08-23 22:22:43 +08:00 |
wernsaar
|
a82da3d069
|
removed unnecessary instructions
|
2013-08-23 22:22:27 +08:00 |
Zhang Xianyi
|
1569bf14f8
|
Refs #282. Fixed zgemv_n typo bug on Win64.
|
2013-08-23 16:27:17 +08:00 |
Zhang Xianyi
|
c0159d44a3
|
Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop
|
2013-08-09 10:48:46 +08:00 |
wernsaar
|
c17a850c1c
|
modified KERNEL.BULLDOZER
|
2013-08-08 17:49:30 +02:00 |
wernsaar
|
099853fff6
|
added dtrsm_kernel_RN_8x2_bulldozer.S
|
2013-08-08 07:14:08 +02:00 |
wernsaar
|
44d23881b5
|
dtrsm_kernel_LT_8x2_bulldozer.S performance optimization
|
2013-08-05 11:27:16 +02:00 |
Zhang Xianyi
|
32fb6b9bb2
|
Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop
|
2013-08-05 16:09:47 +08:00 |
wernsaar
|
aaeb8eaecd
|
modified dtrsm_kernel_LT_8x2_bulldozer.S
|
2013-08-04 12:16:12 +02:00 |
wernsaar
|
8aeec32ea0
|
modified dtrsm_kernel_LT_8x2_bulldozer.S
|
2013-08-04 10:15:33 +02:00 |
wernsaar
|
87fc9de572
|
added dtrsm_kernel_LT_8x2_bulldozer.S
|
2013-08-04 09:54:40 +02:00 |
wernsaar
|
564aa60fec
|
removed dtrsm_kernel_LT_8x2_bulldozer.S
|
2013-08-03 15:40:51 +02:00 |
wernsaar
|
f645665dd6
|
fixed bug in dgemv_t_bulldozer.S
|
2013-08-03 12:19:29 +02:00 |
wernsaar
|
e45a347cd2
|
repaired trmm bug in sgemm_kernel_16x2_bulldozer.S
|
2013-08-03 11:43:25 +02:00 |
wernsaar
|
99727ac013
|
repaired trmm bug in cgemm_kernel_4x2_bulldozer.S
|
2013-08-03 10:32:51 +02:00 |
wernsaar
|
6e0a2fbc0c
|
repaired trmm bug in zgemm_kernel_2x2_bulldozer.S
|
2013-08-03 10:17:08 +02:00 |
wernsaar
|
0a22f99c58
|
repaired trmm bug in dgemm_kernel_8x2_bulldozer.S
|
2013-08-03 09:35:39 +02:00 |
wernsaar
|
84bd0aabaa
|
added dtrsm_kernel_LT_8x2_bulldozer.S
|
2013-07-28 16:47:58 +02:00 |
Zhang Xianyi
|
72b1edaf1b
|
Merge branch 'develop' into bulldozer
Conflicts:
kernel/x86_64/KERNEL.BULLDOZER
|
2013-07-28 06:38:25 +02:00 |
wangqian
|
1b3b9e841d
|
Fixed a computational error in zgemm_kernel_4x4_sandy.S file.
|
2013-07-18 20:23:21 +08:00 |
Zhang Xianyi
|
886cbaf4e4
|
Support AMD Piledriver with the bulldozer kernels.
|
2013-07-06 12:06:43 -03:00 |
Zhang Xianyi
|
57944538b6
|
Use ALIGN_5 instead of .align 32 in assembly kernel. Added ALIGN_5 for 32-bit OSX.
|
2013-07-01 16:09:05 +08:00 |
Zhang Xianyi
|
fb298b34ae
|
Merge pull request #235 from wernsaar/develop
Added ddot, daxpy, dcopy kernels for AMD bulldozer.
|
2013-06-21 17:59:26 -07:00 |
wernsaar
|
16012767f4
|
added dcopy_bulldozer.S
|
2013-06-21 16:06:51 +02:00 |
wernsaar
|
bcbac31b47
|
added ddot_bulldozer.S
|
2013-06-20 16:15:09 +02:00 |
wernsaar
|
8dc0c72583
|
added daxpy_bulldozer.S
|
2013-06-20 14:07:54 +02:00 |
wernsaar
|
89405a1a0b
|
cleanup of dgemm_ncopy_8_bulldozer.S
|
2013-06-19 19:31:38 +02:00 |
wernsaar
|
4f2b12b8a8
|
added dgemv_t_bulldozer.S
|
2013-06-19 17:32:42 +02:00 |
Zhang Xianyi
|
646e168d26
|
Merge pull request #233 from wernsaar/develop
added dgemv_n and some faster gemm_copy routines to BULLDOZER.
|
2013-06-18 20:02:36 -07:00 |
wernsaar
|
93dbbe1fb8
|
added dgemm_ncopy_8_bulldozer.S
|
2013-06-18 13:29:23 +02:00 |
wernsaar
|
a135f5d9ed
|
added gemm_tcopy_2_bulldozer.S
|
2013-06-18 11:01:33 +02:00 |
wernsaar
|
d0b6299b13
|
added dgemm_tcopy_8_bulldozer.S
|
2013-06-17 14:19:09 +02:00 |
wernsaar
|
9e58dd509e
|
added gemm_ncopy_2_bulldozer.S
|
2013-06-17 12:55:12 +02:00 |
wernsaar
|
7c8227101b
|
cleanup of dgemv_n_bulldozer.S and optimization of inner loop
|
2013-06-16 12:50:45 +02:00 |
wernsaar
|
f67fa62851
|
added dgemv_n_bulldozer.S
|
2013-06-15 16:42:37 +02:00 |
Zhang Xianyi
|
cd1d473ba0
|
Merge pull request #230 from wernsaar/develop
Refs #230. New dgemm and sgemm Kernel for BULLDOZER
|
2013-06-13 07:29:27 -07:00 |
wernsaar
|
0ded1fcc1c
|
performance optimizations in sgemm_kernel_16x2_bulldozer.S
|
2013-06-13 11:35:15 +02:00 |
wernsaar
|
a789b588cd
|
added cgemm_kernel_4x2_bulldozer.S
|
2013-06-12 15:55:27 +02:00 |
wernsaar
|
8eaa04acbb
|
added zgemm_kernel_2x2_bulldozer.S
|
2013-06-11 12:00:49 +02:00 |
wernsaar
|
d65bbec99b
|
added new sgemm kernel for BULLDOZER
|
2013-06-09 15:57:42 +02:00 |
wernsaar
|
e4c39c7c26
|
changed stack touching
|
2013-06-08 10:43:08 +02:00 |
wernsaar
|
25491e42f9
|
New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S
|
2013-06-08 09:40:17 +02:00 |
Zhang Xianyi
|
9f59f384d8
|
Refs #223. Fixed s/dgemv bug on windows.
|
2013-06-04 16:01:05 +08:00 |
wangqian
|
23965f164c
|
Fixed internal buffer overflow bug of (s/d/c/z)gemv on x86_64.
|
2013-05-29 19:48:31 +08:00 |
wernsaar
|
69aa6c8fb1
|
bad performance with some data
|
2013-04-28 11:14:23 +02:00 |
wernsaar
|
60b263f3d2
|
removed trsm_kernel_RT_4x4_bulldozer.S: wrong results
|
2013-04-27 17:23:08 +02:00 |
wernsaar
|
7ac306e0da
|
added trsm_kernel_RT_4x4_bulldozer.S
|
2013-04-27 16:48:48 +02:00 |
wernsaar
|
4cb454cdf2
|
added trsm_kernel_LT_4x4_bulldozer.S
|
2013-04-27 14:30:00 +02:00 |
wernsaar
|
19ad2fb128
|
prefetch improved. Defined 2 different kernels for inner loop
|
2013-04-27 13:40:49 +02:00 |
wernsaar
|
6821677489
|
minor improvements and code cleanup
|
2013-04-26 20:05:42 +02:00 |