wernsaar
|
1c63180bb6
|
updated dgemm_kernel_8x2_vfpv3.S
|
2013-09-30 17:31:23 +02:00 |
wernsaar
|
4a474ea7dc
|
changed dgemm_kernel to use fused multiply add
|
2013-09-29 17:46:23 +02:00 |
wernsaar
|
69ce737cc5
|
modified Makefile.L3 for ARM
|
2013-09-28 19:13:47 +02:00 |
wernsaar
|
70411af888
|
initial checkin of kernel/arm
|
2013-09-28 19:02:25 +02:00 |
wernsaar
|
067e8417fd
|
removed unnecessary instructions from zgemm_kernel_2x2_bulldozer.S
|
2013-08-23 22:22:43 +08:00 |
wernsaar
|
a82da3d069
|
removed unnecessary instructions
|
2013-08-23 22:22:27 +08:00 |
Zhang Xianyi
|
1569bf14f8
|
Refs #282. Fixed zgemv_n typo bug on Win64.
|
2013-08-23 16:27:17 +08:00 |
Zhang Xianyi
|
c0159d44a3
|
Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop
|
2013-08-09 10:48:46 +08:00 |
wernsaar
|
c17a850c1c
|
modified KERNEL.BULLDOZER
|
2013-08-08 17:49:30 +02:00 |
wernsaar
|
099853fff6
|
added dtrsm_kernel_RN_8x2_bulldozer.S
|
2013-08-08 07:14:08 +02:00 |
wernsaar
|
44d23881b5
|
dtrsm_kernel_LT_8x2_bulldozer.S performance optimization
|
2013-08-05 11:27:16 +02:00 |
Zhang Xianyi
|
32fb6b9bb2
|
Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop
|
2013-08-05 16:09:47 +08:00 |
wernsaar
|
aaeb8eaecd
|
modified dtrsm_kernel_LT_8x2_bulldozer.S
|
2013-08-04 12:16:12 +02:00 |
wernsaar
|
8aeec32ea0
|
modified dtrsm_kernel_LT_8x2_bulldozer.S
|
2013-08-04 10:15:33 +02:00 |
wernsaar
|
87fc9de572
|
added dtrsm_kernel_LT_8x2_bulldozer.S
|
2013-08-04 09:54:40 +02:00 |
wernsaar
|
564aa60fec
|
removed dtrsm_kernel_LT_8x2_bulldozer.S
|
2013-08-03 15:40:51 +02:00 |
wernsaar
|
f645665dd6
|
fixed bug in dgemv_t_bulldozer.S
|
2013-08-03 12:19:29 +02:00 |
wernsaar
|
e45a347cd2
|
repaired trmm bug in sgemm_kernel_16x2_bulldozer.S
|
2013-08-03 11:43:25 +02:00 |
wernsaar
|
99727ac013
|
repaired trmm bug in cgemm_kernel_4x2_bulldozer.S
|
2013-08-03 10:32:51 +02:00 |
wernsaar
|
6e0a2fbc0c
|
repaired trmm bug in zgemm_kernel_2x2_bulldozer.S
|
2013-08-03 10:17:08 +02:00 |
wernsaar
|
0a22f99c58
|
repaired trmm bug in dgemm_kernel_8x2_bulldozer.S
|
2013-08-03 09:35:39 +02:00 |
wernsaar
|
cff70a666d
|
added generic trmm kernels and modified Makefile.L3
|
2013-07-30 20:18:57 +02:00 |
wernsaar
|
84bd0aabaa
|
added dtrsm_kernel_LT_8x2_bulldozer.S
|
2013-07-28 16:47:58 +02:00 |
Zhang Xianyi
|
72b1edaf1b
|
Merge branch 'develop' into bulldozer
Conflicts:
kernel/x86_64/KERNEL.BULLDOZER
|
2013-07-28 06:38:25 +02:00 |
wangqian
|
1b3b9e841d
|
Fixed a computational error in zgemm_kernel_4x4_sandy.S file.
|
2013-07-18 20:23:21 +08:00 |
Zhang Xianyi
|
2ed0f6ab60
|
Fixed the typo.
|
2013-07-11 23:47:07 +08:00 |
Zhang Xianyi
|
886cbaf4e4
|
Support AMD Piledriver by bulldozer kernels.
|
2013-07-06 12:06:43 -03:00 |
Zhang Xianyi
|
57944538b6
|
Use ALIGN_5 instead of .align 32 in assembly kernel. Added ALIGN_5 for 32-bit OSX.
|
2013-07-01 16:09:05 +08:00 |
Zhang Xianyi
|
fa916a0fac
|
Fixed #238 bug in lsame on x86.
|
2013-06-28 22:43:41 +08:00 |
Zhang Xianyi
|
fb298b34ae
|
Merge pull request #235 from wernsaar/develop
Added ddot, daxpy, dcopy kernels for AMD bulldozer.
|
2013-06-21 17:59:26 -07:00 |
wernsaar
|
16012767f4
|
added dcopy_bulldozer.S
|
2013-06-21 16:06:51 +02:00 |
wernsaar
|
bcbac31b47
|
added ddot_bulldozer.S
|
2013-06-20 16:15:09 +02:00 |
wernsaar
|
8dc0c72583
|
added daxpy_bulldozer.S
|
2013-06-20 14:07:54 +02:00 |
wernsaar
|
89405a1a0b
|
cleanup of dgemm_ncopy_8_bulldozer.S
|
2013-06-19 19:31:38 +02:00 |
wernsaar
|
4f2b12b8a8
|
added dgemv_t_bulldozer.S
|
2013-06-19 17:32:42 +02:00 |
Zhang Xianyi
|
646e168d26
|
Merge pull request #233 from wernsaar/develop
added dgemv_n and some faster gemm_copy routines to BULLDOZER.
|
2013-06-18 20:02:36 -07:00 |
wernsaar
|
93dbbe1fb8
|
added dgemm_ncopy_8_bulldozer.S
|
2013-06-18 13:29:23 +02:00 |
wernsaar
|
a135f5d9ed
|
added gemm_tcopy_2_bulldozer.S
|
2013-06-18 11:01:33 +02:00 |
wernsaar
|
d0b6299b13
|
added dgemm_tcopy_8_bulldozer.S
|
2013-06-17 14:19:09 +02:00 |
wernsaar
|
9e58dd509e
|
added gemm_ncopy_2_bulldozer.S
|
2013-06-17 12:55:12 +02:00 |
wernsaar
|
7c8227101b
|
cleanup of dgemv_n_bulldozer.S and optimization of inner loop
|
2013-06-16 12:50:45 +02:00 |
wernsaar
|
f67fa62851
|
added dgemv_n_bulldozer.S
|
2013-06-15 16:42:37 +02:00 |
Zhang Xianyi
|
cd1d473ba0
|
Merge pull request #230 from wernsaar/develop
Refs #230. New dgemm and sgemm Kernel for BULLDOZER
|
2013-06-13 07:29:27 -07:00 |
wernsaar
|
0ded1fcc1c
|
performance optimizations in sgemm_kernel_16x2_bulldozer.S
|
2013-06-13 11:35:15 +02:00 |
wernsaar
|
a789b588cd
|
added cgemm_kernel_4x2_bulldozer.S
|
2013-06-12 15:55:27 +02:00 |
wernsaar
|
8eaa04acbb
|
added zgemm_kernel_2x2_bulldozer.S
|
2013-06-11 12:00:49 +02:00 |
wernsaar
|
d854b30ae6
|
Added UNROLL values for 3M to getarch_2nd.c, Makefile.system and Makefile.L3
|
2013-06-09 17:26:42 +02:00 |
wernsaar
|
d65bbec99b
|
added new sgemm kernel for BULLDOZER
|
2013-06-09 15:57:42 +02:00 |
wernsaar
|
e4c39c7c26
|
changed stack touching
|
2013-06-08 10:43:08 +02:00 |
wernsaar
|
25491e42f9
|
New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S
|
2013-06-08 09:40:17 +02:00 |
Zhang Xianyi
|
9f59f384d8
|
Refs #223. Fixed s/dgemv bug on windows.
|
2013-06-04 16:01:05 +08:00 |
wangqian
|
23965f164c
|
Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86_64.
|
2013-05-29 19:48:31 +08:00 |
wangqian
|
6a72840945
|
Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86.
|
2013-05-29 13:23:12 +08:00 |
wernsaar
|
69aa6c8fb1
|
bad performance with some data
|
2013-04-28 11:14:23 +02:00 |
wernsaar
|
60b263f3d2
|
removed trsm_kernel_RT_4x4_bulldozer.S. wrong results
|
2013-04-27 17:23:08 +02:00 |
wernsaar
|
7ac306e0da
|
added trsm_kernel_RT_4x4_bulldozer.S
|
2013-04-27 16:48:48 +02:00 |
wernsaar
|
4cb454cdf2
|
added trsm_kernel_LT_4x4_bulldozer.S
|
2013-04-27 14:30:00 +02:00 |
wernsaar
|
19ad2fb128
|
prefetch improved. Defined 2 different kernels for inner loop
|
2013-04-27 13:40:49 +02:00 |
wernsaar
|
6821677489
|
minor improvements and code cleanup
|
2013-04-26 20:05:42 +02:00 |
Zhang Xianyi
|
3326f3152c
|
Merge pull request #213 from wernsaar/develop
Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
|
2013-04-17 23:56:09 -07:00 |
wernsaar
|
7641f6e253
|
Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
Changed the copy functions to generic to solve prefetch conflicts
|
2013-04-16 19:05:06 +02:00 |
Zhang Xianyi
|
3ad29452d1
|
Merge pull request #211 from wernsaar/develop
New version of dgemm_kernel_4x4_bulldozer.S
|
2013-04-15 00:20:55 -07:00 |
wernsaar
|
6e3f6f25a5
|
New version of dgemm_kernel_4x4_bulldozer.S
The peak performance with 8 cores is now 90 GFlops
|
2013-04-12 17:55:51 +02:00 |
Zhang Xianyi
|
724ae159ce
|
Fixed the Windows x86_64 ABI bug in s/daxpy kernels.
|
2013-03-08 22:28:34 +08:00 |
wernsaar
|
f300ce3df5
|
new optimization of dgemm kernel for bulldozer: 10% performance increase
|
2013-03-06 17:26:03 +01:00 |
wernsaar
|
66e64131ed
|
optimized again bulldozer dgemm kernel
|
2013-03-05 19:51:37 +01:00 |
wernsaar
|
9405f26f4b
|
new dgemm_kernel for bulldozer
|
2013-03-04 17:37:38 +01:00 |
Zhang Xianyi
|
5c8bf6ae0e
|
Merge branch 'bulldozer' into develop
|
2013-02-10 01:19:42 +08:00 |
Zhang Xianyi
|
a1ead62f28
|
Disable the warning of sgemm bulldozer kernel.
|
2013-02-09 17:03:13 +01:00 |
Zhang Xianyi
|
0133580148
|
Used sgemm bulldozer kernel on 64 bit.
|
2013-02-09 16:29:14 +01:00 |
Zhang Xianyi
|
274246651d
|
Merge branch 'bulldozer' of git://github.com/wernsaar/OpenBLAS into bulldozer
|
2013-02-09 16:25:07 +01:00 |
Zhang Xianyi
|
299b5a44dc
|
Merge branch 'develop' of github.com:xianyi/OpenBLAS into bulldozer
|
2013-02-09 16:22:04 +01:00 |
Zhang Xianyi
|
d311236dfd
|
Refs #189. Fixed the bug of s/cdot about invalid reading NAN on x86_64.
|
2013-01-25 20:56:14 +08:00 |
Zhang Xianyi
|
0b08f7479e
|
Refs #154. Fixed gemv_t bug about overflow 16MB buffer on x86.
|
2013-01-20 21:22:12 +08:00 |
Zhang Xianyi
|
99d1978df7
|
Fixed #180. the typos in kernel/x86_64/sgemv_t.S
|
2013-01-12 12:31:14 +08:00 |
Zhang Xianyi
|
08bf6674d5
|
Refs #177. Fixed sgemv_t compiling bug on Win64.
|
2013-01-05 11:36:39 +08:00 |
Zhang Xianyi
|
69200884e1
|
Refs #173. Fixed overflow internal buffer bug of gemv_n on x86
|
2012-12-25 09:27:49 +08:00 |
Zhang Xianyi
|
0d1518add9
|
Refs #173. Fixed overflow internal buffer bug of sgemv_t on x86
|
2012-12-25 09:10:17 +08:00 |
Zhang Xianyi
|
91ed4e4450
|
Refs #171. Prevent loading the dirty number from the buffer in sgemv_t x86 kernel.
|
2012-12-23 23:14:17 +08:00 |
Zhang Xianyi
|
fd3046b32a
|
Refs #173. Fixed overflow internal buffer bug of gemv_t on x86.
|
2012-12-23 21:47:22 +08:00 |
Julian Taylor
|
9fb341a9f8
|
set parameters for CORE_ATHLON
else dgemm_p is set to zero leading to a segfault in alloc_mmap due to
allocsize being zero
|
2012-12-15 16:05:33 +01:00 |
wernsaar
|
d48cff8cf1
|
Added optimized sgemm_kernel
|
2012-12-08 18:50:53 +01:00 |
Zhang Xianyi
|
f19af5ecc0
|
Refs #54. Added AMD Bulldozer x86_64 dgemm kernel developed by Werner Saar <wernsaar at googlemail.com>
Based on the dgemm kernel for AMD Barcelona, he used AVX and FMA4 instructions.
Thanks, Werner Saar!
|
2012-12-07 01:05:11 +08:00 |
Zhang Xianyi
|
bfaaa975e6
|
Added BULLDOZER target. So far it uses barcelona kernels.
|
2012-12-07 00:53:31 +08:00 |
Zhang Xianyi
|
b7c0fa6bd2
|
Init AMD Bulldozer codebase.
|
2012-12-06 07:29:54 -05:00 |
Zhang Xianyi
|
cea1a885b5
|
Refs #154. Fixed the build bug of dgemv_t on MinGW64.
|
2012-11-27 07:24:04 +08:00 |
Zhang Xianyi
|
5f0117385e
|
Refs #154. Fixed a SEGFAULT bug of dgemv_t when m is very large.
It overflowed the internal buffer. Thus, we split vector x into blocks when m is very large.
Thanks to @wangqian for this patch.
|
2012-11-19 22:32:27 +08:00 |
Zhang Xianyi
|
2573311308
|
refs #140. Fixed zdot incompatibility ABI issue with GCC 4.7 on Win 32.
GCC 4.7 uses MSVC ABI on Win 32. This means the caller pops the hidden pointer for returning
aggregate structures larger than 8 bytes.
|
2012-09-24 20:34:33 +08:00 |
Jameson Nash
|
d0e731e8b8
|
provide support for passing CFLAGS, FFLAGS, PFLAGS, FPFLAGS to make on the command line
|
2012-08-21 00:31:12 -04:00 |
Xianyi Zhang
|
25f1a573fd
|
Fixed the build bug when DYNAMIC_ARCH=0.
|
2012-07-07 12:12:24 +08:00 |
wangqian
|
857a0fa0df
|
Fixed the issue of mixing AVX and SSE codes in S/D/C/ZGEMM.
|
2012-06-25 19:00:37 +08:00 |
wangqian
|
d34fce56e4
|
Refs #83 Fixed S/DGEMM calling conventions bug on windows.
|
2012-06-20 19:53:18 +08:00 |
wangqian
|
6cfcb54a28
|
Fixed align problem in S and C precision GEMM kernels.
|
2012-06-20 07:38:39 +08:00 |
wangqian
|
3ef96aa567
|
Fixed bug in MOVQ redefine and ALIGN SIZE problem.
|
2012-06-19 20:37:22 +08:00 |
wangqian
|
f76f952547
|
Refs #83 #53. Adding Intel Sandy Bridge (AVX supported) kernel codes for BLAS level 3 functions.
|
2012-06-19 16:37:12 +08:00 |
Zhang Xianyi
|
eefd30881c
|
Refs #113. Fixed the build bug on AMD Bobcat 64-bit OS.
|
2012-06-02 21:34:23 +08:00 |
Zhang Xianyi
|
d3b67d0bd8
|
Refs #113. Fixed the typo BOBCATE -> BOBCAT
|
2012-05-31 22:40:15 +08:00 |
Zhang Xianyi
|
d6cab3f37e
|
Refs #113. Support AMD Bobcat using Barcelona kernel codes. Replace 3DNow! with MMX.
|
2012-05-31 18:17:45 +08:00 |
Xianyi Zhang
|
a53c6e2440
|
Merge branch 'develop' into sandybridge
|
2012-05-25 23:16:44 +08:00 |
Xianyi Zhang
|
5d657c6e67
|
Fixed #96 a SEGFAULT bug in samax on x86.
|
2012-04-26 16:50:57 +08:00 |