Zhang Xianyi
|
9f59f384d8
|
Refs #223. Fixed s/dgemv bug on windows.
|
2013-06-04 16:01:05 +08:00 |
wangqian
|
23965f164c
|
Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86_64.
|
2013-05-29 19:48:31 +08:00 |
wangqian
|
6a72840945
|
Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86.
|
2013-05-29 13:23:12 +08:00 |
Zhang Xianyi
|
3326f3152c
|
Merge pull request #213 from wernsaar/develop
Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
|
2013-04-17 23:56:09 -07:00 |
wernsaar
|
7641f6e253
|
Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
Changed the copy functions to generic to solve prefetch conflicts
|
2013-04-16 19:05:06 +02:00 |
Zhang Xianyi
|
3ad29452d1
|
Merge pull request #211 from wernsaar/develop
New version of dgemm_kernel_4x4_bulldozer.S
|
2013-04-15 00:20:55 -07:00 |
wernsaar
|
6e3f6f25a5
|
New version of dgemm_kernel_4x4_bulldozer.S
The peak performance with 8 cores is now 90 GFlops
|
2013-04-12 17:55:51 +02:00 |
Zhang Xianyi
|
724ae159ce
|
Fixed the Windows x86_64 ABI bug in s/daxpy kernels.
|
2013-03-08 22:28:34 +08:00 |
wernsaar
|
f300ce3df5
|
new optimization of dgemm kernel for bulldozer: 10% performance increase
|
2013-03-06 17:26:03 +01:00 |
wernsaar
|
66e64131ed
|
optimized again bulldozer dgemm kernel
|
2013-03-05 19:51:37 +01:00 |
wernsaar
|
9405f26f4b
|
new dgemm_kernel for bulldozer
|
2013-03-04 17:37:38 +01:00 |
Zhang Xianyi
|
5c8bf6ae0e
|
Merge branch 'bulldozer' into develop
|
2013-02-10 01:19:42 +08:00 |
Zhang Xianyi
|
d311236dfd
|
Refs #189. Fixed the bug of s/cdot about invalid reading NAN on x86_64.
|
2013-01-25 20:56:14 +08:00 |
Zhang Xianyi
|
0b08f7479e
|
Refs #154. Fixed the gemv_t bug that overflowed the 16MB buffer on x86.
|
2013-01-20 21:22:12 +08:00 |
Zhang Xianyi
|
99d1978df7
|
Fixed #180. the typos in kernel/x86_64/sgemv_t.S
|
2013-01-12 12:31:14 +08:00 |
Zhang Xianyi
|
08bf6674d5
|
Refs #177. Fixed sgemv_t compiling bug on Win64.
|
2013-01-05 11:36:39 +08:00 |
Zhang Xianyi
|
69200884e1
|
Refs #173. Fixed overflow internal buffer bug of gemv_n on x86
|
2012-12-25 09:27:49 +08:00 |
Zhang Xianyi
|
0d1518add9
|
Refs #173. Fixed overflow internal buffer bug of sgemv_t on x86
|
2012-12-25 09:10:17 +08:00 |
Zhang Xianyi
|
91ed4e4450
|
Refs #171. Prevent loading the dirty number from the buffer in sgemv_t x86 kernel.
|
2012-12-23 23:14:17 +08:00 |
Zhang Xianyi
|
fd3046b32a
|
Refs #173. Fixed overflow internal buffer bug of gemv_t on x86.
|
2012-12-23 21:47:22 +08:00 |
Julian Taylor
|
9fb341a9f8
|
Set parameters for CORE_ATHLON.
Otherwise dgemm_p is set to zero, leading to a segfault in alloc_mmap due to
allocsize being zero.
|
2012-12-15 16:05:33 +01:00 |
Zhang Xianyi
|
f19af5ecc0
|
Refs #54. Added AMD Bulldozer x86_64 dgemm kernel developed by Werner Saar <wernsaar at googlemail.com>
Based on the dgemm kernel for AMD Barcelona, he used AVX and FMA4 instructions.
Thanks, Werner Saar!
|
2012-12-07 01:05:11 +08:00 |
Zhang Xianyi
|
bfaaa975e6
|
Added BULLDOZER target. So far it uses barcelona kernels.
|
2012-12-07 00:53:31 +08:00 |
Zhang Xianyi
|
b7c0fa6bd2
|
Init AMD Bulldozer codebase.
|
2012-12-06 07:29:54 -05:00 |
Zhang Xianyi
|
cea1a885b5
|
Refs #154. Fixed the build bug of dgemv_t on MinGW64.
|
2012-11-27 07:24:04 +08:00 |
Zhang Xianyi
|
5f0117385e
|
Refs #154. Fixed a SEGFAULT bug of dgemv_t when m is very large.
It overflowed the internal buffer. Thus, we split vector x into blocks when m is very large.
Thanks to @wangqian for this patch.
|
2012-11-19 22:32:27 +08:00 |
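The blocking idea described in commit 5f0117385e above can be illustrated with a minimal C sketch. This is not the OpenBLAS assembly kernel: the function name gemv_t_blocked, the block size X_BLOCK, and the column-major layout are assumptions made for illustration only. It shows how processing x in fixed-size chunks keeps the scratch buffer bounded no matter how large m is.

```c
#include <stddef.h>

/* Hypothetical block size. A real kernel would derive this from the
   size of its internal buffer; 4 is used here only to keep the example small. */
#define X_BLOCK 4

/* Computes y += alpha * A^T * x for an m-by-n column-major matrix A
   with leading dimension lda. The vector x (length m) is copied into
   a scratch buffer X_BLOCK elements at a time, so the buffer never
   needs more than X_BLOCK entries regardless of m. */
static void gemv_t_blocked(size_t m, size_t n, double alpha,
                           const double *a, size_t lda,
                           const double *x, double *y)
{
    double xbuf[X_BLOCK]; /* fixed-size scratch buffer */

    for (size_t i0 = 0; i0 < m; i0 += X_BLOCK) {
        /* Rows in this block: X_BLOCK, or fewer for the last block. */
        size_t mb = (m - i0 < X_BLOCK) ? (m - i0) : X_BLOCK;

        /* Copy the current block of x into the scratch buffer. */
        for (size_t i = 0; i < mb; i++)
            xbuf[i] = x[i0 + i];

        /* Accumulate this block's partial dot product into each y[j]. */
        for (size_t j = 0; j < n; j++) {
            const double *col = a + j * lda + i0;
            double sum = 0.0;
            for (size_t i = 0; i < mb; i++)
                sum += col[i] * xbuf[i];
            y[j] += alpha * sum;
        }
    }
}
```

Because each block only adds a partial sum to y, the result matches an unblocked computation up to floating-point summation order.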
Zhang Xianyi
|
2573311308
|
Refs #140. Fixed the zdot ABI incompatibility issue with GCC 4.7 on Win32.
GCC 4.7 uses MSVC ABI on Win 32. This means the caller pops the hidden pointer for returning
aggregate structures larger than 8 bytes.
|
2012-09-24 20:34:33 +08:00 |
Jameson Nash
|
d0e731e8b8
|
provide support for passing CFLAGS, FFLAGS, PFLAGS, FPFLAGS to make on the command line
|
2012-08-21 00:31:12 -04:00 |
Xianyi Zhang
|
25f1a573fd
|
Fixed the build bug when DYNAMIC_ARCH=0.
|
2012-07-07 12:12:24 +08:00 |
wangqian
|
857a0fa0df
|
Fixed the issue of mixing AVX and SSE codes in S/D/C/ZGEMM.
|
2012-06-25 19:00:37 +08:00 |
wangqian
|
d34fce56e4
|
Refs #83 Fixed S/DGEMM calling conventions bug on windows.
|
2012-06-20 19:53:18 +08:00 |
wangqian
|
6cfcb54a28
|
Fixed align problem in S and C precision GEMM kernels.
|
2012-06-20 07:38:39 +08:00 |
wangqian
|
3ef96aa567
|
Fixed bug in MOVQ redefine and ALIGN SIZE problem.
|
2012-06-19 20:37:22 +08:00 |
wangqian
|
f76f952547
|
Refs #83 #53. Adding Intel Sandy Bridge (AVX supported) kernel codes for BLAS level 3 functions.
|
2012-06-19 16:37:12 +08:00 |
Zhang Xianyi
|
eefd30881c
|
Refs #113. Fixed the build bug on AMD Bobcat 64-bit OS.
|
2012-06-02 21:34:23 +08:00 |
Zhang Xianyi
|
d3b67d0bd8
|
Refs #113. Fixed the typo BOBCATE -> BOBCAT
|
2012-05-31 22:40:15 +08:00 |
Zhang Xianyi
|
d6cab3f37e
|
Refs #113. Support AMD Bobcat using Barcelona kernel codes. Replace 3DNow! with MMX.
|
2012-05-31 18:17:45 +08:00 |
Xianyi Zhang
|
a53c6e2440
|
Merge branch 'develop' into sandybridge
|
2012-05-25 23:16:44 +08:00 |
Xianyi Zhang
|
5d657c6e67
|
Fixed #96 a SEGFAULT bug in samax on x86.
|
2012-04-26 16:50:57 +08:00 |
Xianyi Zhang
|
03b0eb19f7
|
Refs #86. Test alpha=NaN in x86/x86_64 dscale.
|
2012-04-05 18:16:18 +08:00 |
Xianyi Zhang
|
19a48b82cf
|
Init Sandybridge codes based on Nehalem.
|
2012-03-30 20:01:03 +08:00 |
Xianyi Zhang
|
3871b6a86d
|
Merge branch 'loongson3b' into release-0.1.0
|
2012-03-23 01:26:44 +08:00 |
Xianyi Zhang
|
83ecfbb9b3
|
Merge branch 'loongson3a' into release-0.1.0
|
2012-03-23 01:26:27 +08:00 |
unknown
|
dff146e306
|
refs #80. Used GEMV SSE2 kernels on x86.
|
2012-03-19 17:56:22 +08:00 |
Wang Qian
|
8e53b57bb2
|
Added gemmkernel and trmmkernel C code in kernel/generic. This code can be used on a new platform that does not have an optimized assembly kernel.
|
2012-01-10 17:16:13 +00:00 |
Wang Qian
|
66904fc4e8
|
BLAS3 used standard MIPS instructions without extensions on Loongson 3B.
|
2011-11-25 11:20:25 +00:00 |
Xianyi Zhang
|
0884f6b78d
|
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3b
|
2011-11-11 14:26:49 +00:00 |
traz
|
2d78fb05c8
|
Add conjugate condition to gemv.
|
2011-11-10 15:38:48 +00:00 |
Xianyi Zhang
|
b95ad4cfaf
|
Support detecting ICT Loongson-3B CPU.
|
2011-11-09 19:29:50 +00:00 |
Xianyi Zhang
|
3bbe3ddb31
|
Merge branch 'develop' of github.com:xianyi/OpenBLAS into loongson3b
|
2011-11-09 19:08:29 +00:00 |