wernsaar
13348b2137
removed reference to daxpy_bulldozer kernel (Windows bug in lapack-test)
2014-07-06 16:39:32 +02:00
wernsaar
d5b976f92d
fallback to zgemm_kernel_4x2_sse.S
2014-07-06 11:05:28 +02:00
wernsaar
e0c080a28c
removed reference to zgemm_kernel_4x2_sse3.S (bug in lapack-test)
2014-07-05 16:13:17 +02:00
wernsaar
b079df9ef4
added optimized sdot- and dsdot-kernel, written in C
2014-06-30 14:46:38 +02:00
wernsaar
01a119abfc
enabled SMP for sbmv and zsbmv, but only for 64bit binaries
2014-06-29 20:35:56 +02:00
Zhang Xianyi
99efbbbad5
Fixed #395 . Enable optimized cgemm for Sandybridge. Added optimized sdot kernel.
Fixed c/zgemm, zgemv computational error of haswell, piledriver, bulldozer, and
barcelona on Windows.
Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop
Conflicts:
kernel/Makefile.L1
kernel/x86_64/KERNEL
param.h
2014-06-29 10:34:51 +08:00
wernsaar
22e5aee2dd
fixed zgemv bug for older AMD Processors
2014-06-28 19:04:49 +02:00
wernsaar
35d37e124f
bugfix for barcelona zgemv-kernel
2014-06-28 12:36:11 +02:00
wernsaar
d8ba46efdb
bugfix for bulldozer cgemm-, zgemm- and zgemv-kernel
2014-06-28 12:16:20 +02:00
wernsaar
a15f22a1f6
bugfix for piledriver cgemm-, zgemm- and zgemv-kernel
2014-06-28 11:46:58 +02:00
wernsaar
b94ea89f52
bugfix for haswell cgemm- and zgemm-kernel
2014-06-28 10:22:40 +02:00
wernsaar
35f668bb14
bugfix for cgemm_kernel_8x2_sandy.S
2014-06-28 10:01:56 +02:00
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar
365e8de346
added optimized cgemm-kernel for SANDYBRIDGE
2014-06-27 13:40:29 +02:00
wernsaar
578d1b6219
added DSDOT definition and enabled optimized sdot kernel
2014-06-27 11:30:29 +02:00
wernsaar
dabab2b5f4
added new optimized sgemm kernel for SANDYBRIDGE
2014-06-26 21:42:08 +02:00
wernsaar
aa2709c4e0
enabled optimized dgemm kernel for NEHALEM
2014-06-26 12:22:29 +02:00
wernsaar
a13bcc1716
enabled optimized sgemv kernel for barcelona and piledriver
2014-06-25 13:50:57 +02:00
wernsaar
d2c82d7543
enabled optimized sgemv kernel for HASWELL
2014-06-25 12:56:45 +02:00
wernsaar
0517672dd0
enabled optimized sgemv kernels for nehalem, sandybridge and bulldozer
2014-06-25 12:38:14 +02:00
wernsaar
23203d52c1
Ref #380 : lowered stack usage for haswell kernels
2014-06-19 14:31:52 +02:00
wernsaar
73545a79cd
Ref #380 : lowered stack usage for piledriver and bulldozer kernels
2014-06-19 14:02:14 +02:00
wernsaar
5f3b68b4d4
replaced sgemm and cgemm kernels because of lapack bugs
2014-05-10 11:24:07 +02:00
wernsaar
2424af62fd
replaced dgemm-kernel because of a bug in lapack
2014-05-10 10:52:37 +02:00
wernsaar
793509a3b5
replaced files for sdot, sgemv_n and sgemv_t for bug #348
2014-05-06 15:29:39 +02:00
wernsaar
47b22763f8
reduced stack usage on windows to 16K
2014-04-24 14:09:26 +02:00
Zhang Xianyi
9a557e90da
Refs #340 . Fixed SEGFAULT bug of dgemv_n on OSX.
2014-02-15 23:23:15 +08:00
wangqian
2d557eb1e0
Fixed computational error of dgemv_n.
2014-02-04 21:47:51 +08:00
Zhang Xianyi
05bb391c3a
Refs #330 . Fixed the compatibility issue with clang on Mac OSX.
2013-12-16 20:31:17 +08:00
Zhang Xianyi
9b5be29886
Refs #310 . Fixed segfault bug on nehalem when Julia calls dgeqrt3 on OSX.
Please also check JuliaLang/julia#4099
Julia test script:
A=rand(256, 256)
qrfact(A)
I found this was a bug in kernel/x86_64/dgemm_ncopy_8.S.
However, I cannot use gdb with julia. Thus, this is a workaround fix.
2013-12-12 23:23:04 +08:00
wernsaar
034a5b2083
modified zsymv
2013-12-01 21:07:49 +01:00
wernsaar
27d4234d4d
merged symv
2013-12-01 20:56:02 +01:00
wernsaar
b3254eecaf
Merge remote branch 'origin/haswell' into develop
2013-12-01 18:09:12 +01:00
wernsaar
0b6e13b689
Merge remote branch 'origin/develop' into haswell
2013-12-01 13:38:11 +01:00
wernsaar
e09dc279a2
Merge remote branch 'origin/develop' into piledriver
2013-12-01 13:33:18 +01:00
wernsaar
5c648a8984
Merge remote branch 'origin/develop' into haswell
2013-12-01 11:25:33 +01:00
wernsaar
c44dc4dd3c
Merge remote branch 'origin/develop' into piledriver
2013-12-01 11:06:36 +01:00
wernsaar
f1db386211
changes for compatibility with Pathscale compiler
2013-11-13 17:59:11 +01:00
wernsaar
6da558d2ab
changes for compatibility with Pathscale compiler
2013-11-13 17:39:13 +01:00
Zhang Xianyi
2f5fdd2000
Refs #314 . Fixed clang compiling bug on OSX.
2013-11-07 08:12:03 +08:00
wernsaar
5118a7f4d1
small optimizations on dgemm_kernel for Piledriver
2013-10-31 11:53:26 +01:00
wernsaar
e172b70ea2
added cgemm_kernel for Piledriver
2013-10-31 08:38:17 +01:00
wernsaar
1cf4b974b2
added zgemm_kernel for Piledriver
2013-10-30 09:12:17 +01:00
wernsaar
7bccff1512
added sgemm_kernel for PILEDRIVER
2013-10-29 22:53:04 +01:00
wernsaar
afe44b0241
tests and code cleanup of gemm_kernels for HASWELL
2013-10-28 14:23:48 +01:00
wernsaar
a77c71eaf5
added highly optimized dgemm_kernel for HASWELL
2013-10-28 10:23:47 +01:00
wernsaar
fe8c5666f9
optimized dgemm_kernel for HASWELL
2013-10-20 16:52:26 +02:00
wernsaar
f6b50057e2
corrected and tested FMA3 code
2013-10-19 10:52:20 +02:00
wernsaar
2840d56aeb
added dgemm_kernel for Piledriver
2013-10-19 09:47:15 +02:00
wangqian
beffee7d91
Fixed buffer overflow bug in kernel/x86_64/dgemv_t.S file.
2013-10-11 03:20:20 +08:00