369 Commits

Author SHA1 Message Date
wernsaar
a4dde45f87 optimized sgemv_n kernel for sandybridge 2014-08-05 08:53:09 +02:00
wernsaar
7fa7ea3e1e updated haswell optimized sgemv_n kernel 2014-08-05 08:04:47 +02:00
wernsaar
3fbc13eb65 modified sgemv_n for haswell 2014-08-04 16:22:11 +02:00
wernsaar
db6917303f added a better optimized sgemv_n kernel for bulldozer and piledriver 2014-08-04 14:29:01 +02:00
wernsaar
5087096711 optimization of sandybridge cgemm-kernel 2014-07-29 19:07:21 +02:00
wernsaar
46bc4fd50c optimized cgemm kernel for haswell 2014-07-29 08:53:09 +02:00
wernsaar
1cc02b4337 optimized sgemm kernel for haswell 2014-07-28 11:50:01 +02:00
wernsaar
1d33547222 optimized zgemm kernel for haswell 2014-07-27 11:51:42 +02:00
wernsaar
6acbafe45b added sgemv_n microkernel for haswell 2014-07-20 14:52:25 +02:00
wernsaar
5392d11b04 optimized sgemv_n_microk_sandy.c 2014-07-20 14:08:04 +02:00
wernsaar
c0fe95fb72 added sgemv_n microkernel for sandybridge 2014-07-20 13:17:47 +02:00
wernsaar
d9d4077c93 added sgemv_t microkernel for haswell 2014-07-20 11:30:32 +02:00
wernsaar
02eb72ac42 bugfix in sgemv_t_microk_sandy.c 2014-07-20 10:48:41 +02:00
wernsaar
c06f9986d4 added sgemv_t microkernel for sandybridge 2014-07-20 10:21:08 +02:00
wernsaar
2cce125c79 added optimized sgemv_t for bulldozer and piledriver 2014-07-19 15:48:07 +02:00
wernsaar
b3938fe371 don't use this sgemv_n on Windows 2014-07-19 07:15:34 +02:00
wernsaar
c8a4a56177 performance optimizations for sgemv_n 2014-07-18 11:25:21 +02:00
wernsaar
3c5732615d added blocked sgemv_n and microkernel for bulldozer and piledriver 2014-07-17 23:15:07 +02:00
wernsaar
880597b301 segment violation in sgemv kernels 2014-07-13 10:46:14 +02:00
wernsaar
13348b2137 removed reference to daxpy_bulldozer kernel (Windows bug in lapack-test) 2014-07-06 16:39:32 +02:00
wernsaar
d5b976f92d fallback to zgemm_kernel_4x2_sse.S 2014-07-06 11:05:28 +02:00
wernsaar
e0c080a28c removed reference to zgemm_kernel_4x2_sse3.S (bug in lapack-test) 2014-07-05 16:13:17 +02:00
wernsaar
b079df9ef4 added optimized sdot- and dsdot-kernel, written in C 2014-06-30 14:46:38 +02:00
wernsaar
01a119abfc enabled SMP for sbmv and zsbmv, but only for 64bit binaries 2014-06-29 20:35:56 +02:00
Zhang Xianyi
99efbbbad5 Fixed #395. Enable optimized cgemm for Sandybridge. Added optimized sdot kernel.
Fixed c/zgemm, zgemv computational error of haswell, piledriver, bulldozer, and
barcelona on Windows.

Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop

Conflicts:
	kernel/Makefile.L1
	kernel/x86_64/KERNEL
	param.h
2014-06-29 10:34:51 +08:00
wernsaar
22e5aee2dd fixed zgemv bug for older AMD Processors 2014-06-28 19:04:49 +02:00
wernsaar
35d37e124f bugfix for barcelona zgemv-kernel 2014-06-28 12:36:11 +02:00
wernsaar
d8ba46efdb bugfix for bulldozer cgemm-, zgemm- and zgemv-kernel 2014-06-28 12:16:20 +02:00
wernsaar
a15f22a1f6 bugfix for piledriver cgemm-, zgemm- and zgemv-kernel 2014-06-28 11:46:58 +02:00
wernsaar
b94ea89f52 bugfix for haswell cgemm- and zgemm-kernel 2014-06-28 10:22:40 +02:00
wernsaar
35f668bb14 bugfix for cgemm_kernel_8x2_sandy.S 2014-06-28 10:01:56 +02:00
Timothy Gu
6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar
365e8de346 added optimized cgemm-kernel for SANDYBRIDGE 2014-06-27 13:40:29 +02:00
wernsaar
578d1b6219 added DSDOT definition and enabled optimized sdot kernel 2014-06-27 11:30:29 +02:00
wernsaar
dabab2b5f4 added new optimized sgemm kernel for SANDYBRIDGE 2014-06-26 21:42:08 +02:00
wernsaar
aa2709c4e0 enabled optimized dgemm kernel for NEHALEM 2014-06-26 12:22:29 +02:00
wernsaar
a13bcc1716 enabled optimized sgemv kernel for barcelona and piledriver 2014-06-25 13:50:57 +02:00
wernsaar
d2c82d7543 enabled optimized sgemv kernel for HASWELL 2014-06-25 12:56:45 +02:00
wernsaar
0517672dd0 enabled optimized sgemv kernels for nehalem, sandybridge and bulldozer 2014-06-25 12:38:14 +02:00
wernsaar
23203d52c1 Ref #380: lowered stack usage for haswell kernels 2014-06-19 14:31:52 +02:00
wernsaar
73545a79cd Ref #380: lowered stack usage for piledriver and bulldozer kernels 2014-06-19 14:02:14 +02:00
wernsaar
5f3b68b4d4 replaced sgemm and cgemm kernels because of lapack bugs 2014-05-10 11:24:07 +02:00
wernsaar
2424af62fd replaced dgemm-kernel because of a bug in lapack 2014-05-10 10:52:37 +02:00
wernsaar
793509a3b5 replaced files for sdot, sgemv_n and sgemv_t for bug #348 2014-05-06 15:29:39 +02:00
wernsaar
47b22763f8 reduced stack usage on windows to 16K 2014-04-24 14:09:26 +02:00
Zhang Xianyi
9a557e90da Refs #340. Fixed SEGFAULT bug of dgemv_n on OSX. 2014-02-15 23:23:15 +08:00
wangqian
2d557eb1e0 Fixed computational error of dgemv_n. 2014-02-04 21:47:51 +08:00
Zhang Xianyi
05bb391c3a Refs #330. Fixed the compatibility issue with clang on Mac OSX. 2013-12-16 20:31:17 +08:00
Zhang Xianyi
9b5be29886 Refs #310. Fixed Segfault bug on nehalem when Julia calls dgeqrt3 on OSX.
Please also check JuliaLang/julia#4099
Julia test script:
  A=rand(256, 256)
  qrfact(A)

I found this was a bug in kernel/x86_64/dgemm_ncopy_8.S.
However, I cannot use gdb with julia. Thus, this is a workaround fix.
2013-12-12 23:23:04 +08:00
wernsaar
034a5b2083 modified zsymv 2013-12-01 21:07:49 +01:00