wernsaar
11eab4c019
added optimized cgemv_n for haswell
2014-08-14 19:00:30 +02:00
wernsaar
4568d32b6b
added optimized cgemv_t kernel for haswell
2014-08-14 14:10:29 +02:00
wernsaar
c1a6374c6f
optimized zgemv_n kernel for sandybridge
2014-08-13 16:10:03 +02:00
wernsaar
2470129132
added fast return, if m or n < 1
2014-08-13 13:54:19 +02:00
wernsaar
8c582d362d
optimized zgemv_t_microk_haswell-2.c
2014-08-13 13:42:22 +02:00
wernsaar
11e34ddd1b
bugfix for zgemv_n_microk_haswell-2.c
2014-08-13 12:54:18 +02:00
wernsaar
9528f0d9ee
bugfix in zgemv_n_microk_sandy-2.c
2014-08-13 12:18:03 +02:00
wernsaar
b06550519e
added optimized cgemv_t c-kernel
2014-08-12 12:15:41 +02:00
wernsaar
6093ee5363
bugfix in zgemv_n_microk_haswell-2.c
2014-08-12 10:02:25 +02:00
wernsaar
07c66b1960
modified algorithm for better numerical stability
2014-08-12 08:35:42 +02:00
wernsaar
58b075daef
added optimized zgemv_t kernel for haswell
2014-08-11 16:57:52 +02:00
wernsaar
09fcd3a341
add optimized zgemv_t kernel for bulldozer
2014-08-11 14:19:25 +02:00
wernsaar
726ad085cb
added optimized zgemv_t for haswell
2014-08-11 13:10:12 +02:00
wernsaar
6fe416976d
added optimized zgemv_t c-kernel
2014-08-11 09:13:18 +02:00
wernsaar
dbc2eff029
disabled optimized haswell zgemv_n kernel for windows ( bad rounding )
2014-08-10 11:57:24 +02:00
wernsaar
462b4885ff
added optimized zgemv_n kernel for haswell
2014-08-10 08:39:17 +02:00
wernsaar
aa54fe064c
added zgemv_n c-function
2014-08-07 22:30:20 +02:00
wernsaar
006ef3ea01
added optimized dgemv_t kernel for haswell
2014-08-07 10:08:54 +02:00
wernsaar
60f17628cc
added optimized dgemv_n kernel for haswell
2014-08-07 09:18:02 +02:00
wernsaar
c9bad1403a
added optimized sgemv_t kernel for sandybridge
2014-08-07 07:49:33 +02:00
wernsaar
2f8927376f
enabled optimized nehalem sgemv_t kernel for windows
2014-08-06 16:58:21 +02:00
wernsaar
d945a2b06d
added optimized sgemv_t kernel for nehalem
2014-08-06 16:21:48 +02:00
wernsaar
ca6c8d06ce
enabled optimized sgemv kernels for windows
2014-08-06 14:24:36 +02:00
wernsaar
7aa43c8928
enabled optimized sgemv kernels for windows
2014-08-06 14:06:30 +02:00
wernsaar
891b960854
added optimized sgemv_t kernel for haswell
2014-08-06 13:42:41 +02:00
wernsaar
95a8caa2f3
added optimized sgemv_t kernel
2014-08-06 12:12:17 +02:00
wernsaar
8c05b8105b
bugfix in sgemv_n.c
2014-08-05 20:14:29 +02:00
wernsaar
c80084a98f
changed default x86_64 sgemv_n kernel to sgemv_n.c
2014-08-05 19:42:56 +02:00
wernsaar
2bab92961f
enabled optimized sgemv_n kernels for windows
2014-08-05 14:52:54 +02:00
wernsaar
9175b8bd5f
changed long to blaslong for windows compatibility
2014-08-05 13:28:39 +02:00
wernsaar
793f2d43b0
added optimized sgemv_n kernel for nehalem
2014-08-05 10:50:08 +02:00
wernsaar
a4dde45f87
optimized sgemv_n kernel for sandybridge
2014-08-05 08:53:09 +02:00
wernsaar
7fa7ea3e1e
updated haswell optimized sgemv_n kernel
2014-08-05 08:04:47 +02:00
wernsaar
3fbc13eb65
modified sgemv_n for haswell
2014-08-04 16:22:11 +02:00
wernsaar
db6917303f
added a better optimized sgemv_n kernel for bulldozer and piledriver
2014-08-04 14:29:01 +02:00
wernsaar
5087096711
optimization of sandybridge cgemm-kernel
2014-07-29 19:07:21 +02:00
wernsaar
46bc4fd50c
optimized cgemm kernel for haswell
2014-07-29 08:53:09 +02:00
wernsaar
1cc02b4337
optimized sgemm kernel for haswell
2014-07-28 11:50:01 +02:00
wernsaar
1d33547222
optimized zgemm kernel for haswell
2014-07-27 11:51:42 +02:00
wernsaar
125610d23b
allow setting a custom value for ?GEMM_DEFAULT_UNROLL_MN, optimizations for syrk
2014-07-24 18:43:31 +02:00
wernsaar
6acbafe45b
added sgemv_n microkernel for haswell
2014-07-20 14:52:25 +02:00
wernsaar
5392d11b04
optimized sgemv_n_microk_sandy.c
2014-07-20 14:08:04 +02:00
wernsaar
c0fe95fb72
added sgemv_n microkernel for sandybridge
2014-07-20 13:17:47 +02:00
wernsaar
d9d4077c93
added sgemv_t microkernel for haswell
2014-07-20 11:30:32 +02:00
wernsaar
02eb72ac42
bugfix in sgemv_t_microk_sandy.c
2014-07-20 10:48:41 +02:00
wernsaar
c06f9986d4
added sgemv_t microkernel for sandybridge
2014-07-20 10:21:08 +02:00
wernsaar
2cce125c79
added optimized sgemv_t for bulldozer and piledriver
2014-07-19 15:48:07 +02:00
wernsaar
b3938fe371
don't use this sgemv_n on Windows
2014-07-19 07:15:34 +02:00
wernsaar
c8a4a56177
performance optimizations for sgemv_n
2014-07-18 11:25:21 +02:00
wernsaar
3c5732615d
added blocked sgemv_n and microkernel for bulldozer and piledriver
2014-07-17 23:15:07 +02:00
wernsaar
880597b301
segment violation in sgemv kernels
2014-07-13 10:46:14 +02:00
wernsaar
0884b73c69
Lapack-test Windows 32bit now error free
2014-07-10 11:01:47 +02:00
wernsaar
9bd9472ae9
Lapack-test: cleanup of x86 32bit KERNEL file
2014-07-09 16:08:19 +02:00
wernsaar
c4a423a642
bugfixes for lapack on ARM Platform
2014-07-09 12:21:39 +02:00
wernsaar
13348b2137
removed reference to daxpy_bulldozer kernel (Windows bug in lapack-test)
2014-07-06 16:39:32 +02:00
wernsaar
9964ed2f79
bugfix for CORE2
2014-07-06 11:47:28 +02:00
wernsaar
d5b976f92d
fallback to zgemm_kernel_4x2_sse.S
2014-07-06 11:05:28 +02:00
wernsaar
f7267d9b0e
added missing definition for DUNNINGTON
2014-07-06 10:17:07 +02:00
wernsaar
e0c080a28c
removed reference to zgemm_kernel_4x2_sse3.S (bug in lapack-test)
2014-07-05 16:13:17 +02:00
wernsaar
e80b144932
enabled compiling of *3M functions
2014-07-02 14:11:53 +02:00
wernsaar
be94db096c
disabled *3M functions for x86_64 platforms
2014-07-01 16:18:05 +02:00
wernsaar
b079df9ef4
added optimized sdot- and dsdot-kernel, written in C
2014-06-30 14:46:38 +02:00
wernsaar
01a119abfc
enabled SMP for sbmv and zsbmv, but only for 64bit binaries
2014-06-29 20:35:56 +02:00
Zhang Xianyi
99efbbbad5
Fixed #395. Enable optimized cgemm for Sandybridge. Added optimized sdot kernel.
Fixed c/zgemm, zgemv computational error of haswell, piledriver, bulldozer, and
barcelona on Windows.
Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop
Conflicts:
kernel/Makefile.L1
kernel/x86_64/KERNEL
param.h
2014-06-29 10:34:51 +08:00
wernsaar
22e5aee2dd
fixed zgemv bug for older AMD Processors
2014-06-28 19:04:49 +02:00
wernsaar
35d37e124f
bugfix for barcelona zgemv-kernel
2014-06-28 12:36:11 +02:00
wernsaar
d8ba46efdb
bugfix for bulldozer cgemm-, zgemm- and zgemv-kernel
2014-06-28 12:16:20 +02:00
wernsaar
a15f22a1f6
bugfix for piledriver cgemm-, zgemm- and zgemv-kernel
2014-06-28 11:46:58 +02:00
wernsaar
b94ea89f52
bugfix for haswell cgemm- and zgemm-kernel
2014-06-28 10:22:40 +02:00
wernsaar
35f668bb14
bugfix for cgemm_kernel_8x2_sandy.S
2014-06-28 10:01:56 +02:00
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar
365e8de346
added optimized cgemm-kernel for SANDYBRIDGE
2014-06-27 13:40:29 +02:00
wernsaar
578d1b6219
added DSDOT definition and enabled optimized sdot kernel
2014-06-27 11:30:29 +02:00
wernsaar
dabab2b5f4
added new optimized sgemm kernel for SANDYBRIDGE
2014-06-26 21:42:08 +02:00
wernsaar
aa2709c4e0
enabled optimized dgemm kernel for NEHALEM
2014-06-26 12:22:29 +02:00
wernsaar
a13bcc1716
enabled optimized sgemv kernel for barcelona and piledriver
2014-06-25 13:50:57 +02:00
wernsaar
d2c82d7543
enabled optimized sgemv kernel for HASWELL
2014-06-25 12:56:45 +02:00
wernsaar
0517672dd0
enabled optimized sgemv kernels for nehalem, sandybridge and bulldozer
2014-06-25 12:38:14 +02:00
wernsaar
23203d52c1
Ref #380 : lowered stack usage for haswell kernels
2014-06-19 14:31:52 +02:00
wernsaar
73545a79cd
Ref #380 : lowered stack usage for piledriver and bulldozer kernels
2014-06-19 14:02:14 +02:00
wernsaar
ff9cfca24c
Ref #385 : added missing return instruction
2014-06-12 15:52:14 +02:00
wernsaar
cee257f384
Ref #51 : added blas extensions zomatcopy and comatcopy
2014-06-10 10:34:54 +02:00
wernsaar
7bfb3011e8
Ref #51 : added blas extension somatcopy
2014-06-09 20:21:13 +02:00
wernsaar
8c8f596238
Ref #51 : added blas extension domatcopy as a non-optimized reference
2014-06-09 17:11:07 +02:00
wernsaar
faf3ac0aad
Ref #285 : added axpby kernels
2014-06-08 11:54:24 +02:00
Zhang Xianyi
406f5bd22b
Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop
Conflicts:
kernel/arm/KERNEL.ARMV6
2014-05-21 11:24:39 +08:00
wernsaar
aaddb05411
bugfix for ARMV6
2014-05-17 13:00:36 +02:00
wernsaar
e826a5a6af
some modifications regarding lapack test
2014-05-16 20:37:41 +02:00
wernsaar
c38379c9dd
bugfixes for ARM regarding lapack tests
2014-05-14 13:03:45 +02:00
wernsaar
a0b07c1440
bugfixes for ARM regarding lapack tests
2014-05-14 12:59:20 +02:00
wernsaar
43fbdb7a5a
added ARMV5 as reference platform
2014-05-13 17:25:19 +02:00
wernsaar
777cebc8c7
added ZERO check to zscal.c because bug in lapack-testing
2014-05-13 16:31:00 +02:00
wernsaar
aa5c73e20f
added ZERO check to zscal.c because bug in lapack-test
2014-05-13 16:25:21 +02:00
wernsaar
5e5ef28ca0
added ZERO check because bug in lapack-test
2014-05-13 15:36:03 +02:00
wernsaar
650ed34336
added ZERO check because bug in lapack-test
2014-05-13 15:31:36 +02:00
wernsaar
5f3b68b4d4
replaced sgemm and cgemm kernels because lapack bugs
2014-05-10 11:24:07 +02:00
wernsaar
2424af62fd
replaced dgemm-kernel because bug in lapack
2014-05-10 10:52:37 +02:00
wernsaar
793509a3b5
replaced files for sdot, sgemv_n and sgemv_t for bug #348
2014-05-06 15:29:39 +02:00
wernsaar
47b22763f8
reduced stack usage on windows to 16K
2014-04-24 14:09:26 +02:00
wernsaar
9db0fb8b02
bugfix for sdsdot
2014-02-28 14:59:36 +01:00