wernsaar
493d4fe7e5
added reference in C for symv_L
2014-08-16 11:36:48 +02:00
wernsaar
11eab4c019
added optimized cgemv_n for haswell
2014-08-14 19:00:30 +02:00
wernsaar
4568d32b6b
added optimized cgemv_t kernel for haswell
2014-08-14 14:10:29 +02:00
wernsaar
c1a6374c6f
optimized zgemv_n kernel for sandybridge
2014-08-13 16:10:03 +02:00
wernsaar
2470129132
added fast return, if m or n < 1
2014-08-13 13:54:19 +02:00
wernsaar
8c582d362d
optimized zgemv_t_microk_haswell-2.c
2014-08-13 13:42:22 +02:00
wernsaar
11e34ddd1b
bugfix for zgemv_n_microk_haswell-2.c
2014-08-13 12:54:18 +02:00
wernsaar
9528f0d9ee
bugfix in zgemv_n_microk_sandy-2.c
2014-08-13 12:18:03 +02:00
wernsaar
b06550519e
added optimized cgemv_t c-kernel
2014-08-12 12:15:41 +02:00
wernsaar
6093ee5363
bugfix in zgemv_n_microk_haswell-2.c
2014-08-12 10:02:25 +02:00
wernsaar
07c66b1960
modified algorithm for better numerical stability
2014-08-12 08:35:42 +02:00
wernsaar
58b075daef
added optimized zgemv_t kernel for haswell
2014-08-11 16:57:52 +02:00
wernsaar
09fcd3a341
add optimized zgemv_t kernel for bulldozer
2014-08-11 14:19:25 +02:00
wernsaar
726ad085cb
added optimized zgemv_t for haswell
2014-08-11 13:10:12 +02:00
wernsaar
6fe416976d
added optimized zgemv_t c-kernel
2014-08-11 09:13:18 +02:00
wernsaar
dbc2eff029
disabled optimized haswell zgemv_n kernel for windows ( bad rounding )
2014-08-10 11:57:24 +02:00
wernsaar
462b4885ff
added optimized zgemv_n kernel for haswell
2014-08-10 08:39:17 +02:00
wernsaar
aa54fe064c
added zgemv_n c-function
2014-08-07 22:30:20 +02:00
wernsaar
006ef3ea01
added optimized dgemv_t kernel for haswell
2014-08-07 10:08:54 +02:00
wernsaar
60f17628cc
added optimized dgemv_n kernel for haswell
2014-08-07 09:18:02 +02:00
wernsaar
c9bad1403a
added optimized sgemv_t kernel for sandybridge
2014-08-07 07:49:33 +02:00
wernsaar
2f8927376f
enabled optimized nehalem sgemv_t kernel for windows
2014-08-06 16:58:21 +02:00
wernsaar
d945a2b06d
added optimized sgemv_t kernel for nehalem
2014-08-06 16:21:48 +02:00
wernsaar
ca6c8d06ce
enabled optimized sgemv kernels for windows
2014-08-06 14:24:36 +02:00
wernsaar
7aa43c8928
enabled optimized sgemv kernels for windows
2014-08-06 14:06:30 +02:00
wernsaar
891b960854
added optimized sgemv_t kernel for haswell
2014-08-06 13:42:41 +02:00
wernsaar
95a8caa2f3
added optimized sgemv_t kernel
2014-08-06 12:12:17 +02:00
wernsaar
8c05b8105b
bugfix in sgemv_n.c
2014-08-05 20:14:29 +02:00
wernsaar
c80084a98f
changed default x86_64 sgemv_n kernel to sgemv_n.c
2014-08-05 19:42:56 +02:00
wernsaar
2bab92961f
enabled optimized sgemv_n kernels for windows
2014-08-05 14:52:54 +02:00
wernsaar
9175b8bd5f
changed long to blaslong for windows compatibility
2014-08-05 13:28:39 +02:00
wernsaar
793f2d43b0
added optimized sgemv_n kernel for nehalem
2014-08-05 10:50:08 +02:00
wernsaar
a4dde45f87
optimized sgemv_n kernel for sandybridge
2014-08-05 08:53:09 +02:00
wernsaar
7fa7ea3e1e
updated haswell optimized sgemv_n kernel
2014-08-05 08:04:47 +02:00
wernsaar
3fbc13eb65
modified sgemv_n for haswell
2014-08-04 16:22:11 +02:00
wernsaar
db6917303f
added a better optimized sgemv_n kernel for bulldozer and piledriver
2014-08-04 14:29:01 +02:00
wernsaar
5087096711
optimization of sandybridge cgemm-kernel
2014-07-29 19:07:21 +02:00
wernsaar
46bc4fd50c
optimized cgemm kernel for haswell
2014-07-29 08:53:09 +02:00
wernsaar
1cc02b4337
optimized sgemm kernel for haswell
2014-07-28 11:50:01 +02:00
wernsaar
1d33547222
optimized zgemm kernel for haswell
2014-07-27 11:51:42 +02:00
wernsaar
125610d23b
allow to set custom value for ?GEMM_DEFAULT_UNROLL_MN, optimizations for syrk
2014-07-24 18:43:31 +02:00
wernsaar
6acbafe45b
added sgemv_n microkernel for haswell
2014-07-20 14:52:25 +02:00
wernsaar
5392d11b04
optimized sgemv_n_microk_sandy.c
2014-07-20 14:08:04 +02:00
wernsaar
c0fe95fb72
added sgemv_n microkernel for sandybridge
2014-07-20 13:17:47 +02:00
wernsaar
d9d4077c93
added sgemv_t microkernel for haswell
2014-07-20 11:30:32 +02:00
wernsaar
02eb72ac42
bugfix in sgemv_t_microk_sandy.c
2014-07-20 10:48:41 +02:00
wernsaar
c06f9986d4
added sgemv_t microkernel for sandybridge
2014-07-20 10:21:08 +02:00
wernsaar
2cce125c79
added optimized sgemv_t for bulldozer and piledriver
2014-07-19 15:48:07 +02:00
wernsaar
b3938fe371
don't use this sgemv_n on Windows
2014-07-19 07:15:34 +02:00
wernsaar
c8a4a56177
performance optimizations for sgemv_n
2014-07-18 11:25:21 +02:00
wernsaar
3c5732615d
added blocked sgemv_n and microkernel for bulldozer and piledriver
2014-07-17 23:15:07 +02:00
wernsaar
880597b301
segment violation in sgemv kernels
2014-07-13 10:46:14 +02:00
wernsaar
0884b73c69
Lapack-test Windows 32bit now error free
2014-07-10 11:01:47 +02:00
wernsaar
9bd9472ae9
Lapack-test: cleanup of x86 32bit KERNEL file
2014-07-09 16:08:19 +02:00
wernsaar
c4a423a642
bugfixes for lapack on ARM Platform
2014-07-09 12:21:39 +02:00
wernsaar
13348b2137
removed reference to daxpy_bulldozer kernel (Windows bug in lapack-test)
2014-07-06 16:39:32 +02:00
wernsaar
9964ed2f79
bugfix for CORE2
2014-07-06 11:47:28 +02:00
wernsaar
d5b976f92d
fallback to zgemm_kernel_4x2_sse.S
2014-07-06 11:05:28 +02:00
wernsaar
f7267d9b0e
added missing definition for DUNNINGTON
2014-07-06 10:17:07 +02:00
wernsaar
e0c080a28c
removed reference to zgemm_kernel_4x2_sse3.S (bug in lapack-test)
2014-07-05 16:13:17 +02:00
wernsaar
e80b144932
enabled compiling of *3M functions
2014-07-02 14:11:53 +02:00
wernsaar
be94db096c
disabled *3M functions for x86_64 platforms
2014-07-01 16:18:05 +02:00
wernsaar
b079df9ef4
added optimized sdot- and dsdot-kernel, written in C
2014-06-30 14:46:38 +02:00
wernsaar
01a119abfc
enabled SMP for sbmv and zsbmv, but only for 64bit binaries
2014-06-29 20:35:56 +02:00
Zhang Xianyi
99efbbbad5
Fixed #395 . Enable optimized cgemm for Sandybridge. Added optimized sdot kernel.
Fixed c/zgemm, zgemv computational error of haswell, piledriver, bulldozer, and
barcelona on Windows.
Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop
Conflicts:
kernel/Makefile.L1
kernel/x86_64/KERNEL
param.h
2014-06-29 10:34:51 +08:00
wernsaar
22e5aee2dd
fixed zgemv bug for older AMD Processors
2014-06-28 19:04:49 +02:00
wernsaar
35d37e124f
bugfix for barcelona zgemv-kernel
2014-06-28 12:36:11 +02:00
wernsaar
d8ba46efdb
bugfix for bulldozer cgemm-, zgemm- and zgemv-kernel
2014-06-28 12:16:20 +02:00
wernsaar
a15f22a1f6
bugfix for piledriver cgemm-, zgemm- and zgemv-kernel
2014-06-28 11:46:58 +02:00
wernsaar
b94ea89f52
bugfix for haswell cgemm- and zgemm-kernel
2014-06-28 10:22:40 +02:00
wernsaar
35f668bb14
bugfix for cgemm_kernel_8x2_sandy.S
2014-06-28 10:01:56 +02:00
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar
365e8de346
added optimized cgemm-kernel for SANDYBRIDGE
2014-06-27 13:40:29 +02:00
wernsaar
578d1b6219
added DSDOT definition and enabled optimized sdot kernel
2014-06-27 11:30:29 +02:00
wernsaar
dabab2b5f4
added new optimized sgemm kernel for SANDYBRIDGE
2014-06-26 21:42:08 +02:00
wernsaar
aa2709c4e0
enabled optimized dgemm kernel for NEHALEM
2014-06-26 12:22:29 +02:00
wernsaar
a13bcc1716
enabled optimized sgemv kernel for barcelona and piledriver
2014-06-25 13:50:57 +02:00
wernsaar
d2c82d7543
enabled optimized sgemv kernel for HASWELL
2014-06-25 12:56:45 +02:00
wernsaar
0517672dd0
enabled optimized sgemv kernels for nehalem, sandybridge and bulldozer
2014-06-25 12:38:14 +02:00
wernsaar
23203d52c1
Ref #380 : lowered stack usage for haswell kernels
2014-06-19 14:31:52 +02:00
wernsaar
73545a79cd
Ref #380 : lowered stack usage for piledriver and bulldozer kernels
2014-06-19 14:02:14 +02:00
wernsaar
ff9cfca24c
Ref #385 : added missing return instruction
2014-06-12 15:52:14 +02:00
wernsaar
cee257f384
Ref #51 : added blas extensions zomatcopy and comatcopy
2014-06-10 10:34:54 +02:00
wernsaar
7bfb3011e8
Ref #51 : added blas extension somatcopy
2014-06-09 20:21:13 +02:00
wernsaar
8c8f596238
Ref #51 : added blas extension domatcopy as a non-optimized reference
2014-06-09 17:11:07 +02:00
wernsaar
faf3ac0aad
Ref #285 : added axpby kernels
2014-06-08 11:54:24 +02:00
Zhang Xianyi
406f5bd22b
Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop
Conflicts:
kernel/arm/KERNEL.ARMV6
2014-05-21 11:24:39 +08:00
wernsaar
aaddb05411
bugfix for ARMV6
2014-05-17 13:00:36 +02:00
wernsaar
e826a5a6af
some modifications regarding lapack test
2014-05-16 20:37:41 +02:00
wernsaar
c38379c9dd
bugfixes for ARM regarding lapack tests
2014-05-14 13:03:45 +02:00
wernsaar
a0b07c1440
bugfixes for ARM regarding lapack tests
2014-05-14 12:59:20 +02:00
wernsaar
43fbdb7a5a
added ARMV5 as reference platform
2014-05-13 17:25:19 +02:00
wernsaar
777cebc8c7
added ZERO check to zscal.c because bug in lapack-testing
2014-05-13 16:31:00 +02:00
wernsaar
aa5c73e20f
added ZERO check to zscal.c because bug in lapack-test
2014-05-13 16:25:21 +02:00
wernsaar
5e5ef28ca0
added ZERO check because bug in lapack-test
2014-05-13 15:36:03 +02:00
wernsaar
650ed34336
added ZERO check because bug in lapack-test
2014-05-13 15:31:36 +02:00
wernsaar
5f3b68b4d4
replaced sgemm and cgemm kernels because of lapack bugs
2014-05-10 11:24:07 +02:00
wernsaar
2424af62fd
replaced dgemm-kernel because of a bug in lapack
2014-05-10 10:52:37 +02:00
wernsaar
793509a3b5
replaced files for sdot, sgemv_n and sgemv_t for bug #348
2014-05-06 15:29:39 +02:00
wernsaar
47b22763f8
reduced stack usage on windows to 16K
2014-04-24 14:09:26 +02:00
wernsaar
9db0fb8b02
bugfix for sdsdot
2014-02-28 14:59:36 +01:00
wernsaar
f9daebba0a
checked in bugfixes for ARM
2014-02-16 11:45:47 +01:00
Zhang Xianyi
9a557e90da
Refs #340 . Fixed SEGFAULT bug of dgemv_n on OSX.
2014-02-15 23:23:15 +08:00
wangqian
2d557eb1e0
Fixed computational error of dgemv_n.
2014-02-04 21:47:51 +08:00
Zhang Xianyi
05bb391c3a
Refs #330 . Fixed the compatibility issue with clang on Mac OSX.
2013-12-16 20:31:17 +08:00
Zhang Xianyi
9b5be29886
Refs #310 . Fixed Segfault bug on nehalem when Julia calling dgeqrt3 on OSX.
Please also check JuliaLang/julia#4099
Julia test script:
A=rand(256, 256)
qrfact(A)
I found this was a bug in kernel/x86_64/dgemm_ncopy_8.S.
However, I cannot use gdb with julia. Thus, this is a workaround fix.
2013-12-12 23:23:04 +08:00
wernsaar
53eaf41901
added support for HASWELL
2013-12-02 13:17:51 +01:00
wernsaar
9423f980f6
modified trsm kernel
2013-12-02 10:08:14 +01:00
wernsaar
c6156b2ef2
added trsm kernels from origin
2013-12-01 22:39:39 +01:00
wernsaar
034a5b2083
modified zsymv
2013-12-01 21:07:49 +01:00
wernsaar
27d4234d4d
merged symv
2013-12-01 20:56:02 +01:00
wernsaar
402d6e91db
Merge remote branch 'origin/develop' into armv7
2013-12-01 18:18:40 +01:00
wernsaar
b3254eecaf
Merge remote branch 'origin/haswell' into develop
2013-12-01 18:09:12 +01:00
wernsaar
d910404f00
Merge remote branch 'origin/piledriver' into develop
2013-12-01 18:06:51 +01:00
wernsaar
ffe70b1fdc
modified Makefile.L3
2013-12-01 17:58:46 +01:00
wernsaar
0b6e13b689
Merge remote branch 'origin/develop' into haswell
2013-12-01 13:38:11 +01:00
wernsaar
e09dc279a2
Merge remote branch 'origin/develop' into piledriver
2013-12-01 13:33:18 +01:00
wernsaar
4be4db590c
Merge remote branch 'origin/develop' into armv7
2013-12-01 13:16:41 +01:00
wernsaar
5c648a8984
Merge remote branch 'origin/develop' into haswell
2013-12-01 11:25:33 +01:00
wernsaar
c44dc4dd3c
Merge remote branch 'origin/develop' into piledriver
2013-12-01 11:06:36 +01:00
wernsaar
9d3fae15a8
Merge branch 'develop' into armv7
2013-12-01 10:12:07 +01:00
wernsaar
2d3c884294
added complex gemv kernels for ARMV6 and ARMV7
2013-11-29 17:06:33 +01:00
wernsaar
d54a061713
optimized gemv_n_vfp.S
2013-11-28 17:40:21 +01:00
wernsaar
86afb47e83
added optimized ctrmm kernel for ARMV6
2013-11-28 14:35:07 +01:00
wernsaar
42a4dff056
added optimized ztrmm kernel for ARMV6
2013-11-28 13:41:06 +01:00
wernsaar
5bc322a66c
optimized strmm kernel for ARMV6
2013-11-28 12:45:38 +01:00
wernsaar
dec7ad0dfd
optimized dtrmm kernel for ARMV7
2013-11-28 12:32:12 +01:00
wernsaar
274304bd03
add optimized cgemm kernel for ARMV6
2013-11-28 11:54:38 +01:00
wernsaar
5007a534c4
optimized zgemm kernel for ARMV6
2013-11-28 10:04:43 +01:00
wernsaar
a537d7d8d7
optimized zgemm_kernel_2x2_vfp.S
2013-11-28 08:33:44 +01:00
wernsaar
b42145834f
optimized sgemm kernel for ARMV6
2013-11-28 08:08:08 +01:00
wernsaar
3d5e792c72
optimized sgemm kernel for ARMV6
2013-11-27 18:38:32 +01:00
wernsaar
a9bd12da2c
optimized dgemm kernel for ARMV6
2013-11-27 17:37:38 +01:00
wernsaar
697e198e8a
added zgemm_kernel for ARMV6
2013-11-27 16:15:06 +01:00
wernsaar
36b0f7fe1d
added optimized gemv_t kernel for ARMV6
2013-11-25 19:31:27 +01:00
wernsaar
d2b20c5c51
add optimized axpy kernel
2013-11-25 12:25:58 +01:00
wernsaar
fe5f46c330
added experimental support for ARMV8
2013-11-24 15:47:00 +01:00
wernsaar
25c6050593
add single and double precision gemv_n kernel for ARMV6
2013-11-24 12:03:28 +01:00
wernsaar
12e02a00e0
added ncopy kernels for ARMV6
2013-11-24 08:46:47 +01:00
wernsaar
29a3196f56
added optimized sgemm and strmm kernel for ARMV6
2013-11-23 18:09:41 +01:00
wernsaar
8776a73773
added optimized dgemm and dtrmm kernel for ARMV6
2013-11-23 16:24:52 +01:00
wernsaar
7e84acd3e8
fixed a bug in the SAVE macros that was not found by any test routine
2013-11-23 14:35:19 +01:00
wernsaar
33d3ab6e09
small optimizations for zgemv kernels
2013-11-23 12:35:31 +01:00
wernsaar
9a0f978929
added nrm2 kernel for ARMV6
2013-11-22 17:21:10 +01:00
wernsaar
7f210587f0
renamed some ncopy and tcopy files
2013-11-22 00:20:25 +01:00
wernsaar
9f0a3a35b3
removed obsolete file sdot_vfpv3.S
2013-11-21 23:42:54 +01:00
wernsaar
dbae93110b
added sdot_vfp.S
2013-11-21 23:34:51 +01:00
wernsaar
19cd5c64a2
renamed swap_vfpv3.S to swap_vfp.S
2013-11-21 23:19:32 +01:00
wernsaar
9adf87495e
renamed some dot kernels
2013-11-21 23:07:51 +01:00
wernsaar
440db4cdda
delete rot_vfpv3.S
2013-11-21 22:52:24 +01:00