Commit Graph

488 Commits

Author SHA1 Message Date
wernsaar 493d4fe7e5 added reference in C for symv_L 2014-08-16 11:36:48 +02:00
wernsaar 11eab4c019 added optimized cgemv_n for haswell 2014-08-14 19:00:30 +02:00
wernsaar 4568d32b6b added optimized cgemv_t kernel for haswell 2014-08-14 14:10:29 +02:00
wernsaar c1a6374c6f optimized zgemv_n kernel for sandybridge 2014-08-13 16:10:03 +02:00
wernsaar 2470129132 added fast return, if m or n < 1 2014-08-13 13:54:19 +02:00
wernsaar 8c582d362d optimized zgemv_t_microk_haswell-2.c 2014-08-13 13:42:22 +02:00
wernsaar 11e34ddd1b bugfix for zgemv_n_microk_haswell-2.c 2014-08-13 12:54:18 +02:00
wernsaar 9528f0d9ee bugfix in zgemv_n_microk_sandy-2.c 2014-08-13 12:18:03 +02:00
wernsaar b06550519e added optimized cgemv_t c-kernel 2014-08-12 12:15:41 +02:00
wernsaar 6093ee5363 bugfix in zgemv_n_microk_haswell-2.c 2014-08-12 10:02:25 +02:00
wernsaar 07c66b1960 modified algorithm for better numerical stability 2014-08-12 08:35:42 +02:00
wernsaar 58b075daef added optimized zgemv_t kernel for haswell 2014-08-11 16:57:52 +02:00
wernsaar 09fcd3a341 add optimized zgemv_t kernel for bulldozer 2014-08-11 14:19:25 +02:00
wernsaar 726ad085cb added optimized zgemv_t for haswell 2014-08-11 13:10:12 +02:00
wernsaar 6fe416976d added optimized zgemv_t c-kernel 2014-08-11 09:13:18 +02:00
wernsaar dbc2eff029 disabled optimized haswell zgemv_n kernel for windows ( bad rounding ) 2014-08-10 11:57:24 +02:00
wernsaar 462b4885ff added optimized zgemv_n kernel for haswell 2014-08-10 08:39:17 +02:00
wernsaar aa54fe064c added zgemv_n c-function 2014-08-07 22:30:20 +02:00
wernsaar 006ef3ea01 added optimized dgemv_t kernel for haswell 2014-08-07 10:08:54 +02:00
wernsaar 60f17628cc added optimized dgemv_n kernel for haswell 2014-08-07 09:18:02 +02:00
wernsaar c9bad1403a added optimized sgemv_t kernel for sandybridge 2014-08-07 07:49:33 +02:00
wernsaar 2f8927376f enabled optimized nehalem sgemv_t kernel for windows 2014-08-06 16:58:21 +02:00
wernsaar d945a2b06d added optimized sgemv_t kernel for nehalem 2014-08-06 16:21:48 +02:00
wernsaar ca6c8d06ce enabled optimized sgemv kernels for windows 2014-08-06 14:24:36 +02:00
wernsaar 7aa43c8928 enabled optimized sgemv kernels for windows 2014-08-06 14:06:30 +02:00
wernsaar 891b960854 added optimized sgemv_t kernel for haswell 2014-08-06 13:42:41 +02:00
wernsaar 95a8caa2f3 added optimized sgemv_t kernel 2014-08-06 12:12:17 +02:00
wernsaar 8c05b8105b bugfix in sgemv_n.c 2014-08-05 20:14:29 +02:00
wernsaar c80084a98f changed default x86_64 sgemv_n kernel to sgemv_n.c 2014-08-05 19:42:56 +02:00
wernsaar 2bab92961f enabled optimized sgemv_n kernels for windows 2014-08-05 14:52:54 +02:00
wernsaar 9175b8bd5f changed long to blaslong for windows compatibility 2014-08-05 13:28:39 +02:00
wernsaar 793f2d43b0 added optimized sgemv_n kernel for nehalem 2014-08-05 10:50:08 +02:00
wernsaar a4dde45f87 optimized sgemv_n kernel for sandybridge 2014-08-05 08:53:09 +02:00
wernsaar 7fa7ea3e1e updated haswell optimized sgemv_n kernel 2014-08-05 08:04:47 +02:00
wernsaar 3fbc13eb65 modified sgemv_n for haswell 2014-08-04 16:22:11 +02:00
wernsaar db6917303f added a better optimized sgemv_n kernel for bulldozer and piledriver 2014-08-04 14:29:01 +02:00
wernsaar 5087096711 optimization of sandybridge cgemm-kernel 2014-07-29 19:07:21 +02:00
wernsaar 46bc4fd50c optimized cgemm kernel for haswell 2014-07-29 08:53:09 +02:00
wernsaar 1cc02b4337 optimized sgemm kernel for haswell 2014-07-28 11:50:01 +02:00
wernsaar 1d33547222 optimized zgemm kernel for haswell 2014-07-27 11:51:42 +02:00
wernsaar 125610d23b allow to set custom value for ?GEMM_DEFAULT_UNROLL_MN, optimizations for syrk 2014-07-24 18:43:31 +02:00
wernsaar 6acbafe45b added sgemv_n microkernel for haswell 2014-07-20 14:52:25 +02:00
wernsaar 5392d11b04 optimized sgemv_n_microk_sandy.c 2014-07-20 14:08:04 +02:00
wernsaar c0fe95fb72 added sgemv_n microkernel for sandybridge 2014-07-20 13:17:47 +02:00
wernsaar d9d4077c93 added sgemv_t microkernel for haswell 2014-07-20 11:30:32 +02:00
wernsaar 02eb72ac42 bugfix in sgemv_t_microk_sandy.c 2014-07-20 10:48:41 +02:00
wernsaar c06f9986d4 added sgemv_t microkernel for sandybridge 2014-07-20 10:21:08 +02:00
wernsaar 2cce125c79 added optimized sgemv_t for bulldozer and piledriver 2014-07-19 15:48:07 +02:00
wernsaar b3938fe371 don't use this sgemv_n on Windows 2014-07-19 07:15:34 +02:00
wernsaar c8a4a56177 performance optimizations for sgemv_n 2014-07-18 11:25:21 +02:00
wernsaar 3c5732615d added blocked sgemv_n and microkernel for bulldozer and piledriver 2014-07-17 23:15:07 +02:00
wernsaar 880597b301 segment violation in sgemv kernels 2014-07-13 10:46:14 +02:00
wernsaar 0884b73c69 Lapack-test Windows 32bit now error free 2014-07-10 11:01:47 +02:00
wernsaar 9bd9472ae9 Lapack-test: cleanup of x86 32bit KERNEL file 2014-07-09 16:08:19 +02:00
wernsaar c4a423a642 bugfixes for lapack on ARM Platform 2014-07-09 12:21:39 +02:00
wernsaar 13348b2137 removed reference to daxpy_bulldozer kernel (Windows bug in lapack-test) 2014-07-06 16:39:32 +02:00
wernsaar 9964ed2f79 bugfix for CORE2 2014-07-06 11:47:28 +02:00
wernsaar d5b976f92d fallback to zgemm_kernel_4x2_sse.S 2014-07-06 11:05:28 +02:00
wernsaar f7267d9b0e added missing definition for DUNNINGTON 2014-07-06 10:17:07 +02:00
wernsaar e0c080a28c removed reference to zgemm_kernel_4x2_sse3.S (bug in lapack-test) 2014-07-05 16:13:17 +02:00
wernsaar e80b144932 enabled compiling of *3M functions 2014-07-02 14:11:53 +02:00
wernsaar be94db096c disabled *3M functions for x86_64 platforms 2014-07-01 16:18:05 +02:00
wernsaar b079df9ef4 added optimized sdot- and dsdot-kernel, written in C 2014-06-30 14:46:38 +02:00
wernsaar 01a119abfc enabled SMP for sbmv and zsbmv, but only for 64bit binaries 2014-06-29 20:35:56 +02:00
Zhang Xianyi 99efbbbad5 Fixed #395. Enable optimized cgemm for Sandybridge. Added optimized sdot kernel.
Fixed c/zgemm, zgemv computational error of haswell, piledriver, bulldozer, and
barcelona on Windows.

Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop

Conflicts:
	kernel/Makefile.L1
	kernel/x86_64/KERNEL
	param.h
2014-06-29 10:34:51 +08:00
wernsaar 22e5aee2dd fixed zgemv bug for older AMD Processors 2014-06-28 19:04:49 +02:00
wernsaar 35d37e124f bugfix for barcelona zgemv-kernel 2014-06-28 12:36:11 +02:00
wernsaar d8ba46efdb bugfix for bulldozer cgemm-, zgemm- and zgemv-kernel 2014-06-28 12:16:20 +02:00
wernsaar a15f22a1f6 bugfix for piledriver cgemm-, zgemm- and zgemv-kernel 2014-06-28 11:46:58 +02:00
wernsaar b94ea89f52 bugfix for haswell cgemm- and zgemm-kernel 2014-06-28 10:22:40 +02:00
wernsaar 35f668bb14 bugfix for cgemm_kernel_8x2_sandy.S 2014-06-28 10:01:56 +02:00
Timothy Gu 6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar 365e8de346 added optimized cgemm-kernel for SANDYBRIDGE 2014-06-27 13:40:29 +02:00
wernsaar 578d1b6219 added DSDOT definition and enabled optimized sdot kernel 2014-06-27 11:30:29 +02:00
wernsaar dabab2b5f4 added new optimized sgemm kernel for SANDYBRIDGE 2014-06-26 21:42:08 +02:00
wernsaar aa2709c4e0 enabled optimized dgemm kernel for NEHALEM 2014-06-26 12:22:29 +02:00
wernsaar a13bcc1716 enabled optimized sgemv kernel for barcelona and piledriver 2014-06-25 13:50:57 +02:00
wernsaar d2c82d7543 enabled optimized sgemv kernel for HASWELL 2014-06-25 12:56:45 +02:00
wernsaar 0517672dd0 enabled optimized sgemv kernels for nehalem, sandybridge and bulldozer 2014-06-25 12:38:14 +02:00
wernsaar 23203d52c1 Ref #380: lowered stack usage for haswell kernels 2014-06-19 14:31:52 +02:00
wernsaar 73545a79cd Ref #380: lowered stack usage for piledriver and bulldozer kernels 2014-06-19 14:02:14 +02:00
wernsaar ff9cfca24c Ref #385: added missing return instruction 2014-06-12 15:52:14 +02:00
wernsaar cee257f384 Ref #51: added blas extensions zomatcopy and comatcopy 2014-06-10 10:34:54 +02:00
wernsaar 7bfb3011e8 Ref #51: added blas extension somatcopy 2014-06-09 20:21:13 +02:00
wernsaar 8c8f596238 Ref #51: added blas extension domatcopy as non-optimized reference 2014-06-09 17:11:07 +02:00
wernsaar faf3ac0aad Ref #285: added axpby kernels 2014-06-08 11:54:24 +02:00
Zhang Xianyi 406f5bd22b Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop
Conflicts:
	kernel/arm/KERNEL.ARMV6
2014-05-21 11:24:39 +08:00
wernsaar aaddb05411 bugfix for ARMV6 2014-05-17 13:00:36 +02:00
wernsaar e826a5a6af some modifications regarding lapack test 2014-05-16 20:37:41 +02:00
wernsaar c38379c9dd bugfixes for ARM regarding lapack tests 2014-05-14 13:03:45 +02:00
wernsaar a0b07c1440 bugfixes for ARM regarding lapack tests 2014-05-14 12:59:20 +02:00
wernsaar 43fbdb7a5a added ARMV5 as reference platform 2014-05-13 17:25:19 +02:00
wernsaar 777cebc8c7 added ZERO check to zscal.c because bug in lapack-testing 2014-05-13 16:31:00 +02:00
wernsaar aa5c73e20f added ZERO check to zscal.c because bug in lapack-test 2014-05-13 16:25:21 +02:00
wernsaar 5e5ef28ca0 added ZERO check because bug in lapack-test 2014-05-13 15:36:03 +02:00
wernsaar 650ed34336 added ZERO check because bug in lapack-test 2014-05-13 15:31:36 +02:00
wernsaar 5f3b68b4d4 replaced sgemm and cgemm kernels because lapack bugs 2014-05-10 11:24:07 +02:00
wernsaar 2424af62fd replaced dgemm-kernel because bug in lapack 2014-05-10 10:52:37 +02:00
wernsaar 793509a3b5 replaced files for sdot, sgemv_n and sgemv_t for bug #348 2014-05-06 15:29:39 +02:00
wernsaar 47b22763f8 reduced stack usage on windows to 16K 2014-04-24 14:09:26 +02:00
wernsaar 9db0fb8b02 bugfix for sdsdot 2014-02-28 14:59:36 +01:00
wernsaar f9daebba0a checked in bugfixes for ARM 2014-02-16 11:45:47 +01:00
Zhang Xianyi 9a557e90da Refs #340. Fixed SEGFAULT bug of dgemv_n on OSX. 2014-02-15 23:23:15 +08:00
wangqian 2d557eb1e0 Fixed computational error of dgemv_n. 2014-02-04 21:47:51 +08:00
Zhang Xianyi 05bb391c3a Refs #330. Fixed the compatible issue with clang on Mac OSX. 2013-12-16 20:31:17 +08:00
Zhang Xianyi 9b5be29886 Refs #310. Fixed Segfault bug on nehalem when Julia calling dgeqrt3 on OSX.
Please also check JuliaLang/julia#4099
Julia test script:
  A=rand(256, 256)
  qrfact(A)

I found this was a bug in kernel/x86_64/dgemm_ncopy_8.S.
However, I cannot use gdb with julia. Thus, this is a workaround fix.
2013-12-12 23:23:04 +08:00
wernsaar 53eaf41901 added support for HASWELL 2013-12-02 13:17:51 +01:00
wernsaar 9423f980f6 modified trsm kernel 2013-12-02 10:08:14 +01:00
wernsaar c6156b2ef2 added trsm kernels from origin 2013-12-01 22:39:39 +01:00
wernsaar 034a5b2083 modified zsymv 2013-12-01 21:07:49 +01:00
wernsaar 27d4234d4d merged symv 2013-12-01 20:56:02 +01:00
wernsaar 402d6e91db Merge remote branch 'origin/develop' into armv7 2013-12-01 18:18:40 +01:00
wernsaar b3254eecaf Merge remote branch 'origin/haswell' into develop 2013-12-01 18:09:12 +01:00
wernsaar d910404f00 Merge remote branch 'origin/piledriver' into develop 2013-12-01 18:06:51 +01:00
wernsaar ffe70b1fdc modified Makefile.L3 2013-12-01 17:58:46 +01:00
wernsaar 0b6e13b689 Merge remote branch 'origin/develop' into haswell 2013-12-01 13:38:11 +01:00
wernsaar e09dc279a2 Merge remote branch 'origin/develop' into piledriver 2013-12-01 13:33:18 +01:00
wernsaar 4be4db590c Merge remote branch 'origin/develop' into armv7 2013-12-01 13:16:41 +01:00
wernsaar 5c648a8984 Merge remote branch 'origin/develop' into haswell 2013-12-01 11:25:33 +01:00
wernsaar c44dc4dd3c Merge remote branch 'origin/develop' into piledriver 2013-12-01 11:06:36 +01:00
wernsaar 9d3fae15a8 Merge branch 'develop' into armv7 2013-12-01 10:12:07 +01:00
wernsaar 2d3c884294 added complex gemv kernels for ARMV6 and ARMV7 2013-11-29 17:06:33 +01:00
wernsaar d54a061713 optimized gemv_n_vfp.S 2013-11-28 17:40:21 +01:00
wernsaar 86afb47e83 added optimized ctrmm kernel for ARMV6 2013-11-28 14:35:07 +01:00
wernsaar 42a4dff056 added optimized ztrmm kernel for ARMV6 2013-11-28 13:41:06 +01:00
wernsaar 5bc322a66c optimized strmm kernel for ARMV6 2013-11-28 12:45:38 +01:00
wernsaar dec7ad0dfd optimized dtrmm kernel for ARMV7 2013-11-28 12:32:12 +01:00
wernsaar 274304bd03 add optimized cgemm kernel for ARMV6 2013-11-28 11:54:38 +01:00
wernsaar 5007a534c4 optimized zgemm kernel for ARMV6 2013-11-28 10:04:43 +01:00
wernsaar a537d7d8d7 optimized zgemm_kernel_2x2_vfp.S 2013-11-28 08:33:44 +01:00
wernsaar b42145834f optimized sgemm kernel for ARMV6 2013-11-28 08:08:08 +01:00
wernsaar 3d5e792c72 optimized sgemm kernel for ARMV6 2013-11-27 18:38:32 +01:00
wernsaar a9bd12da2c optimized dgemm kernel for ARMV6 2013-11-27 17:37:38 +01:00
wernsaar 697e198e8a added zgemm_kernel for ARMV6 2013-11-27 16:15:06 +01:00
wernsaar 36b0f7fe1d added optimized gemv_t kernel for ARMV6 2013-11-25 19:31:27 +01:00
wernsaar d2b20c5c51 add optimized axpy kernel 2013-11-25 12:25:58 +01:00
wernsaar fe5f46c330 added experimental support for ARMV8 2013-11-24 15:47:00 +01:00
wernsaar 25c6050593 add single and double precision gemv_n kernel for ARMV6 2013-11-24 12:03:28 +01:00
wernsaar 12e02a00e0 added ncopy kernels for ARMV6 2013-11-24 08:46:47 +01:00
wernsaar 29a3196f56 added optimized sgemm and strmm kernel for ARMV6 2013-11-23 18:09:41 +01:00
wernsaar 8776a73773 added optimized dgemm and dtrmm kernel for ARMV6 2013-11-23 16:24:52 +01:00
wernsaar 7e84acd3e8 fixed bug in SAVE macros, that are not found by any test routine 2013-11-23 14:35:19 +01:00
wernsaar 33d3ab6e09 small optimizations for zgemv kernels 2013-11-23 12:35:31 +01:00
wernsaar 9a0f978929 added nrm2 kernel for ARMV6 2013-11-22 17:21:10 +01:00
wernsaar 7f210587f0 renamed some ncopy and tcopy files 2013-11-22 00:20:25 +01:00
wernsaar 9f0a3a35b3 removed obsolete file sdot_vfpv3.S 2013-11-21 23:42:54 +01:00
wernsaar dbae93110b added sdot_vfp.S 2013-11-21 23:34:51 +01:00
wernsaar 19cd5c64a2 renamed swap_vfpv3.S to swap_vfp.S 2013-11-21 23:19:32 +01:00
wernsaar 9adf87495e renamed some dot kernels 2013-11-21 23:07:51 +01:00
wernsaar 440db4cdda delete rot_vfpv3.S 2013-11-21 22:52:24 +01:00