Commit Graph

437 Commits

Author SHA1 Message Date
wernsaar 11eab4c019 added optimized cgemv_n for haswell 2014-08-14 19:00:30 +02:00
wernsaar 4568d32b6b added optimized cgemv_t kernel for haswell 2014-08-14 14:10:29 +02:00
wernsaar c1a6374c6f optimized zgemv_n kernel for sandybridge 2014-08-13 16:10:03 +02:00
wernsaar 2470129132 added fast return, if m or n < 1 2014-08-13 13:54:19 +02:00
wernsaar 8c582d362d optimized zgemv_t_microk_haswell-2.c 2014-08-13 13:42:22 +02:00
wernsaar 11e34ddd1b bugfix for zgemv_n_microk_haswell-2.c 2014-08-13 12:54:18 +02:00
wernsaar 9528f0d9ee bugfix in zgemv_n_microk_sandy-2.c 2014-08-13 12:18:03 +02:00
wernsaar b06550519e added optimized cgemv_t c-kernel 2014-08-12 12:15:41 +02:00
wernsaar 6093ee5363 bugfix in zgemv_n_microk_haswell-2.c 2014-08-12 10:02:25 +02:00
wernsaar 07c66b1960 modified algorithm for better numerical stability 2014-08-12 08:35:42 +02:00
wernsaar 58b075daef added optimized zgemv_t kernel for haswell 2014-08-11 16:57:52 +02:00
wernsaar 09fcd3a341 add optimized zgemv_t kernel for bulldozer 2014-08-11 14:19:25 +02:00
wernsaar 726ad085cb added optimized zgemv_t for haswell 2014-08-11 13:10:12 +02:00
wernsaar 6fe416976d added optimimized zgemv_t c-kernel 2014-08-11 09:13:18 +02:00
wernsaar dbc2eff029 disabled optimized haswell zgemv_n kernel for windows ( bad rounding ) 2014-08-10 11:57:24 +02:00
wernsaar 462b4885ff added optimized zgemv_n kernel for haswell 2014-08-10 08:39:17 +02:00
wernsaar aa54fe064c added zgemv_n c-function 2014-08-07 22:30:20 +02:00
wernsaar 006ef3ea01 added optimized dgemv_t kernel for haswell 2014-08-07 10:08:54 +02:00
wernsaar 60f17628cc added optimized dgemv_n kernel for haswell 2014-08-07 09:18:02 +02:00
wernsaar c9bad1403a added optimized sgemv_t kernel for sandybridge 2014-08-07 07:49:33 +02:00
wernsaar 2f8927376f enabled optimized nehalem sgemv_t kernel for windows 2014-08-06 16:58:21 +02:00
wernsaar d945a2b06d added optimized sgemv_t kernel for nehalem 2014-08-06 16:21:48 +02:00
wernsaar ca6c8d06ce enabled optimized sgemv kernels for windows 2014-08-06 14:24:36 +02:00
wernsaar 7aa43c8928 enabled optimized sgemv kernels for windows 2014-08-06 14:06:30 +02:00
wernsaar 891b960854 added optimized sgemv_t kernel for haswell 2014-08-06 13:42:41 +02:00
wernsaar 95a8caa2f3 added optimized sgemv_t kernel 2014-08-06 12:12:17 +02:00
wernsaar 8c05b8105b bugfix in sgemv_n.c 2014-08-05 20:14:29 +02:00
wernsaar c80084a98f changed default x86_64 sgemv_n kernel to sgemv_n.c 2014-08-05 19:42:56 +02:00
wernsaar 2bab92961f enabled optimized sgemv_n kernels for windows 2014-08-05 14:52:54 +02:00
wernsaar 9175b8bd5f changed long to blaslong for windows compatibility 2014-08-05 13:28:39 +02:00
wernsaar 793f2d43b0 added optimized sgemv_n kernel for nehalem 2014-08-05 10:50:08 +02:00
wernsaar a4dde45f87 optimized sgemv_n kernel for sandybridge 2014-08-05 08:53:09 +02:00
wernsaar 7fa7ea3e1e updated haswell optimized sgmv_n kernel 2014-08-05 08:04:47 +02:00
wernsaar 3fbc13eb65 modified sgemv_n for haswell 2014-08-04 16:22:11 +02:00
wernsaar db6917303f added a better optimized sgemv_n kernel for bulldozer and piledriver 2014-08-04 14:29:01 +02:00
wernsaar 5087096711 optimization of sandybridge cgemm-kernel 2014-07-29 19:07:21 +02:00
wernsaar 46bc4fd50c optimized cgemm kernel for haswell 2014-07-29 08:53:09 +02:00
wernsaar 1cc02b4337 optimized sgemm kernel for haswell 2014-07-28 11:50:01 +02:00
wernsaar 1d33547222 optimized zgemm kernel for haswell 2014-07-27 11:51:42 +02:00
wernsaar 125610d23b allow to set custom value for ?GEMM_DEFAULT_UNROLL_MN, optimizations for syrk 2014-07-24 18:43:31 +02:00
wernsaar 6acbafe45b added sgemv_n microkernel for haswell 2014-07-20 14:52:25 +02:00
wernsaar 5392d11b04 optimized sgemv_n_microk_sandy.c 2014-07-20 14:08:04 +02:00
wernsaar c0fe95fb72 added sgemv_n microkernel for sandybridge 2014-07-20 13:17:47 +02:00
wernsaar d9d4077c93 added sgemv_t microkernel for haswell 2014-07-20 11:30:32 +02:00
wernsaar 02eb72ac42 bugfix in sgemv_t_microk_sandy.c 2014-07-20 10:48:41 +02:00
wernsaar c06f9986d4 added sgemv_t microkernel for sandybridge 2014-07-20 10:21:08 +02:00
wernsaar 2cce125c79 added optimized sgemv_t for bulldozer and piledriver 2014-07-19 15:48:07 +02:00
wernsaar b3938fe371 don't use this sgemv_n on Windows 2014-07-19 07:15:34 +02:00
wernsaar c8a4a56177 performance optimizations for sgemv_n 2014-07-18 11:25:21 +02:00
wernsaar 3c5732615d added blocked sgemv_n and microkernel for bulldozer and piledriver 2014-07-17 23:15:07 +02:00
wernsaar 880597b301 segment violation in sgemv kernels 2014-07-13 10:46:14 +02:00
wernsaar 0884b73c69 Lapack-test Windows 32bit now error free 2014-07-10 11:01:47 +02:00
wernsaar 9bd9472ae9 Lapack-test: cleanup of x86 32bit KERNEL file 2014-07-09 16:08:19 +02:00
wernsaar c4a423a642 bugfixes for lapack on ARM Platform 2014-07-09 12:21:39 +02:00
wernsaar 13348b2137 removed reference to daxpy_bulldozer kernel (Windows bug in lapack-test) 2014-07-06 16:39:32 +02:00
wernsaar 9964ed2f79 bugfix for CORE2 2014-07-06 11:47:28 +02:00
wernsaar d5b976f92d fallback to zgemm_kernel_4x2_sse.S 2014-07-06 11:05:28 +02:00
wernsaar f7267d9b0e added missing definition for DUNNINGTON 2014-07-06 10:17:07 +02:00
wernsaar e0c080a28c removed reference to zgemm_kernel_4x2_sse3.S (bug in lapack-test) 2014-07-05 16:13:17 +02:00
wernsaar e80b144932 enabled compiling of *3M functions 2014-07-02 14:11:53 +02:00
wernsaar be94db096c disabled *3M functions for x86_64 platforms 2014-07-01 16:18:05 +02:00
wernsaar b079df9ef4 added optimized sdot- and dsdot-kernel, written in C 2014-06-30 14:46:38 +02:00
wernsaar 01a119abfc enabled SMP for sbmv and zsbmv, but only for 64bit binaries 2014-06-29 20:35:56 +02:00
Zhang Xianyi 99efbbbad5 Fixed #395. Enable optimized cgemm for Sandybridge. Added optimized sdot kernel.
Fixed c/zgemm, zgemv computational error of haswell, piledriver, bullldozer, and
barcelona on Windows.

Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop

Conflicts:
	kernel/Makefile.L1
	kernel/x86_64/KERNEL
	param.h
2014-06-29 10:34:51 +08:00
wernsaar 22e5aee2dd fixed zgemv bug for older AMD Processors 2014-06-28 19:04:49 +02:00
wernsaar 35d37e124f bugfix for barcelona zgemv-kernel 2014-06-28 12:36:11 +02:00
wernsaar d8ba46efdb bugfix for bulldozer cgemm-, zgemm- and zgemv-kernel 2014-06-28 12:16:20 +02:00
wernsaar a15f22a1f6 bugfix for piledriver cgemm-, zgemm- and zgemv-kernel 2014-06-28 11:46:58 +02:00
wernsaar b94ea89f52 bugfix for haswell cgemm- and zgemm-kernel 2014-06-28 10:22:40 +02:00
wernsaar 35f668bb14 bugfix for cgemm_kernel_8x2_sandy.S 2014-06-28 10:01:56 +02:00
Timothy Gu 6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar 365e8de346 added optimized cgemm-kernel for SANDYBRIDGE 2014-06-27 13:40:29 +02:00
wernsaar 578d1b6219 added DSDOT definition and enabled optimized sdot kernel 2014-06-27 11:30:29 +02:00
wernsaar dabab2b5f4 added new optimized sgemm kernel for SANDYBRIGE 2014-06-26 21:42:08 +02:00
wernsaar aa2709c4e0 enabled optimized dgemm kernel for NEHALEM 2014-06-26 12:22:29 +02:00
wernsaar a13bcc1716 enabled optimized sgemv kernel for barcelona and piledriver 2014-06-25 13:50:57 +02:00
wernsaar d2c82d7543 enabled optimized sgemv kernel for HASWELL 2014-06-25 12:56:45 +02:00
wernsaar 0517672dd0 enabled optimized sgemv kernels for nehalem, sandybridge and bulldozer 2014-06-25 12:38:14 +02:00
wernsaar 23203d52c1 Ref #380: lowered stack usage for haswell kernels 2014-06-19 14:31:52 +02:00
wernsaar 73545a79cd Ref #380: lowered stack usage for piledriver and bulldozer kernels 2014-06-19 14:02:14 +02:00
wernsaar ff9cfca24c Ref #385: added missing return instruction 2014-06-12 15:52:14 +02:00
wernsaar cee257f384 Ref #51: added blas extensions zomatcopy and comatcopy 2014-06-10 10:34:54 +02:00
wernsaar 7bfb3011e8 Ref #51: added blas extension somatcopy 2014-06-09 20:21:13 +02:00
wernsaar 8c8f596238 Ref #51: added blas extension domatcopy as not opimized reference 2014-06-09 17:11:07 +02:00
wernsaar faf3ac0aad Ref #285: added axpby kernels 2014-06-08 11:54:24 +02:00
Zhang Xianyi 406f5bd22b Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop
Conflicts:
	kernel/arm/KERNEL.ARMV6
2014-05-21 11:24:39 +08:00
wernsaar aaddb05411 bugfix for ARMV6 2014-05-17 13:00:36 +02:00
wernsaar e826a5a6af some modifications regarding lapack test 2014-05-16 20:37:41 +02:00
wernsaar c38379c9dd bugfixes for ARM regarding lapack tests 2014-05-14 13:03:45 +02:00
wernsaar a0b07c1440 bugfixs for ARM regarding lapack tests 2014-05-14 12:59:20 +02:00
wernsaar 43fbdb7a5a added ARMV5 as reference platform 2014-05-13 17:25:19 +02:00
wernsaar 777cebc8c7 added ZERO check to zscal.c because bug in lapack-testing 2014-05-13 16:31:00 +02:00
wernsaar aa5c73e20f added ZERO check to zscal.c because bug in lapack-test 2014-05-13 16:25:21 +02:00
wernsaar 5e5ef28ca0 added ZERO check because bug in lapack-test 2014-05-13 15:36:03 +02:00
wernsaar 650ed34336 added ZERO check because bug in lapack-test 2014-05-13 15:31:36 +02:00
wernsaar 5f3b68b4d4 replaced sgemm and cgemm kernels because lapack bugs 2014-05-10 11:24:07 +02:00
wernsaar 2424af62fd replaced dgemm-kernel because bug in lapack 2014-05-10 10:52:37 +02:00
wernsaar 793509a3b5 replaced files for sdot, sgemv_n and sgemv_t for bug #348 2014-05-06 15:29:39 +02:00
wernsaar 47b22763f8 reduced stack usage on windows to 16K 2014-04-24 14:09:26 +02:00
wernsaar 9db0fb8b02 bugfix for sdsdot 2014-02-28 14:59:36 +01:00