Commit Graph

610 Commits

Author SHA1 Message Date
wernsaar 20cd850125 modification for clang compiler 2014-08-27 09:00:20 +02:00
wernsaar 3885eebdb8 added optimized zaxpy bulldozer kernel 2014-08-25 15:52:35 +02:00
wernsaar ee74445155 added optimized caxpy kernel for bulldozer 2014-08-25 14:53:28 +02:00
wernsaar 9d2ace8bac added optimized daxpy kernel for bulldozer 2014-08-24 10:57:12 +02:00
wernsaar b55f997302 added optimized daxpy kernel for nehalem 2014-08-23 17:53:07 +02:00
wernsaar e45c960c2c added optimized saxpy kernel for nehalem 2014-08-23 17:15:21 +02:00
wernsaar ac76b6267f added optimized dgemv_n kernel for nehalem 2014-08-23 10:40:57 +02:00
wernsaar f1b96c4846 added optimized ddot kernel for bulldozer 2014-08-22 21:19:29 +02:00
wernsaar 16d6be852d added optimized ddot kernel for nehalem 2014-08-22 20:34:41 +02:00
wernsaar 95a707ced3 update of KERNEL.BULLDOZER 2014-08-22 17:01:27 +02:00
wernsaar 5d97b0754c added optimized sdot kernel for nehalem 2014-08-22 17:00:26 +02:00
wernsaar 8a9e868919 added optimized sdot for bulldozer 2014-08-22 14:29:17 +02:00
wernsaar c8b0645266 added optimized symv_L kernels for nehalem 2014-08-21 14:27:00 +02:00
wernsaar ec05ff3f64 added optimized ssymv_L kernel for bulldozer 2014-08-21 13:32:06 +02:00
wernsaar f6f9122660 added optimized dsymv_L kernel for bulldozer 2014-08-21 13:02:53 +02:00
wernsaar 8247f38dc1 added optimized dsymv_U kernel for nehalem 2014-08-20 09:58:04 +02:00
wernsaar ef6374196d updated optimized dsymv_U kernel for bulldozer 2014-08-20 09:00:56 +02:00
wernsaar f824c2b751 updated optimized ssymv_U for bulldozer 2014-08-19 19:25:03 +02:00
wernsaar 4ba4ab623f added optimized ssymv_U kernel for nehalem 2014-08-19 17:09:45 +02:00
wernsaar 4f39447c05 added optimized ssymv_U kernel for bulldozer 2014-08-18 13:52:24 +02:00
wernsaar 74c9465672 added optimized dsymv_U kernel for bulldozer 2014-08-18 12:18:10 +02:00
wernsaar 101dd08173 add reference in C for symv_U 2014-08-16 13:52:50 +02:00
wernsaar 493d4fe7e5 added reference in C for symv_L 2014-08-16 11:36:48 +02:00
wernsaar 11eab4c019 added optimized cgemv_n for haswell 2014-08-14 19:00:30 +02:00
wernsaar 4568d32b6b added optimized cgemv_t kernel for haswell 2014-08-14 14:10:29 +02:00
wernsaar c1a6374c6f optimized zgemv_n kernel for sandybridge 2014-08-13 16:10:03 +02:00
wernsaar 2470129132 added fast return, if m or n < 1 2014-08-13 13:54:19 +02:00
wernsaar 8c582d362d optimized zgemv_t_microk_haswell-2.c 2014-08-13 13:42:22 +02:00
wernsaar 11e34ddd1b bugfix for zgemv_n_microk_haswell-2.c 2014-08-13 12:54:18 +02:00
wernsaar 9528f0d9ee bugfix in zgemv_n_microk_sandy-2.c 2014-08-13 12:18:03 +02:00
wernsaar b06550519e added optimized cgemv_t c-kernel 2014-08-12 12:15:41 +02:00
wernsaar 6093ee5363 bugfix in zgemv_n_microk_haswell-2.c 2014-08-12 10:02:25 +02:00
wernsaar 07c66b1960 modified algorithm for better numerical stability 2014-08-12 08:35:42 +02:00
wernsaar 58b075daef added optimized zgemv_t kernel for haswell 2014-08-11 16:57:52 +02:00
wernsaar 09fcd3a341 add optimized zgemv_t kernel for bulldozer 2014-08-11 14:19:25 +02:00
wernsaar 726ad085cb added optimized zgemv_t for haswell 2014-08-11 13:10:12 +02:00
wernsaar 6fe416976d added optimimized zgemv_t c-kernel 2014-08-11 09:13:18 +02:00
wernsaar dbc2eff029 disabled optimized haswell zgemv_n kernel for windows ( bad rounding ) 2014-08-10 11:57:24 +02:00
wernsaar 462b4885ff added optimized zgemv_n kernel for haswell 2014-08-10 08:39:17 +02:00
wernsaar aa54fe064c added zgemv_n c-function 2014-08-07 22:30:20 +02:00
wernsaar 006ef3ea01 added optimized dgemv_t kernel for haswell 2014-08-07 10:08:54 +02:00
wernsaar 60f17628cc added optimized dgemv_n kernel for haswell 2014-08-07 09:18:02 +02:00
wernsaar c9bad1403a added optimized sgemv_t kernel for sandybridge 2014-08-07 07:49:33 +02:00
wernsaar 2f8927376f enabled optimized nehalem sgemv_t kernel for windows 2014-08-06 16:58:21 +02:00
wernsaar d945a2b06d added optimized sgemv_t kernel for nehalem 2014-08-06 16:21:48 +02:00
wernsaar ca6c8d06ce enabled optimized sgemv kernels for windows 2014-08-06 14:24:36 +02:00
wernsaar 7aa43c8928 enabled optimized sgemv kernels for windows 2014-08-06 14:06:30 +02:00
wernsaar 891b960854 added optimized sgemv_t kernel for haswell 2014-08-06 13:42:41 +02:00
wernsaar 95a8caa2f3 added optimized sgemv_t kernel 2014-08-06 12:12:17 +02:00
wernsaar 8c05b8105b bugfix in sgemv_n.c 2014-08-05 20:14:29 +02:00
wernsaar c80084a98f changed default x86_64 sgemv_n kernel to sgemv_n.c 2014-08-05 19:42:56 +02:00
wernsaar 2bab92961f enabled optimized sgemv_n kernels for windows 2014-08-05 14:52:54 +02:00
wernsaar 9175b8bd5f changed long to blaslong for windows compatibility 2014-08-05 13:28:39 +02:00
wernsaar 793f2d43b0 added optimized sgemv_n kernel for nehalem 2014-08-05 10:50:08 +02:00
wernsaar a4dde45f87 optimized sgemv_n kernel for sandybridge 2014-08-05 08:53:09 +02:00
wernsaar 7fa7ea3e1e updated haswell optimized sgmv_n kernel 2014-08-05 08:04:47 +02:00
wernsaar 3fbc13eb65 modified sgemv_n for haswell 2014-08-04 16:22:11 +02:00
wernsaar db6917303f added a better optimized sgemv_n kernel for bulldozer and piledriver 2014-08-04 14:29:01 +02:00
wernsaar 5087096711 optimization of sandybridge cgemm-kernel 2014-07-29 19:07:21 +02:00
wernsaar 46bc4fd50c optimized cgemm kernel for haswell 2014-07-29 08:53:09 +02:00
wernsaar 1cc02b4337 optimized sgemm kernel for haswell 2014-07-28 11:50:01 +02:00
wernsaar 1d33547222 optimized zgemm kernel for haswell 2014-07-27 11:51:42 +02:00
wernsaar 125610d23b allow to set custom value for ?GEMM_DEFAULT_UNROLL_MN, optimizations for syrk 2014-07-24 18:43:31 +02:00
wernsaar 6acbafe45b added sgemv_n microkernel for haswell 2014-07-20 14:52:25 +02:00
wernsaar 5392d11b04 optimized sgemv_n_microk_sandy.c 2014-07-20 14:08:04 +02:00
wernsaar c0fe95fb72 added sgemv_n microkernel for sandybridge 2014-07-20 13:17:47 +02:00
wernsaar d9d4077c93 added sgemv_t microkernel for haswell 2014-07-20 11:30:32 +02:00
wernsaar 02eb72ac42 bugfix in sgemv_t_microk_sandy.c 2014-07-20 10:48:41 +02:00
wernsaar c06f9986d4 added sgemv_t microkernel for sandybridge 2014-07-20 10:21:08 +02:00
wernsaar 2cce125c79 added optimized sgemv_t for bulldozer and piledriver 2014-07-19 15:48:07 +02:00
wernsaar b3938fe371 don't use this sgemv_n on Windows 2014-07-19 07:15:34 +02:00
wernsaar c8a4a56177 performance optimizations for sgemv_n 2014-07-18 11:25:21 +02:00
wernsaar 3c5732615d added blocked sgemv_n and microkernel for bulldozer and piledriver 2014-07-17 23:15:07 +02:00
wernsaar 880597b301 segment violation in sgemv kernels 2014-07-13 10:46:14 +02:00
wernsaar 0884b73c69 Lapack-test Windows 32bit now error free 2014-07-10 11:01:47 +02:00
wernsaar 9bd9472ae9 Lapack-test: cleanup of x86 32bit KERNEL file 2014-07-09 16:08:19 +02:00
wernsaar c4a423a642 bugfixes for lapack on ARM Platform 2014-07-09 12:21:39 +02:00
wernsaar 13348b2137 removed reference to daxpy_bulldozer kernel (Windows bug in lapack-test) 2014-07-06 16:39:32 +02:00
wernsaar 9964ed2f79 bugfix for CORE2 2014-07-06 11:47:28 +02:00
wernsaar d5b976f92d fallback to zgemm_kernel_4x2_sse.S 2014-07-06 11:05:28 +02:00
wernsaar f7267d9b0e added missing definition for DUNNINGTON 2014-07-06 10:17:07 +02:00
wernsaar e0c080a28c removed reference to zgemm_kernel_4x2_sse3.S (bug in lapack-test) 2014-07-05 16:13:17 +02:00
wernsaar e80b144932 enabled compiling of *3M functions 2014-07-02 14:11:53 +02:00
wernsaar be94db096c disabled *3M functions for x86_64 platforms 2014-07-01 16:18:05 +02:00
wernsaar b079df9ef4 added optimized sdot- and dsdot-kernel, written in C 2014-06-30 14:46:38 +02:00
wernsaar 01a119abfc enabled SMP for sbmv and zsbmv, but only for 64bit binaries 2014-06-29 20:35:56 +02:00
Zhang Xianyi 99efbbbad5 Fixed #395. Enable optimized cgemm for Sandybridge. Added optimized sdot kernel.
Fixed c/zgemm, zgemv computational error of haswell, piledriver, bullldozer, and
barcelona on Windows.

Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop

Conflicts:
	kernel/Makefile.L1
	kernel/x86_64/KERNEL
	param.h
2014-06-29 10:34:51 +08:00
wernsaar 22e5aee2dd fixed zgemv bug for older AMD Processors 2014-06-28 19:04:49 +02:00
wernsaar 35d37e124f bugfix for barcelona zgemv-kernel 2014-06-28 12:36:11 +02:00
wernsaar d8ba46efdb bugfix for bulldozer cgemm-, zgemm- and zgemv-kernel 2014-06-28 12:16:20 +02:00
wernsaar a15f22a1f6 bugfix for piledriver cgemm-, zgemm- and zgemv-kernel 2014-06-28 11:46:58 +02:00
wernsaar b94ea89f52 bugfix for haswell cgemm- and zgemm-kernel 2014-06-28 10:22:40 +02:00
wernsaar 35f668bb14 bugfix for cgemm_kernel_8x2_sandy.S 2014-06-28 10:01:56 +02:00
Timothy Gu 6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar 365e8de346 added optimized cgemm-kernel for SANDYBRIDGE 2014-06-27 13:40:29 +02:00
wernsaar 578d1b6219 added DSDOT definition and enabled optimized sdot kernel 2014-06-27 11:30:29 +02:00
wernsaar dabab2b5f4 added new optimized sgemm kernel for SANDYBRIGE 2014-06-26 21:42:08 +02:00
wernsaar aa2709c4e0 enabled optimized dgemm kernel for NEHALEM 2014-06-26 12:22:29 +02:00
wernsaar a13bcc1716 enabled optimized sgemv kernel for barcelona and piledriver 2014-06-25 13:50:57 +02:00
wernsaar d2c82d7543 enabled optimized sgemv kernel for HASWELL 2014-06-25 12:56:45 +02:00
wernsaar 0517672dd0 enabled optimized sgemv kernels for nehalem, sandybridge and bulldozer 2014-06-25 12:38:14 +02:00
wernsaar 23203d52c1 Ref #380: lowered stack usage for haswell kernels 2014-06-19 14:31:52 +02:00
wernsaar 73545a79cd Ref #380: lowered stack usage for piledriver and bulldozer kernels 2014-06-19 14:02:14 +02:00
wernsaar ff9cfca24c Ref #385: added missing return instruction 2014-06-12 15:52:14 +02:00
wernsaar cee257f384 Ref #51: added blas extensions zomatcopy and comatcopy 2014-06-10 10:34:54 +02:00
wernsaar 7bfb3011e8 Ref #51: added blas extension somatcopy 2014-06-09 20:21:13 +02:00
wernsaar 8c8f596238 Ref #51: added blas extension domatcopy as not opimized reference 2014-06-09 17:11:07 +02:00
wernsaar faf3ac0aad Ref #285: added axpby kernels 2014-06-08 11:54:24 +02:00
Zhang Xianyi 406f5bd22b Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop
Conflicts:
	kernel/arm/KERNEL.ARMV6
2014-05-21 11:24:39 +08:00
wernsaar aaddb05411 bugfix for ARMV6 2014-05-17 13:00:36 +02:00
wernsaar e826a5a6af some modifications regarding lapack test 2014-05-16 20:37:41 +02:00
wernsaar c38379c9dd bugfixes for ARM regarding lapack tests 2014-05-14 13:03:45 +02:00
wernsaar a0b07c1440 bugfixs for ARM regarding lapack tests 2014-05-14 12:59:20 +02:00
wernsaar 43fbdb7a5a added ARMV5 as reference platform 2014-05-13 17:25:19 +02:00
wernsaar 777cebc8c7 added ZERO check to zscal.c because bug in lapack-testing 2014-05-13 16:31:00 +02:00
wernsaar aa5c73e20f added ZERO check to zscal.c because bug in lapack-test 2014-05-13 16:25:21 +02:00
wernsaar 5e5ef28ca0 added ZERO check because bug in lapack-test 2014-05-13 15:36:03 +02:00
wernsaar 650ed34336 added ZERO check because bug in lapack-test 2014-05-13 15:31:36 +02:00
wernsaar 5f3b68b4d4 replaced sgemm and cgemm kernels because lapack bugs 2014-05-10 11:24:07 +02:00
wernsaar 2424af62fd replaced dgemm-kernel because bug in lapack 2014-05-10 10:52:37 +02:00
wernsaar 793509a3b5 replaced files for sdot, sgemv_n and sgemv_t for bug #348 2014-05-06 15:29:39 +02:00
wernsaar 47b22763f8 reduced stack usage on windows to 16K 2014-04-24 14:09:26 +02:00
wernsaar 9db0fb8b02 bugfix for sdsdot 2014-02-28 14:59:36 +01:00
wernsaar f9daebba0a checked in bugfixes for ARM 2014-02-16 11:45:47 +01:00
Zhang Xianyi 9a557e90da Refs #340. Fixed SEGFAULT bug of dgemv_n on OSX. 2014-02-15 23:23:15 +08:00
wangqian 2d557eb1e0 Fixed computational error of dgemv_n. 2014-02-04 21:47:51 +08:00
Zhang Xianyi 05bb391c3a Refs #330. Fixed the compatible issue with clang on Mac OSX. 2013-12-16 20:31:17 +08:00
Zhang Xianyi 9b5be29886 Refs #310. Fixed Segfault bug on nehalem when Julia calling dgeqrt3 on OSX.
Please also check JuliaLang/julia#4099
Julia test script:
  A=rand(256, 256)
  qrfact(A)

I found this was a bug in kernel/x86_64/dgemm_ncopy_8.S.
However, I cannot use gdb with julia. Thus, this is a walkaround fix.
2013-12-12 23:23:04 +08:00
wernsaar 53eaf41901 added support for HASWELL 2013-12-02 13:17:51 +01:00
wernsaar 9423f980f6 modified trsm kernel 2013-12-02 10:08:14 +01:00
wernsaar c6156b2ef2 added trsm kernels from origin 2013-12-01 22:39:39 +01:00
wernsaar 034a5b2083 modified zsymv 2013-12-01 21:07:49 +01:00
wernsaar 27d4234d4d merged symv 2013-12-01 20:56:02 +01:00
wernsaar 402d6e91db Merge remote branch 'origin/develop' into armv7 2013-12-01 18:18:40 +01:00
wernsaar b3254eecaf Merge remote branch 'origin/haswell' into develop 2013-12-01 18:09:12 +01:00
wernsaar d910404f00 Merge remote branch 'origin/piledriver' into develop 2013-12-01 18:06:51 +01:00
wernsaar ffe70b1fdc modified Makefile.L3 2013-12-01 17:58:46 +01:00
wernsaar 0b6e13b689 Merge remote branch 'origin/develop' into haswell 2013-12-01 13:38:11 +01:00
wernsaar e09dc279a2 Merge remote branch 'origin/develop' into piledriver 2013-12-01 13:33:18 +01:00
wernsaar 4be4db590c Merge remote branch 'origin/develop' into armv7 2013-12-01 13:16:41 +01:00
wernsaar 5c648a8984 Merge remote branch 'origin/develop' into haswell 2013-12-01 11:25:33 +01:00
wernsaar c44dc4dd3c Merge remote branch 'origin/develop' into piledriver 2013-12-01 11:06:36 +01:00
wernsaar 9d3fae15a8 Merge branch 'develop' into armv7 2013-12-01 10:12:07 +01:00
wernsaar 2d3c884294 added complex gemv kernels for ARMV6 and ARMV7 2013-11-29 17:06:33 +01:00
wernsaar d54a061713 optimized gemv_n_vfp.S 2013-11-28 17:40:21 +01:00
wernsaar 86afb47e83 added optimized ctrmm kernel for ARMV6 2013-11-28 14:35:07 +01:00
wernsaar 42a4dff056 added optimized ztrmm kernel for ARMV6 2013-11-28 13:41:06 +01:00
wernsaar 5bc322a66c optimized strmm kernel for ARMV6 2013-11-28 12:45:38 +01:00
wernsaar dec7ad0dfd optimized dtrmm kernel for ARMV7 2013-11-28 12:32:12 +01:00
wernsaar 274304bd03 add optimized cgemm kernel for ARMV6 2013-11-28 11:54:38 +01:00
wernsaar 5007a534c4 optimized zgemm kernel for ARMV6 2013-11-28 10:04:43 +01:00
wernsaar a537d7d8d7 optimized zgemm_kernel_2x2_vfp.S 2013-11-28 08:33:44 +01:00
wernsaar b42145834f optimized sgemm kernel for ARMV6 2013-11-28 08:08:08 +01:00
wernsaar 3d5e792c72 optimized sgemm kernel for ARMV6 2013-11-27 18:38:32 +01:00
wernsaar a9bd12da2c optimized dgemm kernel for ARMV6 2013-11-27 17:37:38 +01:00
wernsaar 697e198e8a added zgemm_kernel for ARMV6 2013-11-27 16:15:06 +01:00
wernsaar 36b0f7fe1d added optimized gemv_t kernel for ARMV6 2013-11-25 19:31:27 +01:00
wernsaar d2b20c5c51 add optimized axpy kernel 2013-11-25 12:25:58 +01:00
wernsaar fe5f46c330 added experimental support for ARMV8 2013-11-24 15:47:00 +01:00
wernsaar 25c6050593 add single and double precision gemv_n kernel for ARMV6 2013-11-24 12:03:28 +01:00
wernsaar 12e02a00e0 added ncopy kernels for ARMV6 2013-11-24 08:46:47 +01:00
wernsaar 29a3196f56 added optimized sgemm and strmm kernel for ARMV6 2013-11-23 18:09:41 +01:00
wernsaar 8776a73773 added optimized dgemm and dtrmm kernel for ARMV6 2013-11-23 16:24:52 +01:00
wernsaar 7e84acd3e8 fixed bug in SAVE macros, that are not found by any test routine 2013-11-23 14:35:19 +01:00
wernsaar 33d3ab6e09 small optimizations for zgemv kernels 2013-11-23 12:35:31 +01:00
wernsaar 9a0f978929 added nrm2 kernel for ARMV6 2013-11-22 17:21:10 +01:00
wernsaar 7f210587f0 renamed some ncopy and tcopy files 2013-11-22 00:20:25 +01:00
wernsaar 9f0a3a35b3 removed obsolete file sdot_vfpv3.S 2013-11-21 23:42:54 +01:00
wernsaar dbae93110b added sdot_vfp.S 2013-11-21 23:34:51 +01:00
wernsaar 19cd5c64a2 renamed swap_vfpv3.S to swap_vfp.S 2013-11-21 23:19:32 +01:00
wernsaar 9adf87495e renamed some dot kernels 2013-11-21 23:07:51 +01:00
wernsaar 440db4cdda delete rot_vfpv3.S 2013-11-21 22:52:24 +01:00
wernsaar cd93cae5a7 renamed rot_vfpv3.S to rot_vfp.S 2013-11-21 22:49:28 +01:00
wernsaar 8565afb3c2 renamed asum_vfpv3.S to asum_vfp.S 2013-11-21 22:26:27 +01:00
wernsaar 5bf7cf8d67 renamed scal_vfpv3.S to scal_vfp.S 2013-11-21 22:03:36 +01:00
wernsaar 29a005c635 renamed iamax assembler kernel 2013-11-21 21:12:33 +01:00
wernsaar f1be3a168a renamed some BLAS kernels, which are compatible to ARMV6 2013-11-21 20:48:57 +01:00
wernsaar 410afda9b4 added cpu detection and target ARMV6, used in raspberry pi 2013-11-21 20:18:51 +01:00
wernsaar bf04544902 added gemv_n kernel for single and double precision 2013-11-19 15:07:20 +01:00
wernsaar 86283c0be1 added gemv_t kernel for single and double precision 2013-11-19 09:55:54 +01:00
wernsaar f27cabfd08 added nrm2 kernel for all precisions 2013-11-16 16:17:17 +01:00
wernsaar 23dd474cd0 added rot kernel for all precisions 2013-11-15 14:08:57 +01:00
wernsaar f1b452e160 added scal kernel for all precisions 2013-11-15 11:56:43 +01:00
wernsaar 3dabd7e6e6 added swap-kernel for all precisions 2013-11-14 19:06:19 +01:00
wernsaar 6f4a0ebe38 added max- und min-kernels for all precisions 2013-11-14 13:52:47 +01:00
wernsaar f1db386211 changes for compatibility with Pathscale compiler 2013-11-13 17:59:11 +01:00
wernsaar 6da558d2ab changes for compatibility with Pathscale compiler 2013-11-13 17:39:13 +01:00
wernsaar f750103336 small optimizations on dot-kernels 2013-11-11 15:47:56 +01:00
wernsaar 00f33c0134 added asum_kernel for all precisions and complex 2013-11-11 14:20:59 +01:00
wernsaar 5b36cc0f47 added blas level1 dot kernels for complex and double complex 2013-11-08 09:08:11 +01:00
wernsaar c8f1aeb154 added optimized blas level1 dot kernels for single and double precision 2013-11-07 17:22:03 +01:00
wernsaar 8fa93be06e added optimized blas level1 copy kernels 2013-11-07 17:18:56 +01:00
wernsaar 1e8128f41c added cgemm_tcopy_2_vfpv3.S and zgemm_tcopy_2_vfpv3.S 2013-11-07 17:15:50 +01:00
Zhang Xianyi 2f5fdd2000 Refs #314. Fixed clang compiling bug on OSX. 2013-11-07 08:12:03 +08:00
wernsaar 80a2e901b1 added dgemm_tcopy_4_vfpv3.S and sgemm_tcopy_4_vfpv3.S 2013-11-06 20:01:18 +01:00
wernsaar ac50bccbd2 added cgemm_ncopy_2_vfpv3.S and made assembler labels unique 2013-11-05 20:21:35 +01:00
wernsaar 82015beaef added zgemm_ncopy_2_vfpv3.S and made assembler labels unique 2013-11-05 19:31:22 +01:00
wernsaar 6216ab8a7e removed obsolete gemm_kernels from haswell branch 2013-11-04 08:33:04 +01:00
wernsaar 370e3834a9 added missing file kernel/arm/Makefile 2013-11-03 11:54:39 +01:00
wernsaar e31186efd4 deleted obsolete dgemm_kernel and dtrmm_kernel 2013-11-02 13:12:21 +01:00
wernsaar 2b801a00a5 small optimizations on sgemm_kernel for ARMV7 2013-11-02 13:06:11 +01:00
wernsaar b3eab8fcb7 minor optimizations on zgemm_kernel for ARMV7 2013-11-02 09:43:53 +01:00
wernsaar 02bc36ac79 added sgemm_ncopy routine and made some improvements on cgemm_kernel for ARMV7 2013-11-01 18:22:27 +01:00
wernsaar 5118a7f4d1 small optimizations on dgemm_kernel for Piledriver 2013-10-31 11:53:26 +01:00
wernsaar e172b70ea2 added cgemm_kernel for Piledriver 2013-10-31 08:38:17 +01:00
wernsaar 1cf4b974b2 added zgemm_kernel for Piledriver 2013-10-30 09:12:17 +01:00
wernsaar 7bccff1512 added sgemm_kernel for PILEDRIVER 2013-10-29 22:53:04 +01:00
wernsaar afe44b0241 tests and code cleanup of gemm_kernels for HASWELL 2013-10-28 14:23:48 +01:00
wernsaar a77c71eaf5 added highly optimized dgemm_kernel for HASWELL 2013-10-28 10:23:47 +01:00
wernsaar fe8c5666f9 optimized dgemm_kernel for HASWELL 2013-10-20 16:52:26 +02:00
wernsaar f6b50057e2 corrected and testet FMA3 Code 2013-10-19 10:52:20 +02:00
wernsaar 2840d56aeb added dgemm_kernel for Piledriver 2013-10-19 09:47:15 +02:00
wernsaar 85484a42df added kernels for cgemm, ctrmm, zgemm and ztrmm 2013-10-16 18:00:41 +02:00
wernsaar 3983011f0b added sgemm- and strmm_kernel 2013-10-14 08:22:27 +02:00
wernsaar 2a1515c9dd added dgemm_ncopy_4_vfpv3.S 2013-10-12 16:48:29 +02:00
wernsaar 31f51e78bc minor optimizations on dgemm_kernel 2013-10-12 09:42:18 +02:00
wangqian beffee7d91 Fixed buffer overflow bug in kernel/x86_64/dgemv_t.S file. 2013-10-11 03:20:20 +08:00
wernsaar e0b968c3a7 Changed kernels for dgemm and dtrmm 2013-10-05 12:59:44 +02:00
wernsaar 1c63180bb6 updated dgemm_kernel_8x2_vfpv3.S 2013-09-30 17:31:23 +02:00
wernsaar 4a474ea7dc changed dgemm_kernel to use fused multiply add 2013-09-29 17:46:23 +02:00
wernsaar 69ce737cc5 modified Makefile.L3 for ARM 2013-09-28 19:13:47 +02:00
wernsaar 70411af888 initial checkin of kernel/arm 2013-09-28 19:02:25 +02:00
Zhang Xianyi 6c4a7d0828 Import AMD Piledriver DGEMM kernel generated by AUGEM.
So far, this kernel doesn't deal with edge.

AUGEM: Automatically Generate High Performance Dense Linear Algebra
Kernels on x86 CPUs.
Qian Wang, Xianyi Zhang, Yunquan Zhang, and Qing Yi. In the
International Conference for High Performance Computing, Networking,
Storage and Analysis (SC'13). Denver, CO. Nov, 2013.
2013-08-25 10:16:01 -03:00
wernsaar 067e8417fd removed unnessesary instructions from zgemm_kernel_2x2_bulldozer.S 2013-08-23 22:22:43 +08:00
wernsaar a82da3d069 removed unnessesary instructions 2013-08-23 22:22:27 +08:00
Zhang Xianyi 1569bf14f8 Refs #282. Fixed zgemv_n typo bug on Win64. 2013-08-23 16:27:17 +08:00
Zhang Xianyi f51a849d91 Merge pull request #278 from wernsaar/haswell
Merge wernsaar's Haswell gemm kernels.
2013-08-17 08:24:37 -07:00
wernsaar 44ef70420c added cgemm_kernel_8x2_haswell.S 2013-08-16 18:54:56 +02:00
wernsaar d488b1b1aa added zgemm_kernel_4x2_haswell.S 2013-08-16 10:29:47 +02:00
wernsaar 4070d9a123 added dgemm_kernel_16x2_haswell.S 2013-08-15 19:17:20 +02:00
wernsaar 0b90c0ec64 added sgemm_kernel_16x4_haswell.S 2013-08-15 18:46:14 +02:00
wernsaar 2b8ab8f55b sgemm_kernel_16x4_haswell.S minor changes 2013-08-14 01:44:41 +02:00
wernsaar 1cb9579cd0 added zgemm_kernel_4x2_haswell.S and fixed a bug in sgemm_kernel_16x4_haswell.S 2013-08-14 01:23:15 +02:00
Zhang Xianyi 2638370844 Init code base for Intel Haswell. 2013-08-13 00:54:59 +08:00
wernsaar 89637f87c8 added sgemm- and dgemm-kernel for HASWELL processor 2013-08-12 18:04:10 +02:00
Zhang Xianyi c0159d44a3 Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop 2013-08-09 10:48:46 +08:00
wernsaar c17a850c1c modified KERNEL.BULLDOZER 2013-08-08 17:49:30 +02:00
wernsaar 099853fff6 added dtrsm_kernel_RN_8x2_bulldozer.S 2013-08-08 07:14:08 +02:00
wernsaar 44d23881b5 dtrsm_kernel_LT_8x2_bulldozer.S performance optimization 2013-08-05 11:27:16 +02:00
Zhang Xianyi 32fb6b9bb2 Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop 2013-08-05 16:09:47 +08:00
wernsaar aaeb8eaecd modified dtrsm_kernel_LT_8x2_bulldozer.S 2013-08-04 12:16:12 +02:00
wernsaar 8aeec32ea0 modified dtrsm_kernel_LT_8x2_bulldozer.S 2013-08-04 10:15:33 +02:00
wernsaar 87fc9de572 added dtrsm_kernel_LT_8x2_bulldozer.S 2013-08-04 09:54:40 +02:00
wernsaar 564aa60fec removed dtrsm_kernel_LT_8x2_bulldozer.S 2013-08-03 15:40:51 +02:00
wernsaar f645665dd6 fixed bug in dgemv_t_bulldozer.S 2013-08-03 12:19:29 +02:00
wernsaar e45a347cd2 repaired trmm bug in sgemm_kernel_16x2_bulldozer.S 2013-08-03 11:43:25 +02:00
wernsaar 99727ac013 repaired trmm bug in cgemm_kernel_4x2_bulldozer.S 2013-08-03 10:32:51 +02:00
wernsaar 6e0a2fbc0c repaired trmm bug in zgemm_kernel_2x2_bulldozer.S 2013-08-03 10:17:08 +02:00
wernsaar 0a22f99c58 repaired trmm bug in dgemm_kernel_8x2_bulldozer.S 2013-08-03 09:35:39 +02:00
wernsaar cff70a666d added generic trmm kernels and modified Makefile.L3 2013-07-30 20:18:57 +02:00