Commit Graph

488 Commits

Author SHA1 Message Date
wernsaar 3c5732615d added blocked sgemv_n and microkernel for bulldozer and piledriver 2014-07-17 23:15:07 +02:00
wernsaar 880597b301 segment violation in sgemv kernels 2014-07-13 10:46:14 +02:00
wernsaar 0884b73c69 Lapack-test Windows 32bit now error free 2014-07-10 11:01:47 +02:00
wernsaar 9bd9472ae9 Lapack-test: cleanup of x86 32bit KERNEL file 2014-07-09 16:08:19 +02:00
wernsaar c4a423a642 bugfixes for lapack on ARM Platform 2014-07-09 12:21:39 +02:00
wernsaar 13348b2137 removed reference to daxpy_bulldozer kernel (Windows bug in lapack-test) 2014-07-06 16:39:32 +02:00
wernsaar 9964ed2f79 bugfix for CORE2 2014-07-06 11:47:28 +02:00
wernsaar d5b976f92d fallback to zgemm_kernel_4x2_sse.S 2014-07-06 11:05:28 +02:00
wernsaar f7267d9b0e added missing definition for DUNNINGTON 2014-07-06 10:17:07 +02:00
wernsaar e0c080a28c removed reference to zgemm_kernel_4x2_sse3.S (bug in lapack-test) 2014-07-05 16:13:17 +02:00
wernsaar e80b144932 enabled compiling of *3M functions 2014-07-02 14:11:53 +02:00
wernsaar be94db096c disabled *3M functions for x86_64 platforms 2014-07-01 16:18:05 +02:00
wernsaar b079df9ef4 added optimized sdot- and dsdot-kernel, written in C 2014-06-30 14:46:38 +02:00
wernsaar 01a119abfc enabled SMP for sbmv and zsbmv, but only for 64bit binaries 2014-06-29 20:35:56 +02:00
Zhang Xianyi 99efbbbad5 Fixed #395. Enable optimized cgemm for Sandybridge. Added optimized sdot kernel.
Fixed c/zgemm and zgemv computational errors on haswell, piledriver, bulldozer, and
barcelona on Windows.

Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop

Conflicts:
	kernel/Makefile.L1
	kernel/x86_64/KERNEL
	param.h
2014-06-29 10:34:51 +08:00
wernsaar 22e5aee2dd fixed zgemv bug for older AMD Processors 2014-06-28 19:04:49 +02:00
wernsaar 35d37e124f bugfix for barcelona zgemv-kernel 2014-06-28 12:36:11 +02:00
wernsaar d8ba46efdb bugfix for bulldozer cgemm-, zgemm- and zgemv-kernel 2014-06-28 12:16:20 +02:00
wernsaar a15f22a1f6 bugfix for piledriver cgemm-, zgemm- and zgemv-kernel 2014-06-28 11:46:58 +02:00
wernsaar b94ea89f52 bugfix for haswell cgemm- and zgemm-kernel 2014-06-28 10:22:40 +02:00
wernsaar 35f668bb14 bugfix for cgemm_kernel_8x2_sandy.S 2014-06-28 10:01:56 +02:00
Timothy Gu 6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar 365e8de346 added optimized cgemm-kernel for SANDYBRIDGE 2014-06-27 13:40:29 +02:00
wernsaar 578d1b6219 added DSDOT definition and enabled optimized sdot kernel 2014-06-27 11:30:29 +02:00
wernsaar dabab2b5f4 added new optimized sgemm kernel for SANDYBRIDGE 2014-06-26 21:42:08 +02:00
wernsaar aa2709c4e0 enabled optimized dgemm kernel for NEHALEM 2014-06-26 12:22:29 +02:00
wernsaar a13bcc1716 enabled optimized sgemv kernel for barcelona and piledriver 2014-06-25 13:50:57 +02:00
wernsaar d2c82d7543 enabled optimized sgemv kernel for HASWELL 2014-06-25 12:56:45 +02:00
wernsaar 0517672dd0 enabled optimized sgemv kernels for nehalem, sandybridge and bulldozer 2014-06-25 12:38:14 +02:00
wernsaar 23203d52c1 Ref #380: lowered stack usage for haswell kernels 2014-06-19 14:31:52 +02:00
wernsaar 73545a79cd Ref #380: lowered stack usage for piledriver and bulldozer kernels 2014-06-19 14:02:14 +02:00
wernsaar ff9cfca24c Ref #385: added missing return instruction 2014-06-12 15:52:14 +02:00
wernsaar cee257f384 Ref #51: added blas extensions zomatcopy and comatcopy 2014-06-10 10:34:54 +02:00
wernsaar 7bfb3011e8 Ref #51: added blas extension somatcopy 2014-06-09 20:21:13 +02:00
wernsaar 8c8f596238 Ref #51: added blas extension domatcopy as a non-optimized reference 2014-06-09 17:11:07 +02:00
wernsaar faf3ac0aad Ref #285: added axpby kernels 2014-06-08 11:54:24 +02:00
Zhang Xianyi 406f5bd22b Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop
Conflicts:
	kernel/arm/KERNEL.ARMV6
2014-05-21 11:24:39 +08:00
wernsaar aaddb05411 bugfix for ARMV6 2014-05-17 13:00:36 +02:00
wernsaar e826a5a6af some modifications regarding lapack test 2014-05-16 20:37:41 +02:00
wernsaar c38379c9dd bugfixes for ARM regarding lapack tests 2014-05-14 13:03:45 +02:00
wernsaar a0b07c1440 bugfixes for ARM regarding lapack tests 2014-05-14 12:59:20 +02:00
wernsaar 43fbdb7a5a added ARMV5 as reference platform 2014-05-13 17:25:19 +02:00
wernsaar 777cebc8c7 added ZERO check to zscal.c because of a bug in lapack-testing 2014-05-13 16:31:00 +02:00
wernsaar aa5c73e20f added ZERO check to zscal.c because of a bug in lapack-test 2014-05-13 16:25:21 +02:00
wernsaar 5e5ef28ca0 added ZERO check because of a bug in lapack-test 2014-05-13 15:36:03 +02:00
wernsaar 650ed34336 added ZERO check because of a bug in lapack-test 2014-05-13 15:31:36 +02:00
wernsaar 5f3b68b4d4 replaced sgemm and cgemm kernels because of lapack bugs 2014-05-10 11:24:07 +02:00
wernsaar 2424af62fd replaced dgemm-kernel because of a bug in lapack 2014-05-10 10:52:37 +02:00
wernsaar 793509a3b5 replaced files for sdot, sgemv_n and sgemv_t for bug #348 2014-05-06 15:29:39 +02:00
wernsaar 47b22763f8 reduced stack usage on windows to 16K 2014-04-24 14:09:26 +02:00
wernsaar 9db0fb8b02 bugfix for sdsdot 2014-02-28 14:59:36 +01:00
wernsaar f9daebba0a checked in bugfixes for ARM 2014-02-16 11:45:47 +01:00
Zhang Xianyi 9a557e90da Refs #340. Fixed SEGFAULT bug of dgemv_n on OSX. 2014-02-15 23:23:15 +08:00
wangqian 2d557eb1e0 Fixed computational error of dgemv_n. 2014-02-04 21:47:51 +08:00
Zhang Xianyi 05bb391c3a Refs #330. Fixed the compatibility issue with clang on Mac OSX. 2013-12-16 20:31:17 +08:00
Zhang Xianyi 9b5be29886 Refs #310. Fixed Segfault bug on nehalem when Julia calls dgeqrt3 on OSX.
Please also check JuliaLang/julia#4099
Julia test script:
  A=rand(256, 256)
  qrfact(A)

I found this was a bug in kernel/x86_64/dgemm_ncopy_8.S.
However, I cannot use gdb with julia. Thus, this is a workaround fix.
2013-12-12 23:23:04 +08:00
wernsaar 53eaf41901 added support for HASWELL 2013-12-02 13:17:51 +01:00
wernsaar 9423f980f6 modified trsm kernel 2013-12-02 10:08:14 +01:00
wernsaar c6156b2ef2 added trsm kernels from origin 2013-12-01 22:39:39 +01:00
wernsaar 034a5b2083 modified zsymv 2013-12-01 21:07:49 +01:00
wernsaar 27d4234d4d merged symv 2013-12-01 20:56:02 +01:00
wernsaar 402d6e91db Merge remote branch 'origin/develop' into armv7 2013-12-01 18:18:40 +01:00
wernsaar b3254eecaf Merge remote branch 'origin/haswell' into develop 2013-12-01 18:09:12 +01:00
wernsaar d910404f00 Merge remote branch 'origin/piledriver' into develop 2013-12-01 18:06:51 +01:00
wernsaar ffe70b1fdc modified Makefile.L3 2013-12-01 17:58:46 +01:00
wernsaar 0b6e13b689 Merge remote branch 'origin/develop' into haswell 2013-12-01 13:38:11 +01:00
wernsaar e09dc279a2 Merge remote branch 'origin/develop' into piledriver 2013-12-01 13:33:18 +01:00
wernsaar 4be4db590c Merge remote branch 'origin/develop' into armv7 2013-12-01 13:16:41 +01:00
wernsaar 5c648a8984 Merge remote branch 'origin/develop' into haswell 2013-12-01 11:25:33 +01:00
wernsaar c44dc4dd3c Merge remote branch 'origin/develop' into piledriver 2013-12-01 11:06:36 +01:00
wernsaar 9d3fae15a8 Merge branch 'develop' into armv7 2013-12-01 10:12:07 +01:00
wernsaar 2d3c884294 added complex gemv kernels for ARMV6 and ARMV7 2013-11-29 17:06:33 +01:00
wernsaar d54a061713 optimized gemv_n_vfp.S 2013-11-28 17:40:21 +01:00
wernsaar 86afb47e83 added optimized ctrmm kernel for ARMV6 2013-11-28 14:35:07 +01:00
wernsaar 42a4dff056 added optimized ztrmm kernel for ARMV6 2013-11-28 13:41:06 +01:00
wernsaar 5bc322a66c optimized strmm kernel for ARMV6 2013-11-28 12:45:38 +01:00
wernsaar dec7ad0dfd optimized dtrmm kernel for ARMV7 2013-11-28 12:32:12 +01:00
wernsaar 274304bd03 add optimized cgemm kernel for ARMV6 2013-11-28 11:54:38 +01:00
wernsaar 5007a534c4 optimized zgemm kernel for ARMV6 2013-11-28 10:04:43 +01:00
wernsaar a537d7d8d7 optimized zgemm_kernel_2x2_vfp.S 2013-11-28 08:33:44 +01:00
wernsaar b42145834f optimized sgemm kernel for ARMV6 2013-11-28 08:08:08 +01:00
wernsaar 3d5e792c72 optimized sgemm kernel for ARMV6 2013-11-27 18:38:32 +01:00
wernsaar a9bd12da2c optimized dgemm kernel for ARMV6 2013-11-27 17:37:38 +01:00
wernsaar 697e198e8a added zgemm_kernel for ARMV6 2013-11-27 16:15:06 +01:00
wernsaar 36b0f7fe1d added optimized gemv_t kernel for ARMV6 2013-11-25 19:31:27 +01:00
wernsaar d2b20c5c51 add optimized axpy kernel 2013-11-25 12:25:58 +01:00
wernsaar fe5f46c330 added experimental support for ARMV8 2013-11-24 15:47:00 +01:00
wernsaar 25c6050593 add single and double precision gemv_n kernel for ARMV6 2013-11-24 12:03:28 +01:00
wernsaar 12e02a00e0 added ncopy kernels for ARMV6 2013-11-24 08:46:47 +01:00
wernsaar 29a3196f56 added optimized sgemm and strmm kernel for ARMV6 2013-11-23 18:09:41 +01:00
wernsaar 8776a73773 added optimized dgemm and dtrmm kernel for ARMV6 2013-11-23 16:24:52 +01:00
wernsaar 7e84acd3e8 fixed bug in SAVE macros that was not caught by any test routine 2013-11-23 14:35:19 +01:00
wernsaar 33d3ab6e09 small optimizations for zgemv kernels 2013-11-23 12:35:31 +01:00
wernsaar 9a0f978929 added nrm2 kernel for ARMV6 2013-11-22 17:21:10 +01:00
wernsaar 7f210587f0 renamed some ncopy and tcopy files 2013-11-22 00:20:25 +01:00
wernsaar 9f0a3a35b3 removed obsolete file sdot_vfpv3.S 2013-11-21 23:42:54 +01:00
wernsaar dbae93110b added sdot_vfp.S 2013-11-21 23:34:51 +01:00
wernsaar 19cd5c64a2 renamed swap_vfpv3.S to swap_vfp.S 2013-11-21 23:19:32 +01:00
wernsaar 9adf87495e renamed some dot kernels 2013-11-21 23:07:51 +01:00
wernsaar 440db4cdda delete rot_vfpv3.S 2013-11-21 22:52:24 +01:00
wernsaar cd93cae5a7 renamed rot_vfpv3.S to rot_vfp.S 2013-11-21 22:49:28 +01:00
wernsaar 8565afb3c2 renamed asum_vfpv3.S to asum_vfp.S 2013-11-21 22:26:27 +01:00
wernsaar 5bf7cf8d67 renamed scal_vfpv3.S to scal_vfp.S 2013-11-21 22:03:36 +01:00
wernsaar 29a005c635 renamed iamax assembler kernel 2013-11-21 21:12:33 +01:00
wernsaar f1be3a168a renamed some BLAS kernels, which are compatible with ARMV6 2013-11-21 20:48:57 +01:00
wernsaar 410afda9b4 added cpu detection and target ARMV6, used in raspberry pi 2013-11-21 20:18:51 +01:00
wernsaar bf04544902 added gemv_n kernel for single and double precision 2013-11-19 15:07:20 +01:00
wernsaar 86283c0be1 added gemv_t kernel for single and double precision 2013-11-19 09:55:54 +01:00
wernsaar f27cabfd08 added nrm2 kernel for all precisions 2013-11-16 16:17:17 +01:00
wernsaar 23dd474cd0 added rot kernel for all precisions 2013-11-15 14:08:57 +01:00
wernsaar f1b452e160 added scal kernel for all precisions 2013-11-15 11:56:43 +01:00
wernsaar 3dabd7e6e6 added swap-kernel for all precisions 2013-11-14 19:06:19 +01:00
wernsaar 6f4a0ebe38 added max- and min-kernels for all precisions 2013-11-14 13:52:47 +01:00
wernsaar f1db386211 changes for compatibility with Pathscale compiler 2013-11-13 17:59:11 +01:00
wernsaar 6da558d2ab changes for compatibility with Pathscale compiler 2013-11-13 17:39:13 +01:00
wernsaar f750103336 small optimizations on dot-kernels 2013-11-11 15:47:56 +01:00
wernsaar 00f33c0134 added asum_kernel for all precisions and complex 2013-11-11 14:20:59 +01:00
wernsaar 5b36cc0f47 added blas level1 dot kernels for complex and double complex 2013-11-08 09:08:11 +01:00
wernsaar c8f1aeb154 added optimized blas level1 dot kernels for single and double precision 2013-11-07 17:22:03 +01:00
wernsaar 8fa93be06e added optimized blas level1 copy kernels 2013-11-07 17:18:56 +01:00
wernsaar 1e8128f41c added cgemm_tcopy_2_vfpv3.S and zgemm_tcopy_2_vfpv3.S 2013-11-07 17:15:50 +01:00
Zhang Xianyi 2f5fdd2000 Refs #314. Fixed clang compiling bug on OSX. 2013-11-07 08:12:03 +08:00
wernsaar 80a2e901b1 added dgemm_tcopy_4_vfpv3.S and sgemm_tcopy_4_vfpv3.S 2013-11-06 20:01:18 +01:00
wernsaar ac50bccbd2 added cgemm_ncopy_2_vfpv3.S and made assembler labels unique 2013-11-05 20:21:35 +01:00
wernsaar 82015beaef added zgemm_ncopy_2_vfpv3.S and made assembler labels unique 2013-11-05 19:31:22 +01:00
wernsaar 6216ab8a7e removed obsolete gemm_kernels from haswell branch 2013-11-04 08:33:04 +01:00
wernsaar 370e3834a9 added missing file kernel/arm/Makefile 2013-11-03 11:54:39 +01:00
wernsaar e31186efd4 deleted obsolete dgemm_kernel and dtrmm_kernel 2013-11-02 13:12:21 +01:00
wernsaar 2b801a00a5 small optimizations on sgemm_kernel for ARMV7 2013-11-02 13:06:11 +01:00
wernsaar b3eab8fcb7 minor optimizations on zgemm_kernel for ARMV7 2013-11-02 09:43:53 +01:00
wernsaar 02bc36ac79 added sgemm_ncopy routine and made some improvements on cgemm_kernel for ARMV7 2013-11-01 18:22:27 +01:00
wernsaar 5118a7f4d1 small optimizations on dgemm_kernel for Piledriver 2013-10-31 11:53:26 +01:00
wernsaar e172b70ea2 added cgemm_kernel for Piledriver 2013-10-31 08:38:17 +01:00
wernsaar 1cf4b974b2 added zgemm_kernel for Piledriver 2013-10-30 09:12:17 +01:00
wernsaar 7bccff1512 added sgemm_kernel for PILEDRIVER 2013-10-29 22:53:04 +01:00
wernsaar afe44b0241 tests and code cleanup of gemm_kernels for HASWELL 2013-10-28 14:23:48 +01:00
wernsaar a77c71eaf5 added highly optimized dgemm_kernel for HASWELL 2013-10-28 10:23:47 +01:00
wernsaar fe8c5666f9 optimized dgemm_kernel for HASWELL 2013-10-20 16:52:26 +02:00
wernsaar f6b50057e2 corrected and tested FMA3 code 2013-10-19 10:52:20 +02:00
wernsaar 2840d56aeb added dgemm_kernel for Piledriver 2013-10-19 09:47:15 +02:00
wernsaar 85484a42df added kernels for cgemm, ctrmm, zgemm and ztrmm 2013-10-16 18:00:41 +02:00
wernsaar 3983011f0b added sgemm- and strmm_kernel 2013-10-14 08:22:27 +02:00
wernsaar 2a1515c9dd added dgemm_ncopy_4_vfpv3.S 2013-10-12 16:48:29 +02:00
wernsaar 31f51e78bc minor optimizations on dgemm_kernel 2013-10-12 09:42:18 +02:00
wangqian beffee7d91 Fixed buffer overflow bug in kernel/x86_64/dgemv_t.S file. 2013-10-11 03:20:20 +08:00
wernsaar e0b968c3a7 Changed kernels for dgemm and dtrmm 2013-10-05 12:59:44 +02:00
wernsaar 1c63180bb6 updated dgemm_kernel_8x2_vfpv3.S 2013-09-30 17:31:23 +02:00
wernsaar 4a474ea7dc changed dgemm_kernel to use fused multiply add 2013-09-29 17:46:23 +02:00
wernsaar 69ce737cc5 modified Makefile.L3 for ARM 2013-09-28 19:13:47 +02:00
wernsaar 70411af888 initial checkin of kernel/arm 2013-09-28 19:02:25 +02:00
Zhang Xianyi 6c4a7d0828 Import AMD Piledriver DGEMM kernel generated by AUGEM.
So far, this kernel doesn't handle edge cases.

AUGEM: Automatically Generate High Performance Dense Linear Algebra
Kernels on x86 CPUs.
Qian Wang, Xianyi Zhang, Yunquan Zhang, and Qing Yi. In the
International Conference for High Performance Computing, Networking,
Storage and Analysis (SC'13). Denver, CO. Nov, 2013.
2013-08-25 10:16:01 -03:00
wernsaar 067e8417fd removed unnecessary instructions from zgemm_kernel_2x2_bulldozer.S 2013-08-23 22:22:43 +08:00
wernsaar a82da3d069 removed unnecessary instructions 2013-08-23 22:22:27 +08:00
Zhang Xianyi 1569bf14f8 Refs #282. Fixed zgemv_n typo bug on Win64. 2013-08-23 16:27:17 +08:00
Zhang Xianyi f51a849d91 Merge pull request #278 from wernsaar/haswell
Merge wernsaar's Haswell gemm kernels.
2013-08-17 08:24:37 -07:00
wernsaar 44ef70420c added cgemm_kernel_8x2_haswell.S 2013-08-16 18:54:56 +02:00
wernsaar d488b1b1aa added zgemm_kernel_4x2_haswell.S 2013-08-16 10:29:47 +02:00
wernsaar 4070d9a123 added dgemm_kernel_16x2_haswell.S 2013-08-15 19:17:20 +02:00
wernsaar 0b90c0ec64 added sgemm_kernel_16x4_haswell.S 2013-08-15 18:46:14 +02:00
wernsaar 2b8ab8f55b sgemm_kernel_16x4_haswell.S minor changes 2013-08-14 01:44:41 +02:00
wernsaar 1cb9579cd0 added zgemm_kernel_4x2_haswell.S and fixed a bug in sgemm_kernel_16x4_haswell.S 2013-08-14 01:23:15 +02:00
Zhang Xianyi 2638370844 Init code base for Intel Haswell. 2013-08-13 00:54:59 +08:00
wernsaar 89637f87c8 added sgemm- and dgemm-kernel for HASWELL processor 2013-08-12 18:04:10 +02:00
Zhang Xianyi c0159d44a3 Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop 2013-08-09 10:48:46 +08:00
wernsaar c17a850c1c modified KERNEL.BULLDOZER 2013-08-08 17:49:30 +02:00
wernsaar 099853fff6 added dtrsm_kernel_RN_8x2_bulldozer.S 2013-08-08 07:14:08 +02:00
wernsaar 44d23881b5 dtrsm_kernel_LT_8x2_bulldozer.S performance optimization 2013-08-05 11:27:16 +02:00
Zhang Xianyi 32fb6b9bb2 Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop 2013-08-05 16:09:47 +08:00
wernsaar aaeb8eaecd modified dtrsm_kernel_LT_8x2_bulldozer.S 2013-08-04 12:16:12 +02:00
wernsaar 8aeec32ea0 modified dtrsm_kernel_LT_8x2_bulldozer.S 2013-08-04 10:15:33 +02:00
wernsaar 87fc9de572 added dtrsm_kernel_LT_8x2_bulldozer.S 2013-08-04 09:54:40 +02:00
wernsaar 564aa60fec removed dtrsm_kernel_LT_8x2_bulldozer.S 2013-08-03 15:40:51 +02:00
wernsaar f645665dd6 fixed bug in dgemv_t_bulldozer.S 2013-08-03 12:19:29 +02:00
wernsaar e45a347cd2 repaired trmm bug in sgemm_kernel_16x2_bulldozer.S 2013-08-03 11:43:25 +02:00
wernsaar 99727ac013 repaired trmm bug in cgemm_kernel_4x2_bulldozer.S 2013-08-03 10:32:51 +02:00
wernsaar 6e0a2fbc0c repaired trmm bug in zgemm_kernel_2x2_bulldozer.S 2013-08-03 10:17:08 +02:00
wernsaar 0a22f99c58 repaired trmm bug in dgemm_kernel_8x2_bulldozer.S 2013-08-03 09:35:39 +02:00
wernsaar cff70a666d added generic trmm kernels and modified Makefile.L3 2013-07-30 20:18:57 +02:00
wernsaar 84bd0aabaa added dtrsm_kernel_LT_8x2_bulldozer.S 2013-07-28 16:47:58 +02:00
Zhang Xianyi 72b1edaf1b Merge branch 'develop' into bulldozer
Conflicts:
	kernel/x86_64/KERNEL.BULLDOZER
2013-07-28 06:38:25 +02:00
wangqian 1b3b9e841d Fixed a computational error in zgemm_kernel_4x4_sandy.S file. 2013-07-18 20:23:21 +08:00
Zhang Xianyi 2ed0f6ab60 Fixed the typo. 2013-07-11 23:47:07 +08:00
Zhang Xianyi 886cbaf4e4 Support AMD Piledriver by bulldozer kernels. 2013-07-06 12:06:43 -03:00
Zhang Xianyi 57944538b6 Use ALIGN_5 instead of .align 32 in assembly kernel. Added ALIGN_5 for 32-bit OSX. 2013-07-01 16:09:05 +08:00
Zhang Xianyi fa916a0fac Fixed #238 bug in lsame on x86. 2013-06-28 22:43:41 +08:00
Zhang Xianyi fb298b34ae Merge pull request #235 from wernsaar/develop
Added ddot, daxpy, dcopy kernels for AMD bulldozer.
2013-06-21 17:59:26 -07:00
wernsaar 16012767f4 added dcopy_bulldozer.S 2013-06-21 16:06:51 +02:00
wernsaar bcbac31b47 added ddot_bulldozer.S 2013-06-20 16:15:09 +02:00
wernsaar 8dc0c72583 added daxpy_bulldozer.S 2013-06-20 14:07:54 +02:00
wernsaar 89405a1a0b cleanup of dgemm_ncopy_8_bulldozer.S 2013-06-19 19:31:38 +02:00
wernsaar 4f2b12b8a8 added dgemv_t_bulldozer.S 2013-06-19 17:32:42 +02:00
Zhang Xianyi 646e168d26 Merge pull request #233 from wernsaar/develop
added dgemv_n and some faster gemm_copy routines to BULLDOZER.
2013-06-18 20:02:36 -07:00
wernsaar 93dbbe1fb8 added dgemm_ncopy_8_bulldozer.S 2013-06-18 13:29:23 +02:00
wernsaar a135f5d9ed added gemm_tcopy_2_bulldozer.S 2013-06-18 11:01:33 +02:00
wernsaar d0b6299b13 added dgemm_tcopy_8_bulldozer.S 2013-06-17 14:19:09 +02:00
wernsaar 9e58dd509e added gemm_ncopy_2_bulldozer.S 2013-06-17 12:55:12 +02:00
wernsaar 7c8227101b cleanup of dgemv_n_bulldozer.S and optimization of inner loop 2013-06-16 12:50:45 +02:00
wernsaar f67fa62851 added dgemv_n_bulldozer.S 2013-06-15 16:42:37 +02:00
Zhang Xianyi cd1d473ba0 Merge pull request #230 from wernsaar/develop
Refs #230. New dgemm and sgemm Kernel for BULLDOZER
2013-06-13 07:29:27 -07:00
wernsaar 0ded1fcc1c performance optimizations in sgemm_kernel_16x2_bulldozer.S 2013-06-13 11:35:15 +02:00