Commit Graph

674 Commits

Author SHA1 Message Date
Werner Saar f1a5dd06c5 added optimized sscal kernel for POWER8 2016-03-27 11:05:56 +02:00
wernsaar e125a3dc33 Merge pull request #824 from wernsaar/develop
added optimized drot-kernel and srot-kernel for POWER8
2016-03-27 10:43:17 +02:00
Werner Saar 35f1f21a7f added drot- and srot-kernel optimized for POWER8 2016-03-27 08:57:11 +02:00
Zhang Xianyi 7b4b7179ba Merge pull request #819 from ashwinyes/develop_20160324_fixes_optimizations
Cortex-A57: Fixes and Optimizations
2016-03-27 00:04:20 -04:00
Werner Saar 3d9a50e841 added optimized sswap kernel for POWER8 2016-03-25 17:34:55 +01:00
Werner Saar 828c849b44 added optimized ccopy kernel for POWER8 2016-03-25 16:54:25 +01:00
Werner Saar ecc0bc9813 added optimized scopy kernel for POWER8 2016-03-25 16:06:56 +01:00
Werner Saar 12f209b7b0 added optimized zswap kernel for POWER8 2016-03-25 15:27:34 +01:00
Werner Saar 7316a87930 added optimized dswap kernel for POWER8 2016-03-25 14:35:43 +01:00
Werner Saar 0bff057a87 added optimized dcopy kernel for POWER8 2016-03-25 13:03:02 +01:00
Werner Saar 1e6cf9808c added optimized dscal kernel for POWER8 2016-03-25 09:42:08 +01:00
Ashwin Sekhar T K 278511ad2d Cortex-A57: Fix clang compilation errors 2016-03-24 10:42:04 +05:30
Ashwin Sekhar T K 3b5ffb49d3 Cortex-A57: Improve DGEMM 8x4 Implementation 2016-03-24 10:25:18 +05:30
Werner Saar 55eda3813b added optimized zaxpy kernel for POWER8 2016-03-23 11:20:23 +01:00
Werner Saar 0664ba4c97 added optimized daxpy kernel for POWER8 2016-03-22 14:50:03 +01:00
Werner Saar 11c44dede1 added optimized sdot kernel for POWER8 2016-03-21 13:18:23 +01:00
Werner Saar 9e4584d069 added optimized zdot kernel for POWER8 2016-03-21 10:12:07 +01:00
Werner Saar cd9fafc054 ddot for POWER8: updated licence information 2016-03-20 11:19:27 +01:00
Werner Saar 84b92e6373 added optimized ddot kernel for POWER8 2016-03-20 11:06:06 +01:00
wernsaar c279a53ed8 Merge pull request #806 from wernsaar/develop
adding optimized single precision blas level3 kernels for POWER8
2016-03-18 12:46:16 +01:00
Werner Saar e1df5a6e23 fixed sgemm- and strmm-kernel 2016-03-18 12:12:03 +01:00
Werner Saar 5c658f8746 add optimized cgemm- and ctrmm-kernel for POWER8 2016-03-18 08:17:25 +01:00
Ashwin Sekhar T K 5ac02f6dc7 Optimize Dgemm 4x4 for Cortex A57 2016-03-14 19:35:23 +05:30
Ashwin Sekhar T K 7aa1ad4923 Functional Assembly Kernels for CortexA57
Adding functional (non-optimized) kernels for Cortex-A57
with the following layouts.
SGEMM - 16x4, 8x8
CGEMM - 8x4
DGEMM - 8x4, 4x8
2016-03-14 19:33:21 +05:30
Werner Saar dcd15b546c BUGFIX: KERNEL.POWER8 2016-03-14 14:36:59 +01:00
Werner Saar 96284ab295 added sgemm- and strmm-kernel for POWER8 2016-03-14 13:52:44 +01:00
Werner Saar faa5e2e5e3 FIX: forgot to add the files cgemv_n_4.c and cgemv_t_4.c 2016-03-10 11:10:38 +01:00
Werner Saar fdf291be30 Added optimized cgemv_n and cgemv_t kernels for bulldozer, piledriver and steamroller 2016-03-10 09:42:07 +01:00
Werner Saar c99cc41cbd Added optimized zgemv_n kernel for bulldozer, piledriver and steamroller 2016-03-09 14:02:03 +01:00
Werner Saar acdff55a6a Bugfix for ztrmv 2016-03-07 09:39:34 +01:00
Zhang Xianyi 7d6b68eb4a Refs #786. Revert to default assembly kernel. 2016-03-07 11:34:58 +08:00
Werner Saar cd5241d0cf modified KERNEL for power, to use the generic DSDOT-KERNEL 2016-03-06 09:07:24 +01:00
Zhang Xianyi 8c43d7fa5f Merge remote-tracking branch 'origin/power8' into develop
Refs #774
2016-03-05 06:03:19 -05:00
Werner Saar 085f215257 Modified assembly label names so that they are hidden.
Added license information.
2016-03-05 10:27:27 +01:00
Zhang Xianyi 8f758eeff9 Refs #786. avoid old assembly c/zgemv kernels. 2016-03-05 08:32:03 +08:00
Werner Saar 0afc76fd65 enabled gemm_beta assembly kernels 2016-03-04 15:01:15 +01:00
Werner Saar 91e1c5080c modified configuration, to use power6 sgemm kernel for power8 2016-03-04 13:38:57 +01:00
Werner Saar 73f04c2c72 enabled hemv assembly function for power8 2016-03-04 13:20:50 +01:00
Werner Saar 3e633152c6 enabled symv assembly kernels on power8 2016-03-04 13:08:18 +01:00
Werner Saar d5130ce7e3 enabled gemv assembly on power8 2016-03-04 12:53:31 +01:00
Werner Saar 4824b88fcb enabled all level1 assembly kernels for power8 2016-03-04 12:35:25 +01:00
Werner Saar b752858d6c added dgemm-, dtrmm-, zgemm- and ztrmm-kernel for power8 2016-03-01 07:33:56 +01:00
Zhang Xianyi efa4f5c936 Refs #695 #783. Replace default x86_64 cgemv_t
asm kernel by C kernel.
2016-03-01 11:18:56 +08:00
Zhang Xianyi 74b0672223 Fix c/zaxpyc kernel bug on Cortex-A57. 2016-02-23 22:47:53 +00:00
Zhang Xianyi 6e7be06e07 Refs JuliaLang/julia#5728. Fix gemv performance bug on Haswell Mac OSX.
On Mac OS X, it should use .align 4 (equal to .align 16 on Linux).
I didn't get the performance benefit from .align. Thus, I deleted it.
2016-02-19 17:56:07 -05:00
Zhang Xianyi d06b92906a Add gemm3m building for CMake. 2016-02-12 05:02:51 +08:00
Zhang Xianyi 962376664d Refs #768. Swap the result of zdot x87 fp kernel. 2016-02-02 09:15:02 +08:00
Zhang Xianyi c44ff4d648 Refs #714. avoid compiling warnings. 2016-01-28 04:38:07 +08:00
Werner Saar 63a7d7fb24 updated gemv_n_vfpv3.S for armv7 2016-01-25 15:00:13 +01:00
Werner Saar b4ede558a5 updated nrm2 kernel for armv7 2016-01-25 11:55:25 +01:00