700 Commits

Author SHA1 Message Date
Werner Saar 298b13bba4 updated some kernel files for EXCAVATOR 2016-04-25 10:36:23 +02:00
Werner Saar 78b05f6476 bugfix for EXCAVATOR and DYNAMIC_ARCH 2016-04-25 10:13:30 +02:00
Werner Saar a3da10662f added sgemm_tcopy_8_power8.S 2016-04-23 10:04:41 +02:00
Werner Saar d46f07bb4e added cgemm_tcopy_8_power8.S 2016-04-23 07:37:18 +02:00
Werner Saar 879a51165f Optimized zgemm and tested zgemm again 2016-04-22 13:07:12 +02:00
Werner Saar 9276c9012f Optimized sgemm and dgemm and tested again. 2016-04-21 11:37:57 +02:00
wernsaar 6fbca2a4a1 Merge pull request #845 from wernsaar/develop: optimized sgemm for power8 2016-04-20 13:44:22 +02:00
Werner Saar 0001260f4b optimized sgemm 2016-04-20 13:06:38 +02:00
Werner Saar 3c6294ca3d added optimized sgemm_tcopy for power8 2016-04-19 16:08:54 +02:00
Zhang Xianyi f24d5307cf Refs #834. Fix zgemv config bug on Steamroller. 2016-04-12 22:26:11 +08:00
Werner Saar 8037d78eed bugfix for arm scal.c and zscal.c 2016-04-11 11:21:36 +02:00
wernsaar 0a4276bc2f Merge pull request #837 from wernsaar/develop: updated zgemm- and ztrmm-kernel for POWER8 2016-04-08 11:13:27 +02:00
Werner Saar e173c51c04 updated zgemm- and ztrmm-kernel for POWER8 2016-04-08 09:05:37 +02:00
Werner Saar 9c42f0374a Updated cgemm- and sgemm-kernel for POWER8 SMP 2016-04-07 15:08:15 +02:00
Zhang Xianyi d4380c1fe4 Refs xianyi/OpenBLAS-CI#10 , Fix sdot for scipy test_iterative.test_convergence test failure on AMD bulldozer and piledriver. 2016-04-07 01:44:18 +08:00
Werner Saar a51102e9b7 bugfixes for sgemm- and cgemm-kernel 2016-04-06 11:15:21 +02:00
Werner Saar c5b1fbcb2e updated optimized cgemm- and ctrmm-kernel for POWER8 2016-04-04 09:12:08 +02:00
Werner Saar d4c0330967 updated cgemm- and ctrmm-kernel for POWER8 2016-04-03 14:30:49 +02:00
Werner Saar 6a9bbfc227 updated sgemm- and strmm-kernel for POWER8 2016-04-02 17:16:36 +02:00
Werner Saar 68a69c5b50 added optimized dgemv_n kernel for POWER8 2016-03-30 11:10:53 +02:00
Werner Saar c2464a7c4a added optimized casum kernel for POWER8 2016-03-28 14:12:08 +02:00
Werner Saar 294f933869 added optimized zasum kernel for POWER8 2016-03-28 13:37:32 +02:00
Werner Saar f59c9bd6ef added optimized sasum kernel for POWER8 2016-03-28 12:44:25 +02:00
Werner Saar c53be46d78 added optimized dasum kernel for POWER8 2016-03-28 12:17:15 +02:00
Werner Saar 659ed16591 added optimized cswap and zswap kernels for POWER8 2016-03-27 18:31:37 +02:00
Werner Saar 35c98a3556 added optimized zscal kernel for POWER8 2016-03-27 16:31:50 +02:00
Werner Saar f1a5dd06c5 added optimized sscal kernel for POWER8 2016-03-27 11:05:56 +02:00
wernsaar e125a3dc33 Merge pull request #824 from wernsaar/develop: added optimized drot-kernel and srot-kernel for POWER8 2016-03-27 10:43:17 +02:00
Werner Saar 35f1f21a7f added drot- and srot-kernel optimized for POWER8 2016-03-27 08:57:11 +02:00
Zhang Xianyi 7b4b7179ba Merge pull request #819 from ashwinyes/develop_20160324_fixes_optimizations: Cortex-A57: Fixes and Optimizations 2016-03-27 00:04:20 -04:00
Werner Saar 3d9a50e841 added optimized sswap kernel for POWER8 2016-03-25 17:34:55 +01:00
Werner Saar 828c849b44 added optimized ccopy kernel for POWER8 2016-03-25 16:54:25 +01:00
Werner Saar ecc0bc9813 added optimized scopy kernel for POWER8 2016-03-25 16:06:56 +01:00
Werner Saar 12f209b7b0 added optimized zswap kernel for POWER8 2016-03-25 15:27:34 +01:00
Werner Saar 7316a87930 added optimized dswap kernel for POWER8 2016-03-25 14:35:43 +01:00
Werner Saar 0bff057a87 added optimized dcopy kernel for POWER8 2016-03-25 13:03:02 +01:00
Werner Saar 1e6cf9808c added optimized dscal kernel for POWER8 2016-03-25 09:42:08 +01:00
Ashwin Sekhar T K 278511ad2d Cortex-A57: Fix clang compilation errors 2016-03-24 10:42:04 +05:30
Ashwin Sekhar T K 3b5ffb49d3 Cortex-A57: Improve DGEMM 8x4 Implementation 2016-03-24 10:25:18 +05:30
Werner Saar 55eda3813b added optimized zaxpy kernel for POWER8 2016-03-23 11:20:23 +01:00
Werner Saar 0664ba4c97 added optimized daxpy kernel for POWER8 2016-03-22 14:50:03 +01:00
Werner Saar 11c44dede1 added optimized sdot kernel for POWER8 2016-03-21 13:18:23 +01:00
Werner Saar 9e4584d069 added optimized zdot kernel for POWER8 2016-03-21 10:12:07 +01:00
Werner Saar cd9fafc054 ddot for POWER8: updated licence information 2016-03-20 11:19:27 +01:00
Werner Saar 84b92e6373 added optimized ddot kernel for POWER8 2016-03-20 11:06:06 +01:00
wernsaar c279a53ed8 Merge pull request #806 from wernsaar/develop: adding optimized single precision blas level3 kernels for POWER8 2016-03-18 12:46:16 +01:00
Werner Saar e1df5a6e23 fixed sgemm- and strmm-kernel 2016-03-18 12:12:03 +01:00
Werner Saar 5c658f8746 add optimized cgemm- and ctrmm-kernel for POWER8 2016-03-18 08:17:25 +01:00
Ashwin Sekhar T K 5ac02f6dc7 Optimize Dgemm 4x4 for Cortex A57 2016-03-14 19:35:23 +05:30
Ashwin Sekhar T K 7aa1ad4923 Functional Assembly Kernels for CortexA57: Adding functional (non-optimized) kernels for Cortex-A57 with the following layouts: SGEMM - 16x4, 8x8; CGEMM - 8x4; DGEMM - 8x4, 4x8 2016-03-14 19:33:21 +05:30