Werner Saar
|
a3da10662f
|
added sgemm_tcopy_8_power8.S
|
2016-04-23 10:04:41 +02:00 |
Werner Saar
|
d46f07bb4e
|
added cgemm_tcopy_8_power8.S
|
2016-04-23 07:37:18 +02:00 |
Werner Saar
|
879a51165f
|
Optimized zgemm and tested zgemm again
|
2016-04-22 13:07:12 +02:00 |
Shivraj Patil
|
2c3dfe2bf3
|
MIPS P5600(32 bit) and I6400(64 bit) cores support added.
Separated mips and mips64 files.
Configuration support for mips 32 bit.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
|
2016-04-22 14:03:18 +05:30 |
Werner Saar
|
9276c9012f
|
Optimized sgemm and dgemm and tested again.
|
2016-04-21 11:37:57 +02:00 |
wernsaar
|
6fbca2a4a1
|
Merge pull request #845 from wernsaar/develop
optimized sgemm for power8
|
2016-04-20 13:44:22 +02:00 |
Werner Saar
|
0001260f4b
|
optimized sgemm
|
2016-04-20 13:06:38 +02:00 |
Werner Saar
|
3c6294ca3d
|
added optimized sgemm_tcopy for power8
|
2016-04-19 16:08:54 +02:00 |
Zhang Xianyi
|
f24d5307cf
|
Refs #834. Fix zgemv config bug on Steamroller.
|
2016-04-12 22:26:11 +08:00 |
Werner Saar
|
8037d78eed
|
bugfix for arm scal.c and zscal.c
|
2016-04-11 11:21:36 +02:00 |
wernsaar
|
0a4276bc2f
|
Merge pull request #837 from wernsaar/develop
updated zgemm- and ztrmm-kernel for POWER8
|
2016-04-08 11:13:27 +02:00 |
Werner Saar
|
e173c51c04
|
updated zgemm- and ztrmm-kernel for POWER8
|
2016-04-08 09:05:37 +02:00 |
Werner Saar
|
9c42f0374a
|
Updated cgemm- and sgemm-kernel for POWER8 SMP
|
2016-04-07 15:08:15 +02:00 |
Zhang Xianyi
|
d4380c1fe4
|
Refs xianyi/OpenBLAS-CI#10. Fix sdot for scipy test_iterative.test_convergence test failure on AMD bulldozer and piledriver.
|
2016-04-07 01:44:18 +08:00 |
Werner Saar
|
a51102e9b7
|
bugfixes for sgemm- and cgemm-kernel
|
2016-04-06 11:15:21 +02:00 |
Werner Saar
|
c5b1fbcb2e
|
updated optimized cgemm- and ctrmm-kernel for POWER8
|
2016-04-04 09:12:08 +02:00 |
Werner Saar
|
d4c0330967
|
updated cgemm- and ctrmm-kernel for POWER8
|
2016-04-03 14:30:49 +02:00 |
Werner Saar
|
6a9bbfc227
|
updated sgemm- and strmm-kernel for POWER8
|
2016-04-02 17:16:36 +02:00 |
Werner Saar
|
68a69c5b50
|
added optimized dgemv_n kernel for POWER8
|
2016-03-30 11:10:53 +02:00 |
Werner Saar
|
c2464a7c4a
|
added optimized casum kernel for POWER8
|
2016-03-28 14:12:08 +02:00 |
Werner Saar
|
294f933869
|
added optimized zasum kernel for POWER8
|
2016-03-28 13:37:32 +02:00 |
Werner Saar
|
f59c9bd6ef
|
added optimized sasum kernel for POWER8
|
2016-03-28 12:44:25 +02:00 |
Werner Saar
|
c53be46d78
|
added optimized dasum kernel for POWER8
|
2016-03-28 12:17:15 +02:00 |
Werner Saar
|
659ed16591
|
added optimized cswap and zswap kernels for POWER8
|
2016-03-27 18:31:37 +02:00 |
Werner Saar
|
35c98a3556
|
added optimized zscal kernel for POWER8
|
2016-03-27 16:31:50 +02:00 |
Werner Saar
|
f1a5dd06c5
|
added optimized sscal kernel for POWER8
|
2016-03-27 11:05:56 +02:00 |
wernsaar
|
e125a3dc33
|
Merge pull request #824 from wernsaar/develop
added optimized drot-kernel and srot-kernel for POWER8
|
2016-03-27 10:43:17 +02:00 |
Werner Saar
|
35f1f21a7f
|
added drot- and srot-kernel optimized for POWER8
|
2016-03-27 08:57:11 +02:00 |
Zhang Xianyi
|
7b4b7179ba
|
Merge pull request #819 from ashwinyes/develop_20160324_fixes_optimizations
Cortex-A57: Fixes and Optimizations
|
2016-03-27 00:04:20 -04:00 |
Werner Saar
|
3d9a50e841
|
added optimized sswap kernel for POWER8
|
2016-03-25 17:34:55 +01:00 |
Werner Saar
|
828c849b44
|
added optimized ccopy kernel for POWER8
|
2016-03-25 16:54:25 +01:00 |
Werner Saar
|
ecc0bc9813
|
added optimized scopy kernel for POWER8
|
2016-03-25 16:06:56 +01:00 |
Werner Saar
|
12f209b7b0
|
added optimized zswap kernel for POWER8
|
2016-03-25 15:27:34 +01:00 |
Werner Saar
|
7316a87930
|
added optimized dswap kernel for POWER8
|
2016-03-25 14:35:43 +01:00 |
Werner Saar
|
0bff057a87
|
added optimized dcopy kernel for POWER8
|
2016-03-25 13:03:02 +01:00 |
Werner Saar
|
1e6cf9808c
|
added optimized dscal kernel for POWER8
|
2016-03-25 09:42:08 +01:00 |
Ashwin Sekhar T K
|
278511ad2d
|
Cortex-A57: Fix clang compilation errors
|
2016-03-24 10:42:04 +05:30 |
Ashwin Sekhar T K
|
3b5ffb49d3
|
Cortex-A57: Improve DGEMM 8x4 Implementation
|
2016-03-24 10:25:18 +05:30 |
Werner Saar
|
55eda3813b
|
added optimized zaxpy kernel for POWER8
|
2016-03-23 11:20:23 +01:00 |
Werner Saar
|
0664ba4c97
|
added optimized daxpy kernel for POWER8
|
2016-03-22 14:50:03 +01:00 |
Werner Saar
|
11c44dede1
|
added optimized sdot kernel for POWER8
|
2016-03-21 13:18:23 +01:00 |
Werner Saar
|
9e4584d069
|
added optimized zdot kernel for POWER8
|
2016-03-21 10:12:07 +01:00 |
Werner Saar
|
cd9fafc054
|
ddot for POWER8: updated licence information
|
2016-03-20 11:19:27 +01:00 |
Werner Saar
|
84b92e6373
|
added optimized ddot kernel for POWER8
|
2016-03-20 11:06:06 +01:00 |
wernsaar
|
c279a53ed8
|
Merge pull request #806 from wernsaar/develop
adding optimized single precision blas level3 kernels for POWER8
|
2016-03-18 12:46:16 +01:00 |
Werner Saar
|
e1df5a6e23
|
fixed sgemm- and strmm-kernel
|
2016-03-18 12:12:03 +01:00 |
Werner Saar
|
5c658f8746
|
add optimized cgemm- and ctrmm-kernel for POWER8
|
2016-03-18 08:17:25 +01:00 |
Ashwin Sekhar T K
|
5ac02f6dc7
|
Optimize Dgemm 4x4 for Cortex A57
|
2016-03-14 19:35:23 +05:30 |
Ashwin Sekhar T K
|
7aa1ad4923
|
Functional Assembly Kernels for CortexA57
Adding functional (non-optimized) kernels for Cortex-A57
with the following layouts.
SGEMM - 16x4, 8x8
CGEMM - 8x4
DGEMM - 8x4, 4x8
|
2016-03-14 19:33:21 +05:30 |
Werner Saar
|
dcd15b546c
|
BUGFIX: KERNEL.POWER8
|
2016-03-14 14:36:59 +01:00 |
Werner Saar
|
96284ab295
|
added sgemm- and strmm-kernel for POWER8
|
2016-03-14 13:52:44 +01:00 |
Werner Saar
|
faa5e2e5e3
|
FIX: forgot to add the files cgemv_n_4.c and cgemv_t_4.c
|
2016-03-10 11:10:38 +01:00 |
Werner Saar
|
fdf291be30
|
Added optimized cgemv_n and cgemv_t kernels for bulldozer, piledriver and steamroller
|
2016-03-10 09:42:07 +01:00 |
Werner Saar
|
c99cc41cbd
|
Added optimized zgemv_n kernel for bulldozer, piledriver and steamroller
|
2016-03-09 14:02:03 +01:00 |
Werner Saar
|
acdff55a6a
|
Bugfix for ztrmv
|
2016-03-07 09:39:34 +01:00 |
Zhang Xianyi
|
7d6b68eb4a
|
Refs #786. Revert to default assembly kernel.
|
2016-03-07 11:34:58 +08:00 |
Werner Saar
|
cd5241d0cf
|
modified KERNEL for power, to use the generic DSDOT-KERNEL
|
2016-03-06 09:07:24 +01:00 |
Zhang Xianyi
|
8c43d7fa5f
|
Merge remote-tracking branch 'origin/power8' into develop
Refs #774
|
2016-03-05 06:03:19 -05:00 |
Werner Saar
|
085f215257
|
Modified assembly label names so that they are hidden.
Added license information.
|
2016-03-05 10:27:27 +01:00 |
Zhang Xianyi
|
8f758eeff9
|
Refs #786. avoid old assembly c/zgemv kernels.
|
2016-03-05 08:32:03 +08:00 |
Werner Saar
|
0afc76fd65
|
enabled gemm_beta assembly kernels
|
2016-03-04 15:01:15 +01:00 |
Werner Saar
|
91e1c5080c
|
modified configuration, to use power6 sgemm kernel for power8
|
2016-03-04 13:38:57 +01:00 |
Werner Saar
|
73f04c2c72
|
enabled hemv assembly function for power8
|
2016-03-04 13:20:50 +01:00 |
Werner Saar
|
3e633152c6
|
enabled symv assembly kernels on power8
|
2016-03-04 13:08:18 +01:00 |
Werner Saar
|
d5130ce7e3
|
enabled gemv assembly on power8
|
2016-03-04 12:53:31 +01:00 |
Werner Saar
|
4824b88fcb
|
enabled all level1 assembly kernels for power8
|
2016-03-04 12:35:25 +01:00 |
Werner Saar
|
b752858d6c
|
added dgemm-, dtrmm-, zgemm- and ztrmm-kernel for power8
|
2016-03-01 07:33:56 +01:00 |
Zhang Xianyi
|
efa4f5c936
|
Refs #695 #783. Replace default x86_64 cgemv_t
asm kernel by C kernel.
|
2016-03-01 11:18:56 +08:00 |
Zhang Xianyi
|
74b0672223
|
Fix c/zaxpyc kernel bug on Cortex-A57.
|
2016-02-23 22:47:53 +00:00 |
Zhang Xianyi
|
6e7be06e07
|
Refs JuliaLang/julia#5728. Fix gemv performance bug on Haswell Mac OSX.
On Mac OS X, it should use .align 4 (equal to .align 16 on Linux).
I didn't get the performance benefit from .align. Thus, I deleted it.
|
2016-02-19 17:56:07 -05:00 |
Zhang Xianyi
|
d06b92906a
|
Add gemm3m building for CMake.
|
2016-02-12 05:02:51 +08:00 |
Zhang Xianyi
|
962376664d
|
Refs #768. Swap the result of zdot x87 fp kernel.
|
2016-02-02 09:15:02 +08:00 |
Zhang Xianyi
|
c44ff4d648
|
Refs #714. Avoid compiler warnings.
|
2016-01-28 04:38:07 +08:00 |
Werner Saar
|
63a7d7fb24
|
updated gemv_n_vfpv3.S for armv7
|
2016-01-25 15:00:13 +01:00 |
Werner Saar
|
b4ede558a5
|
updated nrm2 kernel for armv7
|
2016-01-25 11:55:25 +01:00 |
Werner Saar
|
de3e2d4349
|
updated trmm kernels for armv7
|
2016-01-25 11:08:56 +01:00 |
Werner Saar
|
a0e51e96f1
|
updated gemm kernels for armv7
|
2016-01-25 10:46:10 +01:00 |
Werner Saar
|
c2891330bc
|
updated KERNEL.ARMV6
|
2016-01-24 17:12:07 +01:00 |
Werner Saar
|
ceaa931e48
|
updated gemv kernel for armv6
|
2016-01-24 16:31:19 +01:00 |
Werner Saar
|
eaa63165df
|
updated cgemv and zgemv kernels for armv6
|
2016-01-24 14:42:38 +01:00 |
Werner Saar
|
c65357c566
|
updated trmm_kernels for armv6
|
2016-01-24 13:03:33 +01:00 |
Werner Saar
|
e63e9f9f26
|
updated gemm_kernels for armv6
|
2016-01-24 11:55:50 +01:00 |
Werner Saar
|
aafd3ab60e
|
updated cdot and zdot on arm
|
2016-01-24 10:56:49 +01:00 |
Werner Saar
|
d2f84c9c8a
|
Ref #740: updated nrm2_vfp.S
|
2016-01-23 17:47:58 +01:00 |
Werner Saar
|
ca32253f32
|
Ref #740: updated asum_vfp.S and iamax_vfp.S
|
2016-01-23 14:44:34 +01:00 |
Werner Saar
|
9066d1f982
|
Ref #750 and Ref #740: bugfix for sdot, dsdot and ddot on arm
|
2016-01-23 11:59:51 +01:00 |
Werner Saar
|
692d9c881c
|
Ref #740: simple solution to clear floating point register on arm
|
2016-01-17 15:37:12 +01:00 |
Zhang Xianyi
|
3602a2cd1f
|
#736 Revert #733 patch to fix bus error on ARM.
|
2016-01-12 22:19:58 +00:00 |
Zhang Xianyi
|
e3e20e2242
|
Merge pull request #733 from yuyichao/arm-asm
Do not use vsub to clear the register values
|
2016-01-05 19:35:12 -06:00 |
Yichao Yu
|
594b9f4c73
|
Do not use vsub to clear the register values since it doesn't work with non-normal numbers.
|
2016-01-05 16:54:05 +00:00 |
Werner Saar
|
c8f2c5d636
|
added optimized trsm_kernels
|
2016-01-05 13:05:05 +01:00 |
Ashwin Sekhar T K
|
318f0949c3
|
lapack-test fixes in nrm2 kernels for Cortex A57
|
2015-11-23 13:43:36 +05:30 |
Ashwin Sekhar T K
|
98965da2e8
|
lapack-test fixes for Cortex A57
|
2015-11-20 01:15:04 +05:30 |
Ashwin Sekhar T K
|
c99c43d51e
|
Optimized trmm kernels for CORTEXA57
|
2015-11-09 14:15:54 +05:30 |
Ashwin Sekhar T K
|
1397b47197
|
Optimized zgemm kernel for CORTEXA57
|
2015-11-09 14:15:53 +05:30 |
Ashwin Sekhar T K
|
45f78963ac
|
Optimized cgemm kernel for CORTEXA57
Also, add a generic ztrmm 4x4 kernel
|
2015-11-09 14:15:53 +05:30 |
Ashwin Sekhar T K
|
402443bf9c
|
Optimized dgemm kernel for CORTEXA57
|
2015-11-09 14:15:53 +05:30 |
Ashwin Sekhar T K
|
19fdbee291
|
Improve the sgemm kernel for CORTEXA57
|
2015-11-09 14:15:53 +05:30 |
Ashwin Sekhar T K
|
3b0cdfab1e
|
Optimized gemv kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
|
2015-11-09 14:15:52 +05:30 |
Ashwin Sekhar T K
|
46efa6a1da
|
Optimized swap kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
|
2015-11-09 14:15:52 +05:30 |