Zhang Xianyi
d5ef0dee9a
Merge pull request #1226 from ashwinyes/develop_arm_clang_ual_fix
...
arm: Fix clang compilation for ARMv7
2017-07-10 20:04:42 +08:00
Zhang Xianyi
4239dd65ce
Merge branch 'develop' into develop_arm_softfp
2017-07-10 20:02:36 +08:00
Ashwin Sekhar T K
f02d535fde
arm: Fix clang compilation for ARMv7
...
clang is not recognizing some pre-UAL VFP mnemonics like fnmacs, fnmacd,
fnmuls and fnmuld. Replaced them with equivalent UAL mnemonics which are
vmls.f32, vmls.f64, vnmul.f32 and vnmul.f64 respectively.
2017-07-07 12:35:58 +05:30
Zhang Xianyi
a6515bb858
Merge pull request #1218 from m-brow/power9
...
Optimise loads on Power9 LE
2017-07-03 13:48:29 +08:00
Ashwin Sekhar T K
37efb5bc1d
arm: Remove unnecessary files/code
...
Since softfp code has been added to all required vfp kernels,
the code for auto detection of abi is no longer required.
The option to force softfp ABI on make command line by giving
ARM_SOFTFP_ABI=1 is retained. But there is no need to give this option
anymore.
Also the newly added C versions of 4x4/4x2 gemm/trmm kernels are removed.
These are longer required. Moreover these kernels has bugs.
2017-07-02 03:06:36 +05:30
Ashwin Sekhar T K
97d671eb61
arm: add softfp support in zgemm/ztrmm vfp kernels
2017-07-02 02:54:32 +05:30
Ashwin Sekhar T K
305cd2e8b4
arm: add softfp support in cgemm/ctrmm vfp kernels
2017-07-02 02:42:32 +05:30
Ashwin Sekhar T K
09bc6ebe5b
arm: add softfp support in dgemm/dtrmm vfp kernels
2017-07-02 02:24:38 +05:30
Ashwin Sekhar T K
872a11a2bf
arm: add softfp support in sgemm/strmm vfp kernels
2017-07-02 02:23:48 +05:30
Ashwin Sekhar T K
eda9e8632a
generic: Bug fixes in generic 4x2 and 4x4 gemm kernels
2017-07-02 02:00:48 +05:30
Ashwin Sekhar T K
8f83d3f961
arm: add softfp support in vfp gemv kernels
2017-07-02 01:03:31 +05:30
Ashwin Sekhar T K
83bd547517
arm: add softfp support in kernel/arm/swap_vfp.S
2017-07-01 20:37:40 +05:30
Ashwin Sekhar T K
e25f4c01d6
arm: add softfp support in kernel/arm/nrm2_vfp*.S
2017-07-01 19:57:28 +05:30
Ashwin Sekhar T K
54915ce343
arm: add softfp support in kernel/arm/*dot_vfp.S
2017-06-30 23:46:02 +05:30
Ashwin Sekhar T K
0150fabdb6
arm: add softfp support in kernel/arm/rot_vfp.S
2017-06-30 21:52:32 +05:30
Ashwin Sekhar T K
4f0773f07d
arm: add softfp support in kernel/arm/axpy_vfp.S
2017-06-30 20:25:59 +05:30
Ashwin Sekhar T K
aa5edebc80
arm: add softfp support in kernel/arm/asum_vfp.S
2017-06-30 18:21:05 +05:30
Ashwin Sekhar T K
89924b3d5b
arm: Use assembly implementations based on the ARM abi
...
In case of softfp abi, assembly implementations of only those APIs are
used which doesnt have a floating point argument or return value.
In case of hard abi, all assembly implementations are used.
2017-06-30 18:21:05 +05:30
Ashwin Sekhar T K
da7f0ff425
generic: add some generic gemm and trmm kernels
...
Added generic 4x4 and 4x2 gemm kernels
Added generic 4x2 trmm kernel
2017-06-30 18:21:05 +05:30
Zhang Xianyi
482015f8d6
Merge branch 'arm_soft_fp_abi' into develop
2017-06-23 11:35:25 +08:00
Matt Brown
bd831a03a8
Optimise sscal for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:46 +10:00
Matt Brown
edc97918f8
Optimise srot for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:35 +10:00
Matt Brown
e0034de22d
Optimise sdot for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:19 +10:00
Matt Brown
32c7fe6bff
Optimise sasum for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:10 +10:00
Matt Brown
19bdf9d52b
Optimise casum for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:00:07 +10:00
Matt Brown
4f09030fdc
Optimise cswap for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:59:53 +10:00
Matt Brown
6f4eca5ea4
Optimise sswap for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:59:13 +10:00
Matt Brown
be55f96cbd
Optimise scopy for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:59:13 +10:00
Matt Brown
96dd0ef4f7
Optimise ccopy for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:58:59 +10:00
Alan Modra
dc40bc7368
Power8 inline assembly tweaks
...
Further fixes on top of 9e2f316ed
. Writing some doco for gcc on
inline assembly woke me up to some more errors.
- dgemv_kernel_4x4 asm did not mention *ap as a memory input, and
*y is both read and write.
- sasum_kernel_32 and casum_kernel_16 did not use %x for a vsx insn
operand, a problem if the "=f" sum output was ever allocated a vsx
reg in the altivec set. This might be possible with inlining and
future gcc optimisation.
2017-04-04 23:13:54 +09:30
Zhang Xianyi
b5c96fcfcd
Support ARM SOFTFP ABI for saxpy, sdot, snrm2, sscal, sgemv, sger.
2017-03-20 17:39:25 +08:00
Denis Steckelmacher
c9ff735da6
Add ZEN support (tested for auto-detected static backend)
2017-03-19 15:32:50 +01:00
Martin Kroeker
cd135e2b59
Merge pull request #1130 from quickwritereader/develop
...
Blas 3 for single precision
2017-03-15 10:00:52 +01:00
Martin Kroeker
a6efabf155
Replace gnu _real_ , _imag_ extensions in initializers
2017-03-13 00:38:37 +01:00
Abdurrauf
08786c4b95
strmm and ctrmm
2017-03-13 01:23:16 +04:00
Zhang Xianyi
90e02ccf68
Support ARM softfp ABI for sgemm on ARMV7.
...
make ARM_SOFTFP_ABI=1
2017-03-06 22:16:13 +08:00
Abdurrauf
82e80fa82b
initial strmm(sgemm). not tuned yet
2017-03-06 04:27:40 +04:00
Martin Kroeker
d1fe040d9b
Merge pull request #1110 from quickwritereader/develop
...
Conventional usage of the register save area.
2017-03-01 23:08:07 +01:00
Abdurrauf
411982715c
conventional usage of the register save area
2017-03-01 20:39:39 +04:00
Abdurrauf
e831d6924e
changed to conventional register save area
2017-03-01 03:13:21 +04:00
Martin Kroeker
ffc1d6c468
Merge pull request #1108 from ashwinyes/develop_20170203_thunderx2t99
...
Optimized Implementations for ThunderX2T99
2017-02-28 16:02:19 +01:00
Ashwin Sekhar T K
67473d09dd
THUNDERX2T99: Bug Fixes in D/Z NRM2 and ZGEMM
2017-02-28 01:11:38 -08:00
Ashwin Sekhar T K
19ba133383
THUNDERX2T99: Add Optimized ZGEMM Implementation
2017-02-28 05:31:41 +00:00
Abdurrauf
0d96b0e2a7
Merge branch 'z13' into develop
2017-02-26 06:17:33 +04:00
Abdurrauf
848cb27b1e
ztrmm kernel.
2017-02-26 06:14:12 +04:00
Martin Kroeker
dc34a0da96
Merge pull request #915 from mdong/small_fix_for_icc
...
remove input from clobbered list
2017-02-23 20:00:22 +01:00
Ashwin Sekhar T K
a3935f0dfb
THUNDERX2T99: Add Optimized D/Z NRM2 Implementation
2017-02-23 10:02:15 -08:00
Ashwin Sekhar T K
738628e9a8
ARM64: Remove unused code
2017-02-21 21:42:32 -08:00
Ashwin Sekhar T K
ab3ffab96a
THUNDERX2T99: Add Optimized C/Z DOT Implementation
2017-02-21 03:40:59 -08:00
Ashwin Sekhar T K
f036be9ce2
THUNDERX2T99: Add Optimized SDOT Implementation
2017-02-21 03:24:32 -08:00