Commit Graph

862 Commits

Author SHA1 Message Date
Isuru Fernando d3b677fe87 Add commonobjs 2017-08-07 23:12:40 +05:30
Isuru Fernando 505b218829 Merge remote-tracking branch 'upstream/develop' into dyn 2017-08-06 19:07:00 +05:30
Isuru Fernando d9346930dd Merge remote-tracking branch 'upstream/develop' into develop 2017-08-04 07:57:55 +05:30
Ashwin Sekhar T K 4899d67f7d THUDNERX2T99: Fix clang compilation 2017-08-02 11:28:45 -07:00
Isuru Fernando 1d1854032b Add missing EXCAVATOR 2017-08-02 19:03:04 +05:30
Isuru Fernando 2c51a990ac Fix extra whitespaces. CMake parser macro fails with it
TODO: Fix the parser macro to strip trailing whitespaces
2017-08-02 18:26:57 +05:30
Isuru Fernando 7892434572 Add hemm3m and symm3m objects 2017-08-02 18:24:54 +05:30
Isuru Fernando d798487213 Fixes for dynamic_arch. almost there 2017-08-02 17:25:49 +05:30
Isuru Fernando 251715d9ef configure kernel_core.h 2017-08-01 23:23:55 +05:30
Isuru Fernando 50deeb49b7 configure setparam 2017-08-01 22:32:47 +05:30
Isuru Fernando 4260215adf Support DYNAMIC_ARCH with cmake 2017-08-01 22:25:52 +05:30
Isuru Fernando d245caa49a Support out-of-source build 2017-08-01 15:16:14 +05:30
Isuru Fernando ca17b4b75c Fix complex support for MSVC headers 2017-07-28 11:50:29 +05:30
Isuru Fernando dc24914415 check compiler is msvc instead of msvc 2017-07-28 11:49:39 +05:30
Zhang Xianyi d5ef0dee9a Merge pull request #1226 from ashwinyes/develop_arm_clang_ual_fix
arm: Fix clang compilation for ARMv7
2017-07-10 20:04:42 +08:00
Zhang Xianyi 4239dd65ce Merge branch 'develop' into develop_arm_softfp 2017-07-10 20:02:36 +08:00
Ashwin Sekhar T K f02d535fde arm: Fix clang compilation for ARMv7
clang is not recognizing some pre-UAL VFP mnemonics like fnmacs, fnmacd,
fnmuls and fnmuld. Replaced them with equivalent UAL mnemonics which are
vmls.f32, vmls.f64, vnmul.f32 and vnmul.f64 respectively.
2017-07-07 12:35:58 +05:30
Zhang Xianyi a6515bb858 Merge pull request #1218 from m-brow/power9
Optimise loads on Power9 LE
2017-07-03 13:48:29 +08:00
Ashwin Sekhar T K 37efb5bc1d arm: Remove unnecessary files/code
Since softfp code has been added to all required vfp kernels,
the code for auto detection of abi is no longer required.

The option to force softfp ABI on make command line by giving
ARM_SOFTFP_ABI=1 is retained. But there is no need to give this option
anymore.

Also the newly added C versions of 4x4/4x2 gemm/trmm kernels are removed.
These are longer required. Moreover these kernels has bugs.
2017-07-02 03:06:36 +05:30
Ashwin Sekhar T K 97d671eb61 arm: add softfp support in zgemm/ztrmm vfp kernels 2017-07-02 02:54:32 +05:30
Ashwin Sekhar T K 305cd2e8b4 arm: add softfp support in cgemm/ctrmm vfp kernels 2017-07-02 02:42:32 +05:30
Ashwin Sekhar T K 09bc6ebe5b arm: add softfp support in dgemm/dtrmm vfp kernels 2017-07-02 02:24:38 +05:30
Ashwin Sekhar T K 872a11a2bf arm: add softfp support in sgemm/strmm vfp kernels 2017-07-02 02:23:48 +05:30
Ashwin Sekhar T K eda9e8632a generic: Bug fixes in generic 4x2 and 4x4 gemm kernels 2017-07-02 02:00:48 +05:30
Ashwin Sekhar T K 8f83d3f961 arm: add softfp support in vfp gemv kernels 2017-07-02 01:03:31 +05:30
Ashwin Sekhar T K 83bd547517 arm: add softfp support in kernel/arm/swap_vfp.S 2017-07-01 20:37:40 +05:30
Ashwin Sekhar T K e25f4c01d6 arm: add softfp support in kernel/arm/nrm2_vfp*.S 2017-07-01 19:57:28 +05:30
Ashwin Sekhar T K 54915ce343 arm: add softfp support in kernel/arm/*dot_vfp.S 2017-06-30 23:46:02 +05:30
Ashwin Sekhar T K 0150fabdb6 arm: add softfp support in kernel/arm/rot_vfp.S 2017-06-30 21:52:32 +05:30
Ashwin Sekhar T K 4f0773f07d arm: add softfp support in kernel/arm/axpy_vfp.S 2017-06-30 20:25:59 +05:30
Ashwin Sekhar T K aa5edebc80 arm: add softfp support in kernel/arm/asum_vfp.S 2017-06-30 18:21:05 +05:30
Ashwin Sekhar T K 89924b3d5b arm: Use assembly implementations based on the ARM abi
In case of softfp abi, assembly implementations of only those APIs are
used which doesnt have a floating point argument or return value.

In case of hard abi, all assembly implementations are used.
2017-06-30 18:21:05 +05:30
Ashwin Sekhar T K da7f0ff425 generic: add some generic gemm and trmm kernels
Added generic 4x4 and 4x2 gemm kernels
Added generic 4x2 trmm kernel
2017-06-30 18:21:05 +05:30
Zhang Xianyi 482015f8d6 Merge branch 'arm_soft_fp_abi' into develop 2017-06-23 11:35:25 +08:00
Matt Brown bd831a03a8 Optimise sscal for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:46 +10:00
Matt Brown edc97918f8 Optimise srot for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:35 +10:00
Matt Brown e0034de22d Optimise sdot for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:19 +10:00
Matt Brown 32c7fe6bff Optimise sasum for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:10 +10:00
Matt Brown 19bdf9d52b Optimise casum for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:00:07 +10:00
Matt Brown 4f09030fdc Optimise cswap for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:59:53 +10:00
Matt Brown 6f4eca5ea4 Optimise sswap for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:59:13 +10:00
Matt Brown be55f96cbd Optimise scopy for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:59:13 +10:00
Matt Brown 96dd0ef4f7 Optimise ccopy for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:58:59 +10:00
Alan Modra dc40bc7368 Power8 inline assembly tweaks
Further fixes on top of 9e2f316ed.  Writing some doco for gcc on
inline assembly woke me up to some more errors.

- dgemv_kernel_4x4 asm did not mention *ap as a memory input, and
  *y is both read and write.
- sasum_kernel_32 and casum_kernel_16 did not use %x for a vsx insn
  operand, a problem if the "=f" sum output was ever allocated a vsx
  reg in the altivec set.  This might be possible with inlining and
  future gcc optimisation.
2017-04-04 23:13:54 +09:30
Zhang Xianyi b5c96fcfcd Support ARM SOFTFP ABI for saxpy, sdot, snrm2, sscal, sgemv, sger. 2017-03-20 17:39:25 +08:00
Denis Steckelmacher c9ff735da6 Add ZEN support (tested for auto-detected static backend) 2017-03-19 15:32:50 +01:00
Martin Kroeker cd135e2b59 Merge pull request #1130 from quickwritereader/develop
Blas 3 for single precision
2017-03-15 10:00:52 +01:00
Martin Kroeker a6efabf155 Replace gnu _real_ , _imag_ extensions in initializers 2017-03-13 00:38:37 +01:00
Abdurrauf 08786c4b95 strmm and ctrmm 2017-03-13 01:23:16 +04:00
Zhang Xianyi 90e02ccf68 Support ARM softfp ABI for sgemm on ARMV7.
make ARM_SOFTFP_ABI=1
2017-03-06 22:16:13 +08:00