152 Commits

Author SHA1 Message Date
Martin Kroeker a2d867f4d1
Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:49:05 +02:00
Martin Kroeker 0a4546b742
Typo fix 2021-02-23 13:14:35 +01:00
Martin Kroeker b1eed27a54
Replace naive omatcopy_rt with 4x4 blocked implementation
as suggested by MigMuc in issue 2532
2021-02-22 21:35:42 +01:00
Martin Kroeker 43aac5bacc
Support NVIDIA HPC compiler 2021-01-12 16:36:12 +01:00
Martin Kroeker f8346603cf
Fix compilation with SolarisStudio 2020-12-06 19:14:16 +01:00
Qiyu8 c4c591ac5a fix sum optimize issues 2020-11-10 16:16:38 +08:00
Martin Kroeker 28d2dfe2b3
Fix macro name used in ifdef 2020-11-07 12:17:49 +01:00
Qiyu8 bfdf4b56da Add double precision universal intrinsics for X86/ARM 2020-10-15 10:29:42 +08:00
Qiyu8 0ed1f07660 Optimize the performance of sum by using universal intrinsics 2020-10-12 19:48:53 +08:00
Martin Kroeker bf1f0734ff
Use OPENBLAS_MAKE_COMPLEX_FLOAT on PPC only 2020-07-23 20:40:13 +00:00
Martin Kroeker 7c6e56b5df
Rewrite assignment to complex for better portability 2020-07-23 17:10:59 +02:00
Martin Kroeker 806f89166e
Make ARMV7 compile with xcode and add a CI job for it (#2537)
* Add an ARMV7 iOS build on Travis

* thread_local appears to be unavailable on ARMV7 iOS

* Add no-thumb option for ARMV7 IOS build to get it to accept DMB ISH

* Make local labels in macros of nrm2_vfpv3.S compatible with the xcode assembler
2020-04-02 10:30:37 +02:00
Martin Kroeker 74c10b57c6
Use generic kernels for complex (I)AMAX to support softfp 2019-05-30 11:38:11 +02:00
Martin Kroeker c5495d2056
Ensure correct output for DAMAX with softfp 2019-05-30 11:25:43 +02:00
Martin Kroeker c70496b108
Separate implementations of AMAX and IAMAX on arm
As noted in #1912 and comment on #1942, the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp where they would have to share the return register
2019-05-29 15:02:51 +02:00
Martin Kroeker 94ab4e6fb2
Add ARM implementations of ?sum
(trivial copies of the respective ?asum with the fabs calls removed)
2019-03-30 22:11:38 +01:00
Martin Kroeker 808410c2c7
Fix wrong comparison that made IMIN identical to IMAX
as suggested in #1990
2019-01-31 15:25:15 +01:00
Martin Kroeker 9b2a7ad40d
Convert fldmia/fstmia instructions to UAL syntax for clang7
second part of fix for #1774, containing files missed in #1775
2018-09-28 23:05:15 +02:00
Martin Kroeker 7e5df34e6a
Convert fldmia/fstmia instructions to UAL syntax for clang7
fixes #1774
2018-09-25 09:41:58 +02:00
Martin Kroeker b83e4c60c7
Remove premature exit for INC_X or INC_Y zero 2018-06-26 20:46:42 +02:00
Martin Kroeker e344db269b
Remove premature exit for INC_X or INC_Y zero 2018-06-26 20:45:57 +02:00
Martin Kroeker 545b82efd3
Remove premature exit for INC_X or INC_Y zero 2018-06-26 20:45:00 +02:00
Martin Kroeker e322a951fe
Remove premature exit for INC_X or INC_Y zero 2018-06-26 20:44:13 +02:00
Martin Kroeker 2d0929fa7c
Move the test for zero incx,incy in ARMV7 ROT
to pass the related utest (see #1469)
2018-04-24 22:43:00 +02:00
Martin Kroeker 125343cc88
Drop test for zero incx,incy in armv7 AXPY
...to pass the related utest (see #1469)
2018-04-24 22:39:50 +02:00
Martin Kroeker 6e70287776
Use generic/dot.c for DSDOT on ARMV5 and above
The default arm/dot.c is less precise when used for DSDOT, as shown by utest
2018-02-25 19:57:23 +01:00
Zhang Xianyi d5ef0dee9a Merge pull request #1226 from ashwinyes/develop_arm_clang_ual_fix
arm: Fix clang compilation for ARMv7
2017-07-10 20:04:42 +08:00
Ashwin Sekhar T K f02d535fde arm: Fix clang compilation for ARMv7
clang is not recognizing some pre-UAL VFP mnemonics like fnmacs, fnmacd,
fnmuls and fnmuld. Replaced them with equivalent UAL mnemonics which are
vmls.f32, vmls.f64, vnmul.f32 and vnmul.f64 respectively.
2017-07-07 12:35:58 +05:30
Ashwin Sekhar T K 97d671eb61 arm: add softfp support in zgemm/ztrmm vfp kernels 2017-07-02 02:54:32 +05:30
Ashwin Sekhar T K 305cd2e8b4 arm: add softfp support in cgemm/ctrmm vfp kernels 2017-07-02 02:42:32 +05:30
Ashwin Sekhar T K 09bc6ebe5b arm: add softfp support in dgemm/dtrmm vfp kernels 2017-07-02 02:24:38 +05:30
Ashwin Sekhar T K 872a11a2bf arm: add softfp support in sgemm/strmm vfp kernels 2017-07-02 02:23:48 +05:30
Ashwin Sekhar T K 8f83d3f961 arm: add softfp support in vfp gemv kernels 2017-07-02 01:03:31 +05:30
Ashwin Sekhar T K 83bd547517 arm: add softfp support in kernel/arm/swap_vfp.S 2017-07-01 20:37:40 +05:30
Ashwin Sekhar T K e25f4c01d6 arm: add softfp support in kernel/arm/nrm2_vfp*.S 2017-07-01 19:57:28 +05:30
Ashwin Sekhar T K 54915ce343 arm: add softfp support in kernel/arm/*dot_vfp.S 2017-06-30 23:46:02 +05:30
Ashwin Sekhar T K 0150fabdb6 arm: add softfp support in kernel/arm/rot_vfp.S 2017-06-30 21:52:32 +05:30
Ashwin Sekhar T K 4f0773f07d arm: add softfp support in kernel/arm/axpy_vfp.S 2017-06-30 20:25:59 +05:30
Ashwin Sekhar T K aa5edebc80 arm: add softfp support in kernel/arm/asum_vfp.S 2017-06-30 18:21:05 +05:30
Ashwin Sekhar T K 89924b3d5b arm: Use assembly implementations based on the ARM abi
In case of softfp abi, assembly implementations of only those APIs are
used which don't have a floating point argument or return value.

In case of hard abi, all assembly implementations are used.
2017-06-30 18:21:05 +05:30
Zhang Xianyi b5c96fcfcd Support ARM SOFTFP ABI for saxpy, sdot, snrm2, sscal, sgemv, sger. 2017-03-20 17:39:25 +08:00
Zhang Xianyi 90e02ccf68 Support ARM softfp ABI for sgemm on ARMV7.
make ARM_SOFTFP_ABI=1
2017-03-06 22:16:13 +08:00
Martin Kroeker 6221d6df5f Update zdot.c 2016-10-05 18:57:14 +02:00
Martin Kroeker 4b1b27347f Remove explicit include of complex.h 2016-09-29 23:39:35 +02:00
Werner Saar 8037d78eed bugfix for arm scal.c and zscal.c 2016-04-11 11:21:36 +02:00
Werner Saar 63a7d7fb24 updated gemv_n_vfpv3.S for armv7 2016-01-25 15:00:13 +01:00
Werner Saar b4ede558a5 updated nrm2 kernel for armv7 2016-01-25 11:55:25 +01:00
Werner Saar de3e2d4349 updated trmm kernels for armv7 2016-01-25 11:08:56 +01:00
Werner Saar a0e51e96f1 updated gemm kernels for armv7 2016-01-25 10:46:10 +01:00
Werner Saar c2891330bc updated KERNEL.ARMV6 2016-01-24 17:12:07 +01:00