Martin Kroeker
a2d867f4d1
Allow negative iNCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:49:05 +02:00
Martin Kroeker
0a4546b742
Typo fix
2021-02-23 13:14:35 +01:00
Martin Kroeker
b1eed27a54
Replace naive omatcopy_rt with 4x4 blocked implementation
...
as suggested by MigMuc in issue 2532
2021-02-22 21:35:42 +01:00
Martin Kroeker
43aac5bacc
Support NVIDIA HPC compiler
2021-01-12 16:36:12 +01:00
Martin Kroeker
f8346603cf
Fix compilation with SolarisStudio
2020-12-06 19:14:16 +01:00
Qiyu8
c4c591ac5a
fix sum optimize issues
2020-11-10 16:16:38 +08:00
Martin Kroeker
28d2dfe2b3
Fix macro name used in ifdef
2020-11-07 12:17:49 +01:00
Qiyu8
bfdf4b56da
Add double precision universal intrinsics for X86/ARM
2020-10-15 10:29:42 +08:00
Qiyu8
0ed1f07660
Optimize the performance of sum by using universal intrinsics
2020-10-12 19:48:53 +08:00
Martin Kroeker
bf1f0734ff
Use OPENBLAS_MAKE_COMPLEX_FLOAT on PPC only
2020-07-23 20:40:13 +00:00
Martin Kroeker
7c6e56b5df
Rewrite assignment to complex for better portability
2020-07-23 17:10:59 +02:00
Martin Kroeker
806f89166e
Make ARMV7 compile with xcode and add a CI job for it ( #2537 )
...
* Add an ARMV7 iOS build on Travis
* thread_local appears to be unavailable on ARMV7 iOS
* Add no-thumb option for ARMV7 IOS build to get it to accept DMB ISH
* Make local labels in macros of nrm2_vfpv3.S compatible with the xcode assembler
2020-04-02 10:30:37 +02:00
Martin Kroeker
74c10b57c6
Use generic kernels for complex (I)AMAX to support softfp
2019-05-30 11:38:11 +02:00
Martin Kroeker
c5495d2056
Ensure correct output for DAMAX with softfp
2019-05-30 11:25:43 +02:00
Martin Kroeker
c70496b108
Separate implementations of AMAX and IAMAX on arm
...
As noted in #1912 and comment on #1942 , the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp where they would have to share the return register
2019-05-29 15:02:51 +02:00
Martin Kroeker
94ab4e6fb2
Add ARM implementations of ?sum
...
(trivial copies of the respective ?asum with the fabs calls removed)
2019-03-30 22:11:38 +01:00
Martin Kroeker
808410c2c7
Fix wrong comparison that made IMIN identical to IMAX
...
as suggested in #1990
2019-01-31 15:25:15 +01:00
Martin Kroeker
9b2a7ad40d
Convert fldmia/fstmia instructions to UAL syntax for clang7
...
second part of fix for #1774 , containing files missed in #1775
2018-09-28 23:05:15 +02:00
Martin Kroeker
7e5df34e6a
Convert fldmia/fstmia instructions to UAL syntax for clang7
...
fixes #1774
2018-09-25 09:41:58 +02:00
Martin Kroeker
b83e4c60c7
Remove premature exit for INC_X or INC_Y zero
2018-06-26 20:46:42 +02:00
Martin Kroeker
e344db269b
Remove premature exit for INC_X or INC_Y zero
2018-06-26 20:45:57 +02:00
Martin Kroeker
545b82efd3
Remove premature exit for INC_X or INC_Y zero
2018-06-26 20:45:00 +02:00
Martin Kroeker
e322a951fe
Remove premature exit for INC_X or INC_Y zero
2018-06-26 20:44:13 +02:00
Martin Kroeker
2d0929fa7c
Move the test for zero incx,incy in ARMV7 ROT
...
to pass the related utest (see #1469 )
2018-04-24 22:43:00 +02:00
Martin Kroeker
125343cc88
Drop test for zero incx,incy in armv7 AXPY
...
...to pass the related utest (see #1469 )
2018-04-24 22:39:50 +02:00
Martin Kroeker
6e70287776
Use generic/dot.c for DSDOT on ARMV5 and above
...
The default arm/dot.c is less precise when used for DSDOT, as shown by utest
2018-02-25 19:57:23 +01:00
Zhang Xianyi
d5ef0dee9a
Merge pull request #1226 from ashwinyes/develop_arm_clang_ual_fix
...
arm: Fix clang compilation for ARMv7
2017-07-10 20:04:42 +08:00
Ashwin Sekhar T K
f02d535fde
arm: Fix clang compilation for ARMv7
...
clang is not recognizing some pre-UAL VFP mnemonics like fnmacs, fnmacd,
fnmuls and fnmuld. Replaced them with equivalent UAL mnemonics which are
vmls.f32, vmls.f64, vnmul.f32 and vnmul.f64 respectively.
2017-07-07 12:35:58 +05:30
Ashwin Sekhar T K
97d671eb61
arm: add softfp support in zgemm/ztrmm vfp kernels
2017-07-02 02:54:32 +05:30
Ashwin Sekhar T K
305cd2e8b4
arm: add softfp support in cgemm/ctrmm vfp kernels
2017-07-02 02:42:32 +05:30
Ashwin Sekhar T K
09bc6ebe5b
arm: add softfp support in dgemm/dtrmm vfp kernels
2017-07-02 02:24:38 +05:30
Ashwin Sekhar T K
872a11a2bf
arm: add softfp support in sgemm/strmm vfp kernels
2017-07-02 02:23:48 +05:30
Ashwin Sekhar T K
8f83d3f961
arm: add softfp support in vfp gemv kernels
2017-07-02 01:03:31 +05:30
Ashwin Sekhar T K
83bd547517
arm: add softfp support in kernel/arm/swap_vfp.S
2017-07-01 20:37:40 +05:30
Ashwin Sekhar T K
e25f4c01d6
arm: add softfp support in kernel/arm/nrm2_vfp*.S
2017-07-01 19:57:28 +05:30
Ashwin Sekhar T K
54915ce343
arm: add softfp support in kernel/arm/*dot_vfp.S
2017-06-30 23:46:02 +05:30
Ashwin Sekhar T K
0150fabdb6
arm: add softfp support in kernel/arm/rot_vfp.S
2017-06-30 21:52:32 +05:30
Ashwin Sekhar T K
4f0773f07d
arm: add softfp support in kernel/arm/axpy_vfp.S
2017-06-30 20:25:59 +05:30
Ashwin Sekhar T K
aa5edebc80
arm: add softfp support in kernel/arm/asum_vfp.S
2017-06-30 18:21:05 +05:30
Ashwin Sekhar T K
89924b3d5b
arm: Use assembly implementations based on the ARM abi
...
In case of softfp abi, assembly implementations of only those APIs are
used which doesnt have a floating point argument or return value.
In case of hard abi, all assembly implementations are used.
2017-06-30 18:21:05 +05:30
Zhang Xianyi
b5c96fcfcd
Support ARM SOFTFP ABI for saxpy, sdot, snrm2, sscal, sgemv, sger.
2017-03-20 17:39:25 +08:00
Zhang Xianyi
90e02ccf68
Support ARM softfp ABI for sgemm on ARMV7.
...
make ARM_SOFTFP_ABI=1
2017-03-06 22:16:13 +08:00
Martin Kroeker
6221d6df5f
Update zdot.c
2016-10-05 18:57:14 +02:00
Martin Kroeker
4b1b27347f
Remove explicit include of complex.h
2016-09-29 23:39:35 +02:00
Werner Saar
8037d78eed
bugfix for arm scal.c and zscal.c
2016-04-11 11:21:36 +02:00
Werner Saar
63a7d7fb24
updated gemv_n_vfpv3.S for armv7
2016-01-25 15:00:13 +01:00
Werner Saar
b4ede558a5
updated nrm2 kernel for armv7
2016-01-25 11:55:25 +01:00
Werner Saar
de3e2d4349
updated trmm kernels for armv7
2016-01-25 11:08:56 +01:00
Werner Saar
a0e51e96f1
updated gemm kernels for armv7
2016-01-25 10:46:10 +01:00
Werner Saar
c2891330bc
updated KERNEL.ARMV6
2016-01-24 17:12:07 +01:00