Ubuntu
498ac98581
Note for unused kernels
2019-02-04 15:41:56 +00:00
Ubuntu
cd9ea45463
NBMAX=4096 for gemvn, added sgemvn 8x8 for future
2019-02-04 06:57:11 +00:00
Ubuntu
4abc375a91
sgemv cgemv pairs
2019-02-01 13:45:00 +00:00
Ubuntu
43a4572038
crot fix
2019-01-17 14:45:31 +00:00
Abdelrauf
a034e65512
Merge branch 'develop' into develop
2019-01-16 19:25:13 +04:00
Ubuntu
8c3386be87
Added missing Blas1 single fp {saxpy, caxpy, cdot, crot(refactored version of srot),isamax ,isamin, icamax, icamin},
...
Fixed idamin,icamin choosing the first occurance index of equal minimals
2019-01-16 15:16:21 +00:00
Martin Kroeker
961d25e9c7
Use the new zrot.c on POWER8 for crot as well
...
fixes #1571 (the old zrot.S assembly does not handle incx=0 correctly)
2018-05-23 22:54:39 +02:00
Martin Kroeker
8a3b6fa108
Use generic zrot.c on ppc64/POWER6 to work around utest failure from … ( #1535 )
...
* Use generic C implementation of zrot on ppc64/POWER6 to work around utest failure from #1469
2018-04-23 19:05:49 +02:00
QWR QWR
28ca97015d
power8:Added initial zgemv_(t|n) ,i(d|z)amax,i(d|z)amin,dgemv_t(transposed),zrot
...
z13: improved zgemv_(t|n)_4,zscal,zaxpy
2018-03-27 14:54:41 +00:00
the mslm
2c0a008281
dgemm_ncopy_4_ save/restore
2018-02-18 01:30:17 +00:00
the mslm
c5425daa6b
power8 ?gemm_tcopy save/restore
2018-02-16 23:36:46 +00:00
martin
7a4b3cfbf8
Add trivially optimized DSDOT for POWER8
2017-11-28 18:38:07 +01:00
Martin Kroeker
9c017a2218
Save and restore VSX registers
2017-09-28 12:17:09 +02:00
Matt Brown
bd831a03a8
Optimise sscal for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:46 +10:00
Matt Brown
edc97918f8
Optimise srot for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:35 +10:00
Matt Brown
e0034de22d
Optimise sdot for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:19 +10:00
Matt Brown
32c7fe6bff
Optimise sasum for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:10 +10:00
Matt Brown
19bdf9d52b
Optimise casum for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:00:07 +10:00
Matt Brown
4f09030fdc
Optimise cswap for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:59:53 +10:00
Matt Brown
6f4eca5ea4
Optimise sswap for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:59:13 +10:00
Matt Brown
be55f96cbd
Optimise scopy for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:59:13 +10:00
Matt Brown
96dd0ef4f7
Optimise ccopy for POWER9
...
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:58:59 +10:00
Alan Modra
dc40bc7368
Power8 inline assembly tweaks
...
Further fixes on top of 9e2f316ed
. Writing some doco for gcc on
inline assembly woke me up to some more errors.
- dgemv_kernel_4x4 asm did not mention *ap as a memory input, and
*y is both read and write.
- sasum_kernel_32 and casum_kernel_16 did not use %x for a vsx insn
operand, a problem if the "=f" sum output was ever allocated a vsx
reg in the altivec set. This might be possible with inlining and
future gcc optimisation.
2017-04-04 23:13:54 +09:30
Martin Kroeker
9e2f316ede
Power8 inline assembly fixes
...
Quoting patch author amodra from #1078
Lots of issues here.
- The vsx regs weren't listed as clobbered.
- Poor choice of vsx regs, which along with the lack of clobbers led to
trashing v0..v21 and fr14..fr23. Ideally you'd let gcc choose all
temp vsx regs, but asms currently have a limit of 30 i/o parms.
- Other regs were clobbered unnecessarily, seemingly in an attempt to
clobber inputs, with gcc-7 complaining about the clobber of r2.
(Changed inputs should be also listed as outputs or as an i/o.)
- "r" constraint used instead of "b" for gprs used in insns where the
r0 encoding means zero rather than r0.
- There were unused asm inputs too.
- All memory was clobbered rather than hooking up memory outputs with
proper memory constraints, and that and the lack of proper memory
input constraints meant the asms needed to be volatile and their
containing function noinline.
- Some parameters were being passed unnecessarily via memory.
- When a copy of a
2017-02-13 23:38:50 +01:00
Martin Kroeker
a6e9e0b94b
Remove explicit include of complex.h
2016-09-29 23:43:28 +02:00
Zhang Xianyi
515bc56ea9
Refs #946 . Use nrm2 reference implementation for Power8.
2016-08-18 18:59:43 -07:00
Zhang Xianyi
ae70b916f4
Refs #929 . Deal with zero and NaNs for scale.
2016-08-18 10:24:42 -07:00
Werner Saar
412bcd187a
optimized dtrsm_logic_LT_16x4_power8.S and dtrsm_macros_LT_16x4_power8.S
2016-05-23 11:20:41 +02:00
Werner Saar
8b140220c8
optimized dtrsm_kernel_LT for POWER8
2016-05-22 15:20:04 +02:00
Werner Saar
8fb5a1aaff
added optimized dtrsm_LT kernel for POWER8
2016-05-22 13:09:05 +02:00
Werner Saar
6a2bde7a2d
optimized dgemm and dgetrf for POWER8
2016-05-17 14:45:27 +02:00
Werner Saar
8310d4d3f7
optimized dgemm for 20 threads
2016-05-16 14:14:25 +02:00
Werner Saar
56948dbf0f
optimized dgemm for POWER8
2016-04-29 12:52:47 +02:00
Werner Saar
0d0c6f7d7d
optimized dgemm for POWER8
2016-04-27 14:01:08 +02:00
Werner Saar
a3da10662f
added sgemm_tcopy_8_power8.S
2016-04-23 10:04:41 +02:00
Werner Saar
d46f07bb4e
added cgemm_tcopy_8_power8.S
2016-04-23 07:37:18 +02:00
Werner Saar
879a51165f
Optimized zgemm and tested zgemm again
2016-04-22 13:07:12 +02:00
Werner Saar
9276c9012f
Optimized sgemm and dgemm and tested again.
2016-04-21 11:37:57 +02:00
Werner Saar
0001260f4b
optimized sgemm
2016-04-20 13:06:38 +02:00
Werner Saar
3c6294ca3d
added optimized sgemm_tcopy for power8
2016-04-19 16:08:54 +02:00
Werner Saar
e173c51c04
updated zgemm- and ztrmm-kernel for POWER8
2016-04-08 09:05:37 +02:00
Werner Saar
9c42f0374a
Updated cgemm- and sgemm-kernel for POWER8 SMP
2016-04-07 15:08:15 +02:00
Werner Saar
a51102e9b7
bugfixes for sgemm- and cgemm-kernel
2016-04-06 11:15:21 +02:00
Werner Saar
c5b1fbcb2e
updated optimized cgemm- and ctrmm-kernel for POWER8
2016-04-04 09:12:08 +02:00
Werner Saar
d4c0330967
updated cgemm- and ctrmm-kernel for POWER8
2016-04-03 14:30:49 +02:00
Werner Saar
6a9bbfc227
updated sgemm- and strmm-kernel for POWER8
2016-04-02 17:16:36 +02:00
Werner Saar
68a69c5b50
added optimized dgemv_n kernel for POWER8
2016-03-30 11:10:53 +02:00
Werner Saar
c2464a7c4a
added optimized casum kernel for POWER8
2016-03-28 14:12:08 +02:00
Werner Saar
294f933869
added optimized zasum kernel for POWER8
2016-03-28 13:37:32 +02:00
Werner Saar
f59c9bd6ef
added optimized sasum kernel for POWER8
2016-03-28 12:44:25 +02:00