AbdelRauf
628b335e83
Merge branch 'develop' of https://github.com/quickwritereader/OpenBLAS into develop
2019-04-29 08:57:44 +00:00
AbdelRauf
0f105dd8a5
sgemm/strmm
2019-04-29 08:49:50 +00:00
Martin Kroeker
ccfb7ead15
Merge pull request #2072 from martin-frbg/sum
Add (C)BLAS extension ?sum
2019-04-23 20:11:36 +02:00
Rashmica Gupta
bcdf1d4917
Add in runtime CPU detection for POWER.
2019-04-09 14:20:16 +10:00
Martin Kroeker
706dfe263b
Add POWER implementation of ?sum
as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure
2019-03-30 22:23:42 +01:00
Martin Kroeker
7c51cc8527
Merge branch 'develop' into develop
2019-03-29 19:36:29 +01:00
AbdelRauf
853a18bc17
power9 makefile. dgemm based on power8 kernel with the following changes: 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). Improvement from 17 to 22-23 GFLOPS. dtrmm cases were added into dgemm itself.
2019-03-29 15:49:40 +00:00
Martin Kroeker
718efcec6f
Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq), assuming typo by K.Goto
2019-02-13 22:08:37 +01:00
Martin Kroeker
f9d67bb5e8
Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq) presuming typo by K.Goto
2019-02-13 22:06:41 +01:00
Ubuntu
498ac98581
Note for unused kernels
2019-02-04 15:41:56 +00:00
Ubuntu
cd9ea45463
NBMAX=4096 for gemvn, added sgemvn 8x8 for future
2019-02-04 06:57:11 +00:00
Ubuntu
4abc375a91
sgemv cgemv pairs
2019-02-01 13:45:00 +00:00
Ubuntu
43a4572038
crot fix
2019-01-17 14:45:31 +00:00
Abdelrauf
a034e65512
Merge branch 'develop' into develop
2019-01-16 19:25:13 +04:00
Ubuntu
8c3386be87
Added missing BLAS1 single-precision routines {saxpy, caxpy, cdot, crot (refactored version of srot), isamax, isamin, icamax, icamin}.
Fixed idamin and icamin to choose the index of the first occurrence among equal minima
2019-01-16 15:16:21 +00:00
Martin Kroeker
961d25e9c7
Use the new zrot.c on POWER8 for crot as well
fixes #1571 (the old zrot.S assembly does not handle incx=0 correctly)
2018-05-23 22:54:39 +02:00
Martin Kroeker
8a3b6fa108
Use generic zrot.c on ppc64/POWER6 to work around utest failure from … (#1535)
* Use generic C implementation of zrot on ppc64/POWER6 to work around utest failure from #1469
2018-04-23 19:05:49 +02:00
QWR QWR
28ca97015d
power8: Added initial zgemv_(t|n), i(d|z)amax, i(d|z)amin, dgemv_t (transposed), zrot
z13: improved zgemv_(t|n)_4, zscal, zaxpy
2018-03-27 14:54:41 +00:00
the mslm
2c0a008281
dgemm_ncopy_4_ save/restore
2018-02-18 01:30:17 +00:00
the mslm
c5425daa6b
power8 ?gemm_tcopy save/restore
2018-02-16 23:36:46 +00:00
martin
7a4b3cfbf8
Add trivially optimized DSDOT for POWER8
2017-11-28 18:38:07 +01:00
Martin Kroeker
9c017a2218
Save and restore VSX registers
2017-09-28 12:17:09 +02:00
Matt Brown
bd831a03a8
Optimise sscal for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:46 +10:00
Matt Brown
edc97918f8
Optimise srot for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:35 +10:00
Matt Brown
e0034de22d
Optimise sdot for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:19 +10:00
Matt Brown
32c7fe6bff
Optimise sasum for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:10 +10:00
Matt Brown
19bdf9d52b
Optimise casum for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:00:07 +10:00
Matt Brown
4f09030fdc
Optimise cswap for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:59:53 +10:00
Matt Brown
6f4eca5ea4
Optimise sswap for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:59:13 +10:00
Matt Brown
be55f96cbd
Optimise scopy for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:59:13 +10:00
Matt Brown
96dd0ef4f7
Optimise ccopy for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:58:59 +10:00
Alan Modra
dc40bc7368
Power8 inline assembly tweaks
Further fixes on top of 9e2f316ed. Writing some doco for gcc on
inline assembly woke me up to some more errors.
- dgemv_kernel_4x4 asm did not mention *ap as a memory input, and
*y is both read and write.
- sasum_kernel_32 and casum_kernel_16 did not use %x for a vsx insn
operand, a problem if the "=f" sum output was ever allocated a vsx
reg in the altivec set. This might be possible with inlining and
future gcc optimisation.
2017-04-04 23:13:54 +09:30
Martin Kroeker
9e2f316ede
Power8 inline assembly fixes
Quoting patch author amodra from #1078
Lots of issues here.
- The vsx regs weren't listed as clobbered.
- Poor choice of vsx regs, which along with the lack of clobbers led to
trashing v0..v21 and fr14..fr23. Ideally you'd let gcc choose all
temp vsx regs, but asms currently have a limit of 30 i/o parms.
- Other regs were clobbered unnecessarily, seemingly in an attempt to
clobber inputs, with gcc-7 complaining about the clobber of r2.
(Changed inputs should be also listed as outputs or as an i/o.)
- "r" constraint used instead of "b" for gprs used in insns where the
r0 encoding means zero rather than r0.
- There were unused asm inputs too.
- All memory was clobbered rather than hooking up memory outputs with
proper memory constraints, and that and the lack of proper memory
input constraints meant the asms needed to be volatile and their
containing function noinline.
- Some parameters were being passed unnecessarily via memory.
- When a copy of a
2017-02-13 23:38:50 +01:00
Martin Kroeker
a6e9e0b94b
Remove explicit include of complex.h
2016-09-29 23:43:28 +02:00
Zhang Xianyi
515bc56ea9
Refs #946. Use nrm2 reference implementation for Power8.
2016-08-18 18:59:43 -07:00
Zhang Xianyi
ae70b916f4
Refs #929. Deal with zero and NaNs for scale.
2016-08-18 10:24:42 -07:00
Werner Saar
412bcd187a
optimized dtrsm_logic_LT_16x4_power8.S and dtrsm_macros_LT_16x4_power8.S
2016-05-23 11:20:41 +02:00
Werner Saar
8b140220c8
optimized dtrsm_kernel_LT for POWER8
2016-05-22 15:20:04 +02:00
Werner Saar
8fb5a1aaff
added optimized dtrsm_LT kernel for POWER8
2016-05-22 13:09:05 +02:00
Werner Saar
6a2bde7a2d
optimized dgemm and dgetrf for POWER8
2016-05-17 14:45:27 +02:00
Werner Saar
8310d4d3f7
optimized dgemm for 20 threads
2016-05-16 14:14:25 +02:00
Werner Saar
56948dbf0f
optimized dgemm for POWER8
2016-04-29 12:52:47 +02:00
Werner Saar
0d0c6f7d7d
optimized dgemm for POWER8
2016-04-27 14:01:08 +02:00
Werner Saar
a3da10662f
added sgemm_tcopy_8_power8.S
2016-04-23 10:04:41 +02:00
Werner Saar
d46f07bb4e
added cgemm_tcopy_8_power8.S
2016-04-23 07:37:18 +02:00
Werner Saar
879a51165f
Optimized zgemm and tested zgemm again
2016-04-22 13:07:12 +02:00
Werner Saar
9276c9012f
Optimized sgemm and dgemm and tested again.
2016-04-21 11:37:57 +02:00
Werner Saar
0001260f4b
optimized sgemm
2016-04-20 13:06:38 +02:00
Werner Saar
3c6294ca3d
added optimized sgemm_tcopy for power8
2016-04-19 16:08:54 +02:00
Werner Saar
e173c51c04
updated zgemm- and ztrmm-kernel for POWER8
2016-04-08 09:05:37 +02:00
Werner Saar
9c42f0374a
Updated cgemm- and sgemm-kernel for POWER8 SMP
2016-04-07 15:08:15 +02:00
Werner Saar
a51102e9b7
bugfixes for sgemm- and cgemm-kernel
2016-04-06 11:15:21 +02:00
Werner Saar
c5b1fbcb2e
updated optimized cgemm- and ctrmm-kernel for POWER8
2016-04-04 09:12:08 +02:00
Werner Saar
d4c0330967
updated cgemm- and ctrmm-kernel for POWER8
2016-04-03 14:30:49 +02:00
Werner Saar
6a9bbfc227
updated sgemm- and strmm-kernel for POWER8
2016-04-02 17:16:36 +02:00
Werner Saar
68a69c5b50
added optimized dgemv_n kernel for POWER8
2016-03-30 11:10:53 +02:00
Werner Saar
c2464a7c4a
added optimized casum kernel for POWER8
2016-03-28 14:12:08 +02:00
Werner Saar
294f933869
added optimized zasum kernel for POWER8
2016-03-28 13:37:32 +02:00
Werner Saar
f59c9bd6ef
added optimized sasum kernel for POWER8
2016-03-28 12:44:25 +02:00
Werner Saar
c53be46d78
added optimized dasum kernel for POWER8
2016-03-28 12:17:15 +02:00
Werner Saar
659ed16591
added optimized cswap and zswap kernels for POWER8
2016-03-27 18:31:37 +02:00
Werner Saar
35c98a3556
added optimized zscal kernel for POWER8
2016-03-27 16:31:50 +02:00
Werner Saar
f1a5dd06c5
added optimized sscal kernel for POWER8
2016-03-27 11:05:56 +02:00
Werner Saar
35f1f21a7f
added drot- and srot-kernel optimized for POWER8
2016-03-27 08:57:11 +02:00
Werner Saar
3d9a50e841
added optimized sswap kernel for POWER8
2016-03-25 17:34:55 +01:00
Werner Saar
828c849b44
added optimized ccopy kernel for POWER8
2016-03-25 16:54:25 +01:00
Werner Saar
ecc0bc9813
added optimized scopy kernel for POWER8
2016-03-25 16:06:56 +01:00
Werner Saar
12f209b7b0
added optimized zswap kernel for POWER8
2016-03-25 15:27:34 +01:00
Werner Saar
7316a87930
added optimized dswap kernel for POWER8
2016-03-25 14:35:43 +01:00
Werner Saar
0bff057a87
added optimized dcopy kernel for POWER8
2016-03-25 13:03:02 +01:00
Werner Saar
1e6cf9808c
added optimized dscal kernel for POWER8
2016-03-25 09:42:08 +01:00
Werner Saar
55eda3813b
added optimized zaxpy kernel for POWER8
2016-03-23 11:20:23 +01:00
Werner Saar
0664ba4c97
added optimized daxpy kernel for POWER8
2016-03-22 14:50:03 +01:00
Werner Saar
11c44dede1
added optimized sdot kernel for POWER8
2016-03-21 13:18:23 +01:00
Werner Saar
9e4584d069
added optimized zdot kernel for POWER8
2016-03-21 10:12:07 +01:00
Werner Saar
cd9fafc054
ddot for POWER8: updated licence information
2016-03-20 11:19:27 +01:00
Werner Saar
84b92e6373
added optimized ddot kernel for POWER8
2016-03-20 11:06:06 +01:00
Werner Saar
e1df5a6e23
fixed sgemm- and strmm-kernel
2016-03-18 12:12:03 +01:00
Werner Saar
5c658f8746
add optimized cgemm- and ctrmm-kernel for POWER8
2016-03-18 08:17:25 +01:00
Werner Saar
dcd15b546c
BUGFIX: KERNEL.POWER8
2016-03-14 14:36:59 +01:00
Werner Saar
96284ab295
added sgemm- and strmm-kernel for POWER8
2016-03-14 13:52:44 +01:00
Werner Saar
cd5241d0cf
modified KERNEL for power, to use the generic DSDOT-KERNEL
2016-03-06 09:07:24 +01:00
Werner Saar
085f215257
Modified assembly label names so that they are hidden.
Added license information.
2016-03-05 10:27:27 +01:00
Werner Saar
0afc76fd65
enabled gemm_beta assembly kernels
2016-03-04 15:01:15 +01:00
Werner Saar
91e1c5080c
modified configuration, to use power6 sgemm kernel for power8
2016-03-04 13:38:57 +01:00
Werner Saar
73f04c2c72
enabled hemv assembly function for power8
2016-03-04 13:20:50 +01:00
Werner Saar
3e633152c6
enabled symv assembly kernels on power8
2016-03-04 13:08:18 +01:00
Werner Saar
d5130ce7e3
enabled gemv assembly on power8
2016-03-04 12:53:31 +01:00
Werner Saar
4824b88fcb
enabled all level1 assembly kernels for power8
2016-03-04 12:35:25 +01:00
Werner Saar
b752858d6c
added dgemm-, dtrmm-, zgemm- and ztrmm-kernel for power8
2016-03-01 07:33:56 +01:00
Zhang Xianyi
3e8d6ea74f
Init POWER8 kernels by POWER6.
2015-11-03 12:34:23 +08:00
Matthew Brandyberry
7ba4fe5afb
ppc64le platform support (ELF ABI v2)
2015-07-21 22:20:19 -05:00
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version codes.
2011-01-24 14:54:24 +00:00