Marius Hillenbrand
43c0d4f312
s390x: Add vectorized sgemm kernel for Z14 and newer
...
Add a new GEMM kernel implementation to exploit the FP32 SIMD
operations introduced with z14 and employ it for SGEMM on z14 and newer
architectures.
The SIMD extensions introduced with z13 support operations on
double-sized scalars in vector registers. Thus, the existing SGEMM code
would extend floats to doubles before operating on them. z14 extended
SIMD support to operations on 32-bit floats. By employing these
instructions, we can operate on twice the number of scalars per
instruction (four floats in each vector register) and avoid the
conversion operations.
The code is written in C with explicit vectorization. In experiments,
this kernel improves performance on z14 and z15 by around 2x over the
current implementation in assembly. The flexibility of the C code paves
the way for adjustments in subsequent commits.
Tested via make -C test / ctest / utest and by a couple of additional
unit tests that exercise blocking (e.g., partial register blocks with
fewer than UNROLL_M rows and/or fewer than UNROLL_N columns).
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 15:59:51 +02:00
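To illustrate the point in the SGEMM commit message above about fitting four floats into each vector register, here is a minimal sketch using the GCC/Clang vector extensions. It is not the kernel from the commit: the type name `vec4f` and the function `axpy_f32` are hypothetical, and the operation shown is a plain scaled vector update rather than a full GEMM micro-kernel.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type name: a 128-bit vector of four 32-bit floats
 * (GCC/Clang vector extension, not specific to s390x). */
typedef float vec4f __attribute__((vector_size(16)));

/* Computes y[i] += alpha * x[i] for i in [0, n).
 * The main loop handles four floats per vector operation, with no
 * widening to double; a scalar loop handles the remaining elements. */
static void axpy_f32(size_t n, float alpha, const float *x, float *y)
{
    size_t i = 0;
    vec4f va = {alpha, alpha, alpha, alpha};

    for (; i + 4 <= n; i += 4) {
        vec4f vx, vy;
        memcpy(&vx, x + i, sizeof vx);   /* alignment-agnostic load */
        memcpy(&vy, y + i, sizeof vy);
        vy += va * vx;                   /* four multiply-adds at once */
        memcpy(y + i, &vy, sizeof vy);   /* alignment-agnostic store */
    }
    for (; i < n; i++)                   /* scalar tail */
        y[i] += alpha * x[i];
}
```

On a target with 128-bit FP32 SIMD support, a compiler can map the vector statement in the main loop to native vector instructions; on other targets it falls back to equivalent scalar code.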
int_13h
96ad579428
add in runtime cpu detection for zarch ( #2349 )
2019-12-31 18:03:27 +01:00
Andreas Arnez
d117dfd505
Change bad usage of "asum" to "sum" in ZARCH versions of ?sum
...
The ZARCH implementations of ?sum contain a cut-and-paste error: an inline
assembly argument is named "sum", but the assembly references "asum"
instead. The mismatch causes a build error. This is fixed.
2019-11-21 13:49:13 +01:00
Martin Kroeker
246ca29679
Add ZARCH implementation of ?sum
...
as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed
2019-03-30 22:49:05 +01:00
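The relationship described in the commit above (?sum as ?asum without the absolute-value step) can be sketched in plain C. This is a scalar reference under simplifying assumptions, not the ZARCH kernels: the names `ref_sasum` and `ref_ssum` are hypothetical, and the stride parameter of the real BLAS interfaces is omitted.

```c
#include <stddef.h>

/* Hypothetical reference: sum of absolute values, as in ?asum. */
static float ref_sasum(size_t n, const float *x)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += (x[i] < 0.0f) ? -x[i] : x[i];   /* the ABS step */
    return acc;
}

/* Hypothetical reference: plain sum, as in ?sum.
 * Identical loop with the absolute-value step removed. */
static float ref_ssum(size_t n, const float *x)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}
```

For the input {1, -2, 3, -4}, `ref_sasum` returns 10 and `ref_ssum` returns -2.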
maamountki
0a54c98b9d
[ZARCH] Modify constraints
2019-02-13 21:06:25 +02:00
maamountki
bec54ae366
[ZARCH] Fix caxpy
2019-02-13 12:54:35 +02:00
maamountki
f583674109
[ZARCH] Fix cgemv_t_4
2019-02-12 13:12:28 +02:00
maamountki
77fe70019f
[ZARCH] Fix constraints and source code formatting
2019-02-11 16:01:13 +02:00
maamountki
7039770165
[ZARCH] Undo the last commit
2019-02-06 20:11:44 +02:00
maamountki
11a43e8116
[ZARCH] Set alignment hint for vl/vst
2019-02-05 19:17:08 +02:00
maamountki
61526480f9
[ZARCH] Fix copy constraint
2019-02-05 07:51:19 +02:00
maamountki
81daf6bc38
[ZARCH] Format source code, Fix constraints
2019-02-05 07:30:38 +02:00
Martin Kroeker
874df65491
Fix incorrect sgemv results for IBM z14
...
part of PR #1993 that was inadvertently misplaced into the toplevel directory
2019-02-01 12:58:59 +01:00
Martin Kroeker
877023e1e1
Fix precision of zarch DSDOT
...
from patch provided by aarnez in #991
2019-01-31 21:22:26 +01:00
Martin Kroeker
265142edd5
Fix typo in the zarch min/max kernels
...
from patch provided by aarnez in #991
2019-01-31 21:21:40 +01:00
maamountki
29416cb5a3
[ZARCH] Add Z13 version for max/min functions
2019-01-31 19:11:11 +02:00
maamountki
48b9b94f7f
[ZARCH] Improve loading performance for camax/icamax
2019-01-31 18:52:11 +02:00
maamountki
fcd814a8d2
[ZARCH] Fix bug in max/min functions
2019-01-29 17:59:38 +02:00
maamountki
dc4d3bccd5
[ZARCH] Fix icamax/icamin
2019-01-29 03:47:49 +02:00
maamountki
c7143c1019
[ZARCH] Fix iamax/imax single precision
2019-01-28 17:52:23 +02:00
maamountki
04873bb174
[ZARCH] Undo the last commit
2019-01-28 17:32:24 +02:00
maamountki
c8ef9fb220
[ZARCH] Fix bug in iamax/iamin/imax/imin
2019-01-28 17:16:18 +02:00
maamountki
b111829226
[ZARCH] Update max/min functions
2019-01-21 15:56:04 +02:00
maamountki
b815a04c87
[ZARCH] fix a bug in max/min functions
2019-01-15 21:04:22 +02:00
maamountki
1a7925b3a3
[ZARCH] Update dgemv_n_4.c
2019-01-11 17:43:11 +02:00
maamountki
406f835f00
[ZARCH] update cgemv_n_4.c
2019-01-11 17:39:17 +02:00
maamountki
621dedb37b
[ZARCH] Update cgemv_t_4.c
2019-01-11 17:37:11 +02:00
maamountki
b731e8246f
Update sgemv_t_4.c
2019-01-11 17:14:04 +02:00
maamountki
ecc31b743f
Update dgemv_t_4.c
2019-01-11 17:13:02 +02:00
maamountki
5d89d6b143
[ZARCH] fix sgemv_n_4.c
2019-01-11 17:08:24 +02:00
maamountki
67432b23c2
[ZARCH] fix cgemv_n_4.c
2019-01-11 16:44:46 +02:00
maamountki
be66f5d5c2
[ZARCH] fix data prefetch type in sdot
2019-01-09 16:50:07 +02:00
maamountki
c2ffef8156
[ZARCH] fix data prefetch type in ddot
2019-01-09 16:49:44 +02:00
maamountki
e7455f500c
[ZARCH] fix dsdot.c
2019-01-09 16:33:54 +02:00
maamountki
3eafcfa650
[ZARCH] fix cgemv_n_4.c
2019-01-09 07:43:45 +02:00
maamountki
94cd946b96
[ZARCH] fix cgemv_n_4.c
2019-01-04 17:45:56 +02:00
maamountki
1aa840a0a2
[ZARCH] fix sgemv_t_4.c
2019-01-04 01:38:18 +02:00
maamountki
e6c0e39492
Optimize Zgemv
2018-08-13 12:23:40 +03:00
maamountki
23229011db
[ZARCH] Z14 support, BLAS 1/2 single precision implementations, Some missing double precision implementations, Gemv optimization
2018-08-06 18:20:40 +03:00
Martin Kroeker
c7b55b6082
Merge pull request #1499 from quickwritereader/develop
...
Implemented missing vsx simd kernels for power8 blas1/2 double. z13 modifications
2018-03-27 21:43:23 +02:00
QWR QWR
28ca97015d
power8:Added initial zgemv_(t|n) ,i(d|z)amax,i(d|z)amin,dgemv_t(transposed),zrot
...
z13: improved zgemv_(t|n)_4,zscal,zaxpy
2018-03-27 14:54:41 +00:00
Martin Kroeker
22167170b3
Merge pull request #1477 from quickwritereader/develop
...
Power8 blas3 copy-pack routines
2018-02-28 18:46:54 +01:00
Martin Kroeker
58f236ad73
Use generic/dot.c for DSDOT on zarch
2018-02-25 19:52:14 +01:00
Martin Kroeker
e207107150
Use generic/dot.c for DSDOT on z13
...
The implementation in arm/dot.c has lower precision, as shown by the utest for dsdot.
2018-02-25 19:51:25 +01:00
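DSDOT takes single-precision inputs but returns a double-precision result, so the accumulator width matters. The sketch below shows one way such a routine can lose precision, namely accumulating in float. Whether this is what arm/dot.c did is not stated in the commit above, so treat it as an assumption; both function names are hypothetical and strides are omitted.

```c
#include <stddef.h>

/* Hypothetical reference: products and accumulation in double. */
static double dsdot_double_acc(size_t n, const float *x, const float *y)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)x[i] * (double)y[i];
    return acc;
}

/* Hypothetical lower-precision variant: accumulation in float,
 * widened to double only when returning. */
static double dsdot_float_acc(size_t n, const float *x, const float *y)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return (double)acc;
}
```

With x = {16777216, 1, 1} and y = {1, 1, 1}, the double accumulator yields 16777218, whereas a float accumulator cannot represent 16777217 and, on typical IEEE 754 single-precision arithmetic, stays at 16777216.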
the mslm
c5425daa6b
power8 ?gemm_tcopy save/restore
2018-02-16 23:36:46 +00:00
Abdelrauf
60596a1abc
Merge branch 'develop' into develop
2018-01-31 16:17:04 -08:00
Abdelrauf
afd514c25d
small fix inside ifdef z13mvc (z13mvc code is not used in production)
2018-01-31 18:30:59 -05:00
Martin Kroeker
f45776ec1f
Merge pull request #1440 from quickwritereader/develop
...
small corrections
2018-01-31 23:48:47 +01:00
Abdelrauf
f653e7a18d
small fix
...
small fix inside ifdef z13mvc (z13mvc code is not used in production)
2018-01-31 07:49:38 -08:00
the mslm
f946a89432
zscal (case: real alpha=0) microkernel shift & mem fix, da_i as input reg; small typo fixes
2018-01-26 19:25:27 -08:00