OpenBLAS/kernel/zarch
Marius Hillenbrand 1b0b4349a1 s390x/Z14: Change register blocking for SGEMM to 16x4
Change register blocking for SGEMM (and STRMM) on z14 from 8x4 to 16x4
by adjusting SGEMM_DEFAULT_UNROLL_M and choosing the appropriate copy
implementations. Also make KERNEL.Z14 more flexible, so that the
change in param.h suffices. As a result, SGEMM performance improves
by around 30% on z15.

On z14, FP SIMD instructions can operate on float-sized scalars in
vector registers, while z13 could do that for double-sized scalars only.
Thus, we can double the number of elements of C that are held in
registers in an SGEMM kernel.
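
To illustrate what 16x4 register blocking means, here is a minimal scalar
sketch in C. It is not the OpenBLAS kernel: the function name, the enum
constants UNROLL_M and UNROLL_N, and the packed panel layout are assumptions
made for this example. The accumulator array stands in for the block of C
that a real kernel would keep in vector registers. Assuming 128-bit vector
registers, a 16x4 block of floats fills 16 registers, the same number that
an 8x4 block of doubles would need.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants mirroring the 16x4 blocking from the commit. */
enum { UNROLL_M = 16, UNROLL_N = 4 };

/*
 * Illustrative micro-kernel (assumed name and layout, not OpenBLAS code):
 *   C[UNROLL_M x UNROLL_N] += A * B
 * a: packed panel, k slices of UNROLL_M contiguous floats
 * b: packed panel, k slices of UNROLL_N contiguous floats
 * c: column-major block with leading dimension ldc
 */
void sgemm_block_16x4(size_t k, const float *a, const float *b,
                      float *c, size_t ldc)
{
    /* Accumulators for the whole C block; a vectorized kernel would
       hold these in registers for the duration of the k loop. */
    float acc[UNROLL_N][UNROLL_M] = {{0.0f}};

    assert(ldc >= (size_t)UNROLL_M);

    for (size_t p = 0; p < k; p++) {
        for (int j = 0; j < UNROLL_N; j++) {
            const float bj = b[p * UNROLL_N + j];
            /* Rank-1 update: one column of the block per element of B. */
            for (int i = 0; i < UNROLL_M; i++)
                acc[j][i] += a[p * UNROLL_M + i] * bj;
        }
    }

    /* Write the accumulated block back to C once, after the k loop. */
    for (int j = 0; j < UNROLL_N; j++)
        for (int i = 0; i < UNROLL_M; i++)
            c[(size_t)j * ldc + i] += acc[j][i];
}
```

Widening the block from 8 to 16 rows means each element loaded from B is
reused for twice as many multiply-adds before the next load, which is the
general motivation for keeping a larger block of C in registers.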

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 15:59:51 +02:00