OpenBLAS/kernel
Marius Hillenbrand e2828e30aa s390x: Optimize SGEMM/DGEMM blocks for z14 with explicit loop unrolling/interleaving
Improve performance of SGEMM and DGEMM on z14 and z15 by unrolling and
interleaving the inner loop of the SGEMM 16x4 and DGEMM 8x4 blocks.
Specifically, we explicitly interleave vector register loads and
computation of two iterations.
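
As a rough illustration of the unroll-and-interleave idea, here is a minimal scalar sketch in portable C. It is not the kernel from this commit: the function names `dot_simple` and `dot_interleaved` are hypothetical, plain `double` values stand in for vector registers, and a dot product stands in for the GEMM inner loop. It only shows the shape of the transformation, in which the loads for two iterations are issued together before the two multiply-accumulate steps.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: one load pair and one multiply-accumulate per iteration. */
static double dot_simple(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Unrolled by two: the loads for iterations i and i+1 are grouped
 * ahead of the arithmetic, so the second pair of loads can overlap
 * with the first multiply-accumulate.
 * Assumption for this sketch: n is even (no remainder loop). */
static double dot_interleaved(const double *a, const double *b, size_t n)
{
    assert(n % 2 == 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; i += 2) {
        double a0 = a[i],     b0 = b[i];      /* loads for iteration i   */
        double a1 = a[i + 1], b1 = b[i + 1];  /* loads for iteration i+1 */
        acc += a0 * b0;                       /* compute iteration i     */
        acc += a1 * b1;                       /* compute iteration i+1   */
    }
    return acc;
}
```

Both functions accumulate in the same order, so they return identical results; only the instruction schedule offered to the compiler differs.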

Note that this change adds only one C function, since SGEMM 16x4 and
DGEMM 8x4 actually map to the same C code: they both hold intermediate
results in a 4x4 grid of vector registers, and the C implementation is
built around that.
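
To see why one body of code can serve both block sizes, consider the following portable sketch. It assumes 128-bit vector registers, so a register holds 4 floats or 2 doubles, and a 4x4 grid of registers therefore covers 16x4 floats or 8x4 doubles. The macro `DEFINE_GEMM_4X4`, the generated function names, and the packed operand layout are assumptions made for illustration, with small arrays standing in for vector registers; none of them are taken from the commit.

```c
#include <stddef.h>

#define VEC_BYTES 16  /* assumed vector register width: 128 bits */

/* Generates a micro-kernel computing C += A * B for one block.
 * Assumed (hypothetical) layout: for each step p, `a` holds ROWS
 * contiguous elements and `b` holds 4 contiguous elements; `c` is
 * column-major with leading dimension ldc. */
#define DEFINE_GEMM_4X4(NAME, T)                                          \
    enum { NAME##_VLEN = VEC_BYTES / sizeof(T),                           \
           NAME##_ROWS = 4 * NAME##_VLEN };                               \
    static void NAME(size_t k, const T *a, const T *b, T *c, size_t ldc)  \
    {                                                                     \
        /* acc[col][v] models one register in the 4x4 grid */             \
        T acc[4][4][NAME##_VLEN] = {{{0}}};                               \
        for (size_t p = 0; p < k; p++)                                    \
            for (int col = 0; col < 4; col++)                             \
                for (int v = 0; v < 4; v++)                               \
                    for (int l = 0; l < NAME##_VLEN; l++)                 \
                        acc[col][v][l] +=                                 \
                            a[p * NAME##_ROWS + v * NAME##_VLEN + l] *    \
                            b[p * 4 + col];                               \
        /* write the accumulated block back into C */                     \
        for (int col = 0; col < 4; col++)                                 \
            for (int v = 0; v < 4; v++)                                   \
                for (int l = 0; l < NAME##_VLEN; l++)                     \
                    c[col * ldc + v * NAME##_VLEN + l] += acc[col][v][l]; \
    }

DEFINE_GEMM_4X4(sgemm_16x4, float)   /* 4 registers x 4 floats  = 16 rows */
DEFINE_GEMM_4X4(dgemm_8x4, double)   /* 4 registers x 2 doubles =  8 rows */
```

The only things that differ between the two instantiations are the element type and the number of lanes per register; the grid of accumulators and the loop structure are the same.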

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-08-11 12:55:42 +02:00