s390x/SGEMM: adjust default P and Q to multiples of M

We recently changed the register blocking for SGEMM on s390x to 16x4.
However, we did not adjust P and Q to multiples of 16 and thus needlessly
fell back to the 8x4 kernel at each block's margin. Adjust P and Q to
multiples of 16 so that complete, full-sized blocks are processed entirely
by the faster 16x4 kernel.
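To illustrate the margin effect, here is a minimal sketch that models how one block of P rows might be split into micro-kernel calls. It assumes a simplified dispatch: as many full 16-row tiles as fit, then an 8-row tail, then whatever is left for narrower kernels. The names `plan_block_rows`, `tile_plan`, `MAIN_UNROLL_M` and `TAIL_UNROLL_M` are hypothetical and this is not the OpenBLAS driver code; it only models the M (row) direction that the 16 in "16x4" refers to.

```c
#include <assert.h>

/* Hypothetical model, not OpenBLAS code: tile sizes taken from the
 * commit message (16x4 main kernel, 8x4 fallback at the margin). */
#define MAIN_UNROLL_M 16 /* rows covered by one 16x4 kernel call */
#define TAIL_UNROLL_M 8  /* rows covered by one 8x4 fallback call */

typedef struct {
    int calls_16x4; /* full 16-row tiles in the block */
    int calls_8x4;  /* 8-row tail tiles at the block's margin */
    int rows_left;  /* rows left over for even narrower kernels */
} tile_plan;

/* Split a block of p rows into 16-row tiles, then 8-row tiles. */
static tile_plan plan_block_rows(int p)
{
    assert(p >= 0);
    tile_plan plan;
    plan.calls_16x4 = p / MAIN_UNROLL_M;
    int rest = p % MAIN_UNROLL_M;
    plan.calls_8x4 = rest / TAIL_UNROLL_M;
    plan.rows_left = rest % TAIL_UNROLL_M;
    return plan;
}
```

Under this model the old P = 456 gives 28 full tiles plus one 8x4 call per block (456 = 28 * 16 + 8), while the new P = 480 gives exactly 30 full tiles and no fallback call.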

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
Marius Hillenbrand 2020-08-11 12:55:59 +02:00
parent 07c334e7be
commit e115c97e05
1 changed file with 2 additions and 2 deletions


@@ -3092,12 +3092,12 @@ is a big desktop or server with abundant cache rather than a phone or embedded d
 #define ZGEMM_DEFAULT_UNROLL_M 4
 #define ZGEMM_DEFAULT_UNROLL_N 4
-#define SGEMM_DEFAULT_P 456
+#define SGEMM_DEFAULT_P 480
 #define DGEMM_DEFAULT_P 320
 #define CGEMM_DEFAULT_P 480
 #define ZGEMM_DEFAULT_P 224
-#define SGEMM_DEFAULT_Q 488
+#define SGEMM_DEFAULT_Q 512
 #define DGEMM_DEFAULT_Q 384
 #define CGEMM_DEFAULT_Q 128
 #define ZGEMM_DEFAULT_Q 352
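One way to keep a future retuning from silently reintroducing the margin fallback is a compile-time guard next to these definitions. The sketch below is hypothetical and not part of this commit: it copies the new SGEMM P and Q values from the diff and assumes `SGEMM_DEFAULT_UNROLL_M` is 16, per the 16x4 register blocking described in the commit message. It uses the C11 `static_assert` macro from `<assert.h>`.

```c
#include <assert.h>

/* Values from the new side of the diff; UNROLL_M = 16 is assumed from
 * the commit message's 16x4 register blocking. */
#define SGEMM_DEFAULT_UNROLL_M 16
#define SGEMM_DEFAULT_P 480
#define SGEMM_DEFAULT_Q 512

/* Fail the build if a tuning change breaks the multiple-of-M rule. */
static_assert(SGEMM_DEFAULT_P % SGEMM_DEFAULT_UNROLL_M == 0,
              "SGEMM_DEFAULT_P must be a multiple of SGEMM_DEFAULT_UNROLL_M");
static_assert(SGEMM_DEFAULT_Q % SGEMM_DEFAULT_UNROLL_M == 0,
              "SGEMM_DEFAULT_Q must be a multiple of SGEMM_DEFAULT_UNROLL_M");
```

With the old values (P = 456, Q = 488) both assertions would fail to compile, since neither is divisible by 16.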