s390x/SGEMM: adjust default P and Q to multiples of M
We recently changed the register blocking for SGEMM on s390x to 16x4. However, we did not adjust Q to a multiple of 16 and thus fell back to the 8x4 kernel at each block's margin, without need. Adjust P and Q to multiples of 16 to employ the faster 16x4 kernel for complete full-sized blocks.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
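The arithmetic behind the message can be sketched as follows. This is an illustration only, not code from OpenBLAS: `tile_split` and `split_block` are hypothetical names, and the sketch simply assumes that a block extent is covered by as many 16-wide tiles as fit, with any leftover handled by a narrower kernel.

```c
#include <assert.h>

/* Hypothetical helper type, not part of OpenBLAS. */
typedef struct {
    int full_tiles; /* number of complete tiles of width `unroll` */
    int margin;     /* leftover elements that do not fill a tile  */
} tile_split;

/* Split a block extent into full tiles of width `unroll` plus a margin. */
tile_split split_block(int extent, int unroll)
{
    assert(unroll > 0);
    tile_split s;
    s.full_tiles = extent / unroll;
    s.margin = extent % unroll;
    return s;
}
```

Under that assumption, the old values 456 and 488 each leave a margin of 8 when split by 16 (456 = 28 * 16 + 8, 488 = 30 * 16 + 8), which matches the fallback to the 8x4 kernel described above. The new values 480 and 512 divide evenly (30 and 32 full tiles), so no margin remains for a full-sized block.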
parent 07c334e7be
commit e115c97e05
4
param.h
@@ -3092,12 +3092,12 @@ is a big desktop or server with abundant cache rather than a phone or embedded d
 #define ZGEMM_DEFAULT_UNROLL_M 4
 #define ZGEMM_DEFAULT_UNROLL_N 4
 
-#define SGEMM_DEFAULT_P 456
+#define SGEMM_DEFAULT_P 480
 #define DGEMM_DEFAULT_P 320
 #define CGEMM_DEFAULT_P 480
 #define ZGEMM_DEFAULT_P 224
 
-#define SGEMM_DEFAULT_Q 488
+#define SGEMM_DEFAULT_Q 512
 #define DGEMM_DEFAULT_Q 384
 #define CGEMM_DEFAULT_Q 128
 #define ZGEMM_DEFAULT_Q 352