From 2ee5b899ce9777c63710de1ede75c362db5bcd47 Mon Sep 17 00:00:00 2001 From: Marius Hillenbrand Date: Tue, 1 Sep 2020 16:16:53 +0200 Subject: [PATCH] s390x: enable S/DGEMM block with explicit loop unrolling + interleaving with clang The code for SGEMM 16x4 and DGEMM 8x4 blocks on z14 and z15 uses explicit unrolling and interleaving to improve performance. The code employs an empty inline asm statement with operands that constrain the compiler's instruction scheduling and thereby enforce proper overlapping of load and compute phases. Fix an ifdef to apply that for clang builds, as well. Signed-off-by: Marius Hillenbrand --- kernel/zarch/gemm_vec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/zarch/gemm_vec.c b/kernel/zarch/gemm_vec.c index b7d7cc04b..ef0b1d1e3 100644 --- a/kernel/zarch/gemm_vec.c +++ b/kernel/zarch/gemm_vec.c @@ -393,7 +393,7 @@ static inline void GEBP_block_16_4( * Note that we need to massage this particular "barrier" * depending on the gcc version. */ -#if __GNUC__ > 7 +#if __GNUC__ > 7 || defined(__clang__) #define BARRIER_READ_BEFORE_COMPUTE(SUFFIX) \ do { \ asm("" \