s390x: enable S/DGEMM block with explicit loop unrolling + interleaving with clang

The code for SGEMM 16x4 and DGEMM 8x4 blocks on z14 and z15 uses
explicit unrolling and interleaving to improve performance. The code
employs an empty inline asm statement with operands that constrain the
compiler's instruction scheduling and thereby enforce proper overlapping
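of load and compute phases. The ifdef guarding that barrier tested only
__GNUC__ > 7, which excludes clang because clang reports __GNUC__ as 4
by default. Extend the check so the barrier also applies to clang builds.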
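
The empty-asm technique described above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the OpenBLAS code: the macro name LOAD_BEFORE_COMPUTE, the function dot_interleaved, and the use of scalar "+r" operands are all hypothetical stand-ins for the kernel's vector-register barrier.

```c
#include <stddef.h>

/*
 * Hypothetical barrier for illustration only; this is not the OpenBLAS macro.
 * The asm body is empty, so no instruction is emitted. The "+r" operands
 * declare a, b and acc as read and written here: the loads feeding a and b
 * must be complete before this point, and any later update of acc cannot
 * be moved above it.
 */
#define LOAD_BEFORE_COMPUTE(a, b, acc) \
    __asm__("" : "+r"(a), "+r"(b), "+r"(acc))

/* Dot product that loads the next pair before accumulating the current one. */
static long dot_interleaved(const long *x, const long *y, size_t n)
{
    long acc = 0;
    if (n == 0)
        return acc;

    long x_cur = x[0], y_cur = y[0];
    for (size_t i = 1; i < n; i++) {
        /* Load phase for element i. */
        long x_next = x[i], y_next = y[i];
        LOAD_BEFORE_COMPUTE(x_next, y_next, acc);

        /* Compute phase for element i - 1. */
        acc += x_cur * y_cur;

        x_cur = x_next;
        y_cur = y_next;
    }
    /* Drain the last pair that was loaded but not yet accumulated. */
    return acc + x_cur * y_cur;
}
```

Calling dot_interleaved on two arrays of equal length returns the ordinary dot product; the barrier changes only the order in which the compiler may schedule the loads and the accumulation, not the result.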

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
Marius Hillenbrand 2020-09-01 16:16:53 +02:00
parent 095f4e6964
commit 2ee5b899ce
1 changed file with 1 addition and 1 deletion

@@ -393,7 +393,7 @@ static inline void GEBP_block_16_4(
  * Note that we need to massage this particular "barrier"
  * depending on the gcc version.
  */
-#if __GNUC__ > 7
+#if __GNUC__ > 7 || defined(__clang__)
 #define BARRIER_READ_BEFORE_COMPUTE(SUFFIX) \
 do { \
 asm("" \