s390x: allow clang to emit fused multiply-adds (replicates gcc's default behavior)

gcc's default setting for floating-point expression contraction is "fast", which allows the compiler to emit fused multiply-adds (among other contractions) instead of separate multiplies and adds. Fused multiply-adds, which assembly kernels typically use, also bring a significant performance advantage to the C implementation of matrix-matrix multiplication on s390x. To enable that performance advantage for builds with clang, add -ffp-contract=fast to the compiler options.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-01 15:09:32 +02:00
parent 87e5bbd887
commit 095f4e6964
1 changed file with 6 additions and 0 deletions
--- a/Makefile.zarch
+++ b/Makefile.zarch
@@ -8,3 +8,9 @@ ifeq ($(CORE), Z14)
 CCOMMON_OPT += -march=z14 -mzvector -O3
 FCOMMON_OPT += -march=z14 -mzvector
 endif
+
+# Enable floating-point expression contraction for clang, since it is the
+# default for gcc
+ifeq ($(C_COMPILER), CLANG)
+CCOMMON_OPT += -ffp-contract=fast
+endif