gcc's default setting for floating-point expression contraction is "fast", which allows the compiler to emit fused multiply-adds (among other contractions) instead of separate multiplies and adds. Fused multiply-adds, which assembly kernels typically use, also bring a significant performance advantage to the C implementation of matrix-matrix multiplication on s390x. To give builds with clang the same advantage, add -ffp-contract=fast to the compiler options.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
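As an illustration of what contraction changes, the following is a minimal sketch and not part of this commit. The helper names `mul_add` and `mul_add_was_fused` are hypothetical. The inputs are chosen so that the exact product `a * b` carries a low-order bit that is lost if the product is rounded to double before the add, but survives if the multiply and add are evaluated with a single rounding, as a fused multiply-add does.

```c
/* Hypothetical helper: a * b + c written as a single expression, which is
 * the pattern a compiler may contract into one fused multiply-add. */
double mul_add(double a, double b, double c)
{
    return a * b + c;
}

/* Hypothetical probe: returns 1 if mul_add() kept the low-order bit of the
 * product (single rounding, e.g. an FMA), and 0 if the product was rounded
 * to double before the addition. */
int mul_add_was_fused(void)
{
    /* volatile prevents the compiler from folding the result at compile time */
    volatile double a = 1.0 + 0x1p-27;
    volatile double b = 1.0 + 0x1p-27;
    volatile double c = -(1.0 + 0x1p-26);

    /* Exact product: 1 + 2^-26 + 2^-54. Rounded to double, the 2^-54 term
     * is dropped, so the separate multiply and add yield exactly 0.0. */
    double r = mul_add(a, b, c);
    return r != 0.0;
}
```

Calling `mul_add_was_fused()` from a small `main` and building once with `-ffp-contract=off` and once with `-ffp-contract=fast`, on a target that has fused multiply-add instructions, is one way to check which behavior a given compiler and flag set produce. The result also depends on the optimization level and the target architecture, so treat it as a probe rather than a guarantee.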
17 lines
363 B
Makefile
ifeq ($(CORE), Z13)
CCOMMON_OPT += -march=z13 -mzvector
FCOMMON_OPT += -march=z13 -mzvector
endif

ifeq ($(CORE), Z14)
CCOMMON_OPT += -march=z14 -mzvector -O3
FCOMMON_OPT += -march=z14 -mzvector
endif

# Enable floating-point expression contraction for clang, since it is the
# default for gcc
ifeq ($(C_COMPILER), CLANG)
CCOMMON_OPT += -ffp-contract=fast
endif