OpenBLAS/kernel
Marius Hillenbrand 1b0b4349a1 s390x/Z14: Change register blocking for SGEMM to 16x4
Change register blocking for SGEMM (and STRMM) on z14 from 8x4 to 16x4
by adjusting SGEMM_DEFAULT_UNROLL_M and choosing the appropriate copy
implementations. Also make KERNEL.Z14 more flexible, so that the
change in param.h alone suffices. As a result, SGEMM performance improves
by around 30% on z15.
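The register blocking described above can be modeled in portable C. This is a minimal scalar sketch, not the z14 assembly kernel: `MR = 16` and `NR = 4` mirror the 16x4 blocking selected via SGEMM_DEFAULT_UNROLL_M, the hypothetical `pack_a`/`pack_b` functions stand in for the role the copy implementations play (their real layout details may differ), and the `acc` array stands in for the vector registers that hold the C block.

```c
/* Hypothetical block sizes mirroring the 16x4 register blocking
   (MR plays the role of SGEMM_DEFAULT_UNROLL_M, NR of the N unroll). */
#define MR 16
#define NR 4

/* Pack an MR x k block of column-major A (leading dimension lda) so that
   the MR values for each step p of the k loop are contiguous. */
static void pack_a(int k, const float *a, int lda, float *packed)
{
    for (int p = 0; p < k; p++)
        for (int i = 0; i < MR; i++)
            packed[p * MR + i] = a[i + p * lda];
}

/* Pack a k x NR block of column-major B (leading dimension ldb) so that
   the NR values for each step p are contiguous. */
static void pack_b(int k, const float *b, int ldb, float *packed)
{
    for (int p = 0; p < k; p++)
        for (int j = 0; j < NR; j++)
            packed[p * NR + j] = b[p + j * ldb];
}

/* Micro-kernel: C[MR x NR] += alpha * A_packed * B_packed.
   acc[][] models the registers holding the 16x4 block of C; it is
   accumulated over all k steps and written back to memory only once. */
static void kernel_16x4(int k, float alpha, const float *pa, const float *pb,
                        float *c, int ldc)
{
    float acc[NR][MR] = {{0.0f}};
    for (int p = 0; p < k; p++)
        for (int j = 0; j < NR; j++)
            for (int i = 0; i < MR; i++)
                acc[j][i] += pa[p * MR + i] * pb[p * NR + j];
    for (int j = 0; j < NR; j++)
        for (int i = 0; i < MR; i++)
            c[i + j * ldc] += alpha * acc[j][i];
}
```

Packing first means the inner loop reads A and B with unit stride, and keeping the whole C block in registers across the k loop is what makes a larger MR pay off: each loaded element of B is reused against 16 rows of A instead of 8.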

On z14, FP SIMD instructions can operate on float-sized scalars in
vector registers, while z13 could do that for double-sized scalars only.
Thus, we can double the number of elements of C that are held in
registers in an SGEMM kernel.
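The doubling argument is lane arithmetic. The sketch below assumes 128-bit vector registers (as provided by the z/Architecture vector facility) and models the z13 case as each float occupying a double-sized (8-byte) lane, versus a 4-byte lane on z14; the helper name `c_block_vregs` is hypothetical.

```c
/* Assumption: vector registers are 128 bits (16 bytes) wide. */
#define VREG_BYTES 16

/* Number of vector registers needed to hold an m x n block of C when
   each element occupies elem_bytes bytes of a register lane. */
static int c_block_vregs(int m, int n, int elem_bytes)
{
    int lanes = VREG_BYTES / elem_bytes;   /* elements per register */
    return (m * n + lanes - 1) / lanes;    /* round up */
}
```

Under this model, an 8x4 block in double-sized lanes needs `c_block_vregs(8, 4, 8) == 16` registers, and a 16x4 block in float-sized lanes needs `c_block_vregs(16, 4, 4) == 16` as well: the same register budget holds twice as many elements of C.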

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 15:59:51 +02:00
alpha Add implementations of ssum/dsum and csum/zsum 2019-03-30 22:05:11 +01:00
arm Make ARMV7 compile with xcode and add a CI job for it (#2537) 2020-04-02 10:30:37 +02:00
arm64 ARM64: Improve DAXPY for ThunderX2 2020-05-07 09:22:50 -07:00
generic Fix DYNAMIC_ARCH compilation errors 2020-04-15 09:09:50 -05:00
ia64 Add ia64 implementation of ?sum 2019-03-30 22:18:03 +01:00
mips Delete KERNEL.1004K 2020-04-19 15:44:30 +02:00
mips64 Fix compilation problem on loongson platform 2020-04-09 19:28:15 +08:00
power Fix cmake compilation issue - POWER9 2020-05-08 20:31:56 -05:00
sparc Add SPARC implementation of ?sum 2019-03-30 22:25:06 +01:00
x86 Fix unwanted case-sensitivity in x86 LSAME for (AMD) processors without CMOV 2019-08-13 10:19:10 +02:00
x86_64 Work around excessive LAPACK test failures on Skylake-X 2020-05-09 23:49:18 +02:00
zarch s390x/Z14: Change register blocking for SGEMM to 16x4 2020-05-12 15:59:51 +02:00
CMakeLists.txt Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) 2020-05-01 09:58:30 +02:00
Makefile Add variable for gcc >=9 test 2019-11-29 23:47:23 +01:00
Makefile.L1 Add ?sum 2019-03-30 22:01:13 +01:00
Makefile.L2 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.L3 Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) 2020-05-01 09:58:30 +02:00
Makefile.LA Support NO_LAPACK=1 to build the lib without LAPACK functions. 2011-03-04 11:51:32 +08:00
common_param.h Fix warnings in clang and export symbol 2020-04-15 19:15:23 -05:00
setparam-ref.c Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) 2020-05-01 09:58:30 +02:00