OpenBLAS/kernel
Marius Hillenbrand 43c0d4f312 s390x: Add vectorized sgemm kernel for Z14 and newer
Add a new GEMM kernel implementation to exploit the FP32 SIMD
operations introduced with z14 and employ it for SGEMM on z14 and newer
architectures.

The SIMD extensions introduced with z13 support operations on
double-sized scalars in vector registers. Thus, the existing SGEMM code
would extend floats to doubles before operating on them. z14 extended
SIMD support to operations on 32-bit floats. By employing these
instructions, we can operate on twice the number of scalars per
instruction (four floats in each vector registers) and avoid the
conversion operations.

The code is written in C with explicit vectorization. In experiments,
this kernel improves performance on z14 and z15 by around 2x over the
current implementation in assembly. The flexibilty of the C code paves
the way for adjustments in subsequent commits.

Tested via make -C test / ctest / utest and by a couple of additional
unit tests that exercise blocking (e.g., partial register blocks with
fewer than UNROLL_M rows and/or fewer than UNROLL_N columns).

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 15:59:51 +02:00
..
alpha Add implementations of ssum/dsum and csum/zsum 2019-03-30 22:05:11 +01:00
arm Make ARMV7 compile with xcode and add a CI job for it (#2537) 2020-04-02 10:30:37 +02:00
arm64 ARM64: Improve DAXPY for ThunderX2 2020-05-07 09:22:50 -07:00
generic Fix DYNAMIC_ARCH compilation errors 2020-04-15 09:09:50 -05:00
ia64 Add ia64 implementation of ?sum 2019-03-30 22:18:03 +01:00
mips Delete KERNEL.1004K 2020-04-19 15:44:30 +02:00
mips64 Fix compilation problem on loongson platform 2020-04-09 19:28:15 +08:00
power Fix cmake compilation issue - POWER9 2020-05-08 20:31:56 -05:00
sparc Add SPARC implementation of ?sum 2019-03-30 22:25:06 +01:00
x86 Fix unwanted case-sensitivity in x86 LSAME for (AMD) processors without CMOV 2019-08-13 10:19:10 +02:00
x86_64 Work around excessive LAPACK test failures on Skylake-X 2020-05-09 23:49:18 +02:00
zarch s390x: Add vectorized sgemm kernel for Z14 and newer 2020-05-12 15:59:51 +02:00
CMakeLists.txt Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) 2020-05-01 09:58:30 +02:00
Makefile Add variable for gcc >=9 test 2019-11-29 23:47:23 +01:00
Makefile.L1 Add ?sum 2019-03-30 22:01:13 +01:00
Makefile.L2 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.L3 Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) 2020-05-01 09:58:30 +02:00
Makefile.LA Support NO_LAPACK=1 to build the lib without LAPACK functions. 2011-03-04 11:51:32 +08:00
common_param.h Fix warnings in clang and export symbol 2020-04-15 19:15:23 -05:00
setparam-ref.c Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) 2020-05-01 09:58:30 +02:00