OpenBLAS/kernel
Marius Hillenbrand 2ee5b899ce s390x: enable S/DGEMM block with explicit loop unrolling + interleaving with clang
The code for SGEMM 16x4 and DGEMM 8x4 blocks on z14 and z15 uses
explicit unrolling and interleaving to improve performance. The code
employs an empty inline asm statement with operands that constrain the
compiler's instruction scheduling and thereby enforce proper overlapping
of load and compute phases. Fix an ifdef so that this also applies to
clang builds.
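
The scheduling trick described above can be illustrated with a minimal sketch. This is not the zarch kernel code: the macro SCHED_FENCE3 and the function dot_interleaved are hypothetical names, and the sketch uses scalar integers with generic "r" constraints rather than vector registers. It assumes a compiler that supports GNU-style extended asm (GCC or clang). The empty asm statement emits no instructions, but because it claims to read and modify its operands, the compiler has to finish the loads feeding it first and cannot start the dependent multiply-add until after it.

```c
/* An empty asm statement that "reads and writes" its three operands.
 * It generates no machine code; it only adds data dependencies that
 * constrain how the compiler may order the surrounding instructions.
 * Hypothetical helper for illustration, not taken from OpenBLAS. */
#if defined(__GNUC__) || defined(__clang__)
#define SCHED_FENCE3(x, y, z) __asm__("" : "+r"(x), "+r"(y), "+r"(z))
#else
#define SCHED_FENCE3(x, y, z) ((void)0) /* no-op on other compilers */
#endif

/* Dot product with the load phase for iteration i+1 pinned ahead of
 * the compute phase for iteration i. */
static long dot_interleaved(const long *a, const long *b, int n)
{
    long acc = 0;
    if (n <= 0)
        return 0;

    long a_cur = a[0];
    long b_cur = b[0];

    for (int i = 0; i + 1 < n; i++) {
        /* Load phase: fetch the operands of the next iteration. */
        long a_next = a[i + 1];
        long b_next = b[i + 1];

        /* The loads above must be complete before this point, and the
         * update of acc below cannot be moved above it, because acc is
         * one of the operands the asm statement claims to modify. */
        SCHED_FENCE3(acc, a_next, b_next);

        /* Compute phase: consume the operands loaded one step earlier. */
        acc += a_cur * b_cur;

        a_cur = a_next;
        b_cur = b_next;
    }

    /* Drain the last pair of operands. */
    acc += a_cur * b_cur;
    return acc;
}
```

Calling dot_interleaved on {1, 2, 3, 4} and {5, 6, 7, 8} with n = 4 returns 70, the same as a plain loop; the fence changes only the order in which the compiler may schedule instructions, never the result. A preprocessor guard around such a macro decides which compilers get the fence, which is the kind of ifdef this commit adjusts.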

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-02 13:49:31 +02:00
..
alpha Add implementations of ssum/dsum and csum/zsum 2019-03-30 22:05:11 +01:00
arm Use OPENBLAS_MAKE_COMPLEX_FLOAT on PPC only 2020-07-23 20:40:13 +00:00
arm64 ARM64: Add THUNDERX3T110 Target 2020-07-26 23:32:24 -07:00
generic powerpc: Optimized SHGEMM kernel for POWER10 2020-06-25 22:19:08 -05:00
ia64 Add ia64 implementation of ?sum 2019-03-30 22:18:03 +01:00
mips Delete KERNEL.1004K 2020-04-19 15:44:30 +02:00
mips64 Fix compilation problem on loongson platform 2020-04-09 19:28:15 +08:00
power POWER10: Avoid setting accumulators to zero in gemm kernels 2020-08-28 10:42:54 -05:00
sparc Add SPARC implementation of ?sum 2019-03-30 22:25:06 +01:00
x86 Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
x86_64 Fix missing dummy parameter (imag part of alpha) of zdot_thread_function 2020-08-23 15:08:16 +02:00
zarch s390x: enable S/DGEMM block with explicit loop unrolling + interleaving with clang 2020-09-02 13:49:31 +02:00
CMakeLists.txt Merge pull request #2780 from Guobing-Chen/CPL_build_support 2020-08-20 19:54:29 +02:00
Makefile Fix typo 2020-08-19 16:36:55 +02:00
Makefile.L1 Add ?sum 2019-03-30 22:01:13 +01:00
Makefile.L2 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.L3 Merge pull request #2780 from Guobing-Chen/CPL_build_support 2020-08-20 19:54:29 +02:00
Makefile.LA Support NO_LAPACK=1 to build the lib without LAPACK functions. 2011-03-04 11:51:32 +08:00
setparam-ref.c Merge pull request #2780 from Guobing-Chen/CPL_build_support 2020-08-20 19:54:29 +02:00