OpenBLAS/kernel at e2828e30aa5fc5670d0f4d4d42fc26649a4c3c64

floraachy/OpenBLAS

Marius Hillenbrand e2828e30aa s390x: Optimize SGEMM/DGEMM blocks for z14 with explicit loop unrolling/interleaving

Improve performance of SGEMM and DGEMM on z14 and z15 by unrolling and
interleaving the inner loop of the SGEMM 16x4 and DGEMM 8x4 blocks.
Specifically, we explicitly interleave vector register loads and
computation of two iterations.

Note that this change only adds one C function, since SGEMM 16x4 and
DGEMM 8x4 actually map to the same C code: they both hold intermediate
results in a 4x4 grid of vector registers, and the C implementation is
built around that.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>

2020-08-11 12:55:42 +02:00
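
The interleaving described in the commit message above can be illustrated with a small C sketch. This is not the code added by the commit. It is a minimal illustration under stated assumptions: GCC/Clang vector extensions stand in for the z14 vector registers, the packed panel layout (16 floats of A and 4 floats of B per k step) is assumed, and the names vec4f, load_a, rank1_update and gemm_tile_16x4 are hypothetical. The accumulators form a 4x4 grid of vectors, and the main loop handles two k iterations per pass, issuing the loads for the next iteration before the arithmetic of the current one.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: a 4-lane float vector from GCC/Clang vector extensions
 * stands in for one hardware vector register. */
typedef float vec4f __attribute__((vector_size(16)));

enum { VLEN = 4, ROWS = 4, COLS = 4, TILE_M = VLEN * ROWS };

/* Load the 16 floats of A for one k step into 4 vectors. */
static inline void load_a(vec4f a[ROWS], const float *pa)
{
    memcpy(a, pa, sizeof(vec4f) * ROWS);
}

/* acc[i][j] += a[i] * b[j] for one k step (4x4 grid of accumulators). */
static inline void rank1_update(vec4f acc[ROWS][COLS],
                                const vec4f a[ROWS], const float *pb)
{
    for (int j = 0; j < COLS; j++) {
        vec4f bj = { pb[j], pb[j], pb[j], pb[j] };  /* broadcast B element */
        for (int i = 0; i < ROWS; i++)
            acc[i][j] += a[i] * bj;
    }
}

/* C[16x4] += A_panel * B_panel, with C stored column-major (stride ldc).
 * pa holds 16 floats per k step, pb holds 4 floats per k step. */
void gemm_tile_16x4(size_t k, const float *pa, const float *pb,
                    float *c, size_t ldc)
{
    vec4f acc[ROWS][COLS];
    vec4f a0[ROWS], a1[ROWS];
    size_t p = 0;

    memset(acc, 0, sizeof acc);

    if (k >= 1)
        load_a(a0, pa);                           /* loads for step 0 */

    /* Unrolled by two: loads for the next step are issued before the
     * arithmetic of the current step. */
    for (; p + 2 <= k; p += 2) {
        load_a(a1, pa + (p + 1) * TILE_M);        /* loads for step p+1 */
        rank1_update(acc, a0, pb + p * COLS);     /* compute step p */
        if (p + 2 < k)
            load_a(a0, pa + (p + 2) * TILE_M);    /* loads for step p+2 */
        rank1_update(acc, a1, pb + (p + 1) * COLS); /* compute step p+1 */
    }

    /* Odd k: one step is left, and a0 already holds its data. */
    if (p < k)
        rank1_update(acc, a0, pb + p * COLS);

    /* Add the accumulated tile into C. */
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++) {
            float tmp[VLEN];
            memcpy(tmp, &acc[i][j], sizeof tmp);
            for (int l = 0; l < VLEN; l++)
                c[j * ldc + i * VLEN + l] += tmp[l];
        }
}
```

With 2-lane double vectors in place of 4-lane float vectors, the same 4x4 grid would cover an 8x4 tile, which is consistent with the commit message's remark that SGEMM 16x4 and DGEMM 8x4 map to the same C code.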

Add implementations of ssum/dsum and csum/zsum

2019-03-30 22:05:11 +01:00

Use OPENBLAS_MAKE_COMPLEX_FLOAT on PPC only

2020-07-23 20:40:13 +00:00

ARM64: Add THUNDERX3T110 Target

2020-07-26 23:32:24 -07:00

powerpc: Optimized SHGEMM kernel for POWER10

2020-06-25 22:19:08 -05:00

Add ia64 implementation of ?sum

2019-03-30 22:18:03 +01:00

Delete KERNEL.1004K

2020-04-19 15:44:30 +02:00

Fix compilation problem on loongson platform

2020-04-09 19:28:15 +08:00

dgemv optimization for POWER10

2020-07-29 18:59:32 -05:00

Add SPARC implementation of ?sum

2019-03-30 22:25:06 +01:00

Fix unwanted case-sensitivity in x86 LSAME for (AMD) processors without CMOV

2019-08-13 10:19:10 +02:00

Multiply by 2 instead of left-shifting a potentially negative number

2020-08-02 18:29:56 +02:00

s390x: Optimize SGEMM/DGEMM blocks for z14 with explicit loop unrolling/interleaving

2020-08-11 12:55:42 +02:00

CMakeLists.txt

powerpc: Add support for future processor

2020-06-11 15:47:20 -05:00

Makefile

Fix compilation issues with clang on POWER

2020-07-27 14:11:07 -05:00

Makefile.L1

Add ?sum

2019-03-30 22:01:13 +01:00

Makefile.L2

Remove all trailing whitespace except lapack-netlib

2014-06-27 12:05:18 -07:00

Makefile.L3

fix trailing whitespace

2020-07-14 18:20:03 +02:00

Makefile.LA

Support NO_LAPACK=1 to build the lib without LAPACK functions.

2011-03-04 11:51:32 +08:00

setparam-ref.c

Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)

2020-05-01 09:58:30 +02:00