OpenBLAS/kernel
Gordon Fossum bb2f52844b powerpc: Optimized ZGEMM kernel for POWER10
This patch introduces new optimized version of ZGEMM kernel
using power10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for matrix multiplication operation.

Tested on simulator and there are no new test failures.
Cycles count reduced by 30-50%  compared to POWER9 version depending on
M/N/K sizes.
2020-06-24 14:50:12 -05:00
..
alpha Add implementations of ssum/dsum and csum/zsum 2019-03-30 22:05:11 +01:00
arm Make ARMV7 compile with xcode and add a CI job for it (#2537) 2020-04-02 10:30:37 +02:00
arm64 fix INIT8x4 2020-06-10 01:01:16 +08:00
generic Fix DYNAMIC_ARCH compilation errors 2020-04-15 09:09:50 -05:00
ia64 Add ia64 implementation of ?sum 2019-03-30 22:18:03 +01:00
mips Delete KERNEL.1004K 2020-04-19 15:44:30 +02:00
mips64 Fix compilation problem on loongson platform 2020-04-09 19:28:15 +08:00
power powerpc: Optimized ZGEMM kernel for POWER10 2020-06-24 14:50:12 -05:00
sparc Add SPARC implementation of ?sum 2019-03-30 22:25:06 +01:00
x86 Fix unwanted case-sensitivity in x86 LSAME for (AMD) processors without CMOV 2019-08-13 10:19:10 +02:00
x86_64 Merge pull request #2675 from wjc404/develop 2020-06-23 09:29:02 +02:00
zarch s390x: Use new sgemm kernel also for DGEMM and DTRMM on Z14 2020-05-20 10:23:35 +02:00
CMakeLists.txt powerpc: Add support for future processor 2020-06-11 15:47:20 -05:00
Makefile Add variable for gcc >=9 test 2019-11-29 23:47:23 +01:00
Makefile.L1 Add ?sum 2019-03-30 22:01:13 +01:00
Makefile.L2 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.L3 powerpc: Add support for future processor 2020-06-11 15:47:20 -05:00
Makefile.LA Support NO_LAPACK=1 to build the lib without LAPACK functions. 2011-03-04 11:51:32 +08:00
setparam-ref.c Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) 2020-05-01 09:58:30 +02:00