OpenBLAS/kernel
Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
..
alpha Add implementations of ssum/dsum and csum/zsum 2019-03-30 22:05:11 +01:00
arm Make ARMV7 compile with xcode and add a CI job for it (#2537) 2020-04-02 10:30:37 +02:00
arm64 fix initialization to zero in the NEON SGEMM_BETA kernel as well 2020-03-31 16:53:56 +02:00
generic RFC : Add half precision gemm for bfloat16 in OpenBLAS 2020-04-14 14:55:08 -05:00
ia64 Add ia64 implementation of ?sum 2019-03-30 22:18:03 +01:00
mips Add MIPS implementation of ?sum 2019-03-30 22:20:14 +01:00
mips64 Fix compilation problem on loongson platform 2020-04-09 19:28:15 +08:00
power RFC : Add half precision gemm for bfloat16 in OpenBLAS 2020-04-14 14:55:08 -05:00
sparc Add SPARC implementation of ?sum 2019-03-30 22:25:06 +01:00
x86 Fix unwanted case-sensitivity in x86 LSAME for (AMD) processors without CMOV 2019-08-13 10:19:10 +02:00
x86_64 Convert aligned moves to unaligned 2020-04-13 14:58:52 +02:00
zarch add in runtime cpu detection for zarch (#2349) 2019-12-31 18:03:27 +01:00
CMakeLists.txt Add missing USE_MIN in kernel/CMakeLists.txt. 2020-02-17 09:01:53 +01:00
Makefile Add variable for gcc >=9 test 2019-11-29 23:47:23 +01:00
Makefile.L1 Add ?sum 2019-03-30 22:01:13 +01:00
Makefile.L2 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.L3 RFC : Add half precision gemm for bfloat16 in OpenBLAS 2020-04-14 14:55:08 -05:00
Makefile.LA Support NO_LAPACK=1 to build the lib without LAPACK functions. 2011-03-04 11:51:32 +08:00
common_param.h Fix parameter overflow 2020-04-12 19:47:02 +02:00
setparam-ref.c RFC : Add half precision gemm for bfloat16 in OpenBLAS 2020-04-14 14:55:08 -05:00