OpenBLAS

Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS	2020-04-14 14:55:08 -05:00

This patch adds a matrix multiplication kernel for the bfloat16 data type. On architectures that do not support bfloat16, the type is defined as unsigned short (2 bytes). The default unroll sizes can be changed per architecture, as is done for SGEMM; for now, 8 and 4 are used for M and N. The ncopy/tcopy size can likewise be changed per architecture; for now, size 2 is used.

shgemm is added to kernel/power/KERNEL.POWER9 and was tested on powerpc64le and powerpc64. For reference, a small test, compare_sgemm_shgemm.c, compares sgemm and shgemm output. This patch does not cover the OpenBLAS test, benchmark, and LAPACK tests for shgemm. A complex-type implementation can be discussed and added once this is approved.
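
The commit message above describes bfloat16 values stored in a 2-byte unsigned short and a comparison of sgemm against shgemm output. The following is a minimal sketch of that idea, not the code from the patch. The names `bf16_t`, `float_to_bf16`, `bf16_to_float`, and `ref_bf16_gemm` are hypothetical. It assumes that `unsigned short` is 16 bits, that conversion from float truncates rather than rounds, and that products are accumulated and stored in float; the commit message states none of these.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name: 2-byte storage for a bfloat16 value on targets
 * without a native type (assumes unsigned short is 16 bits wide). */
typedef unsigned short bf16_t;

/* Keep the upper 16 bits of an IEEE-754 binary32 value.
 * This truncates the mantissa; it does not round to nearest. */
static bf16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bf16_t)(bits >> 16);
}

/* Widen a bfloat16 value back to float by zero-filling the low 16 bits. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Naive column-major reference: C = alpha * A * B + beta * C.
 * A is m x k and B is k x n, both stored as bf16_t; C is m x n in float.
 * Float accumulation and float output are assumptions of this sketch. */
static void ref_bf16_gemm(int m, int n, int k, float alpha,
                          const bf16_t *a, int lda,
                          const bf16_t *b, int ldb,
                          float beta, float *c, int ldc)
{
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++) {
                acc += bf16_to_float(a[i + p * lda]) *
                       bf16_to_float(b[p + j * ldb]);
            }
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc];
        }
    }
}
```

A comparison in the spirit of compare_sgemm_shgemm.c could convert the same float inputs with `float_to_bf16`, run this reference alongside a float GEMM, and check that the results agree within a tolerance suited to bfloat16's 8-bit significand.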
..
alpha	Add implementations of ssum/dsum and csum/zsum	2019-03-30 22:05:11 +01:00
arm	Make ARMV7 compile with xcode and add a CI job for it (#2537)	2020-04-02 10:30:37 +02:00
arm64	fix initialization to zero in the NEON SGEMM_BETA kernel as well	2020-03-31 16:53:56 +02:00
generic	RFC : Add half precision gemm for bfloat16 in OpenBLAS	2020-04-14 14:55:08 -05:00
ia64	Add ia64 implementation of ?sum	2019-03-30 22:18:03 +01:00
mips	Add MIPS implementation of ?sum	2019-03-30 22:20:14 +01:00
mips64	Fix compilation problem on loongson platform	2020-04-09 19:28:15 +08:00
power	RFC : Add half precision gemm for bfloat16 in OpenBLAS	2020-04-14 14:55:08 -05:00
sparc	Add SPARC implementation of ?sum	2019-03-30 22:25:06 +01:00
x86	Fix unwanted case-sensitivity in x86 LSAME for (AMD) processors without CMOV	2019-08-13 10:19:10 +02:00
x86_64	Convert aligned moves to unaligned	2020-04-13 14:58:52 +02:00
zarch	add in runtime cpu detection for zarch (#2349)	2019-12-31 18:03:27 +01:00
CMakeLists.txt	Add missing USE_MIN in kernel/CMakeLists.txt.	2020-02-17 09:01:53 +01:00
Makefile	Add variable for gcc >=9 test	2019-11-29 23:47:23 +01:00
Makefile.L1	Add ?sum	2019-03-30 22:01:13 +01:00
Makefile.L2	Remove all trailing whitespace except lapack-netlib	2014-06-27 12:05:18 -07:00
Makefile.L3	RFC : Add half precision gemm for bfloat16 in OpenBLAS	2020-04-14 14:55:08 -05:00
Makefile.LA	Support NO_LAPACK=1 to build the lib without LAPACK functions.	2011-03-04 11:51:32 +08:00
common_param.h	Fix parameter overflow	2020-04-12 19:47:02 +02:00
setparam-ref.c	RFC : Add half precision gemm for bfloat16 in OpenBLAS	2020-04-14 14:55:08 -05:00