OpenBLAS

Author	SHA1	Message	Date
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC : Add half precision gemm for bfloat16 in OpenBLAS This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
Timothy Gu	6c2ead30f0	Remove all trailing whitespace except lapack-netlib Signed-off-by: Timothy Gu <timothygu99@gmail.com>	2014-06-27 12:05:18 -07:00
Zhang Xianyi	6d8095bcb9	Avoid argument list too long issue in make clean.	2013-11-07 13:06:42 +08:00
Jameson Nash	d0e731e8b8	provide support for passing CFLAGS, FFLAGS, PFLAGS, FPFLAGS to make on the command line	2012-08-21 00:31:12 -04:00
Xianyi Zhang	342bbc3871	Import GotoBLAS2 1.13 BSD version codes.	2011-01-24 14:54:24 +00:00