OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC : Add half precision gemm for bfloat16 in OpenBLAS This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
Zhiyong Dang	3716267124	Change _STDC_VERSION__ to __STDC_VERSION__ Change-Id: Id3fa4e8d9eedd4ef7230df69b611e7f397301a42	2018-05-11 12:15:08 +08:00
Martin Kroeker	40160ff3c1	Use _Atomic instead of volatile for thread safety where C11 is supported	2018-03-10 00:15:44 +01:00
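
The bfloat16 commit above says the type falls back to unsigned short (2 bytes) on architectures without native support. Below is a minimal sketch of what that storage choice can look like. It is not the code from the patch. The names bf16_t, float_to_bf16, bf16_to_float and ref_bf16_gemm are hypothetical, and the sketch assumes a 16-bit unsigned short, a 32-bit IEEE 754 float, conversion by truncation (no rounding), row-major storage, and float accumulation and output.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fallback type: bfloat16 stored in 2 bytes.
 * Assumes unsigned short is 16 bits wide. */
typedef unsigned short bf16_t;

/* Convert float to bfloat16 by keeping the upper 16 bits of the
 * IEEE 754 single-precision encoding (truncation, no rounding). */
static bf16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined type punning */
    return (bf16_t)(bits >> 16);
}

/* Convert bfloat16 back to float by zero-filling the low 16 bits. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Naive reference product C = A * B.
 * A is m x k, B is k x n, C is m x n, all row-major (an assumption made
 * here for readability). Inputs are bfloat16, accumulation is in float. */
static void ref_bf16_gemm(int m, int n, int k,
                          const bf16_t *a, const bf16_t *b, float *c)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++) {
                acc += bf16_to_float(a[i * k + p]) * bf16_to_float(b[p * n + j]);
            }
            c[i * n + j] = acc;
        }
    }
}
```

A comparison test in the spirit of the one the commit mentions could convert float inputs with float_to_bf16, run ref_bf16_gemm, and compare the result against a plain float product within a tolerance, since bfloat16 keeps only 8 bits of significand precision.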

3 Commits