OpenBLAS/driver/level3
Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds a matrix multiplication kernel for the bfloat16 data type.
For architectures that don't support bfloat16, the type is defined as unsigned
short (2 bytes).  Default unroll sizes can be changed per architecture, as is
done for SGEMM; for now, 8 and 4 are used for M and N.  The ncopy/tcopy size can
also be changed per architecture; for now, size 2 is used.
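
As an illustration of the 2-byte fallback representation described above, here is a minimal sketch of storing bfloat16 in an unsigned 16-bit integer and converting to and from float. The names bf16_t, float_to_bf16 and bf16_to_float are hypothetical and are not taken from the patch; round-to-nearest-even is an assumption about the rounding mode, and a plain truncation of the low 16 bits would also be a valid conversion.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical storage type: bfloat16 kept in a 2-byte unsigned integer,
 * mirroring the "unsigned short" fallback mentioned in the commit message. */
typedef uint16_t bf16_t;

/* Convert float to bfloat16 by keeping the upper 16 bits of the IEEE-754
 * single-precision encoding, with round-to-nearest-even (an assumption). */
static bf16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* well-defined type punning */

    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: truncate and force a mantissa bit so it stays a NaN. */
        return (bf16_t)((bits >> 16) | 0x0040u);
    }

    /* Add 0x7fff plus the lowest kept bit so that ties round to even. */
    bits += 0x7fffu + ((bits >> 16) & 1u);
    return (bf16_t)(bits >> 16);
}

/* Convert bfloat16 back to float by zero-filling the low 16 bits. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because bfloat16 keeps the full 8-bit exponent of a float and only shortens the mantissa, the conversion back to float is exact; only the float to bfloat16 direction loses precision.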

Added shgemm to kernel/power/KERNEL.POWER9 and tested on powerpc64le and
powerpc64.  For reference, added a small test, compare_sgemm_shgemm.c, that
compares sgemm and shgemm output.
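
The idea of comparing single-precision and bfloat16 GEMM output can be sketched as follows. This is not the compare_sgemm_shgemm.c from the patch and does not call OpenBLAS; it uses naive reference loops, and every name in it (bf16_t, ref_sgemm, ref_bf16_gemm, max_abs_diff) is hypothetical. It assumes column-major storage, bfloat16 inputs with float accumulation and a float result, and it computes C = A * B without alpha or beta scaling.

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_t;   /* hypothetical 2-byte bfloat16 storage type */

/* float -> bfloat16, round-to-nearest-even (NaN inputs not handled here). */
static bf16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits += 0x7fffu + ((bits >> 16) & 1u);
    return (bf16_t)(bits >> 16);
}

/* bfloat16 -> float, exact. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reference C = A * B in single precision.
 * A is m x k, B is k x n, C is m x n, all column-major. */
static void ref_sgemm(int m, int n, int k,
                      const float *a, const float *b, float *c)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += a[i + p * m] * b[p + j * k];
            c[i + j * m] = acc;
        }
}

/* Same product with bfloat16 inputs, accumulated and stored as float. */
static void ref_bf16_gemm(int m, int n, int k,
                          const bf16_t *a, const bf16_t *b, float *c)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += bf16_to_float(a[i + p * m]) * bf16_to_float(b[p + j * k]);
            c[i + j * m] = acc;
        }
}

/* Largest absolute element-wise difference between two result matrices. */
static float max_abs_diff(const float *x, const float *y, int len)
{
    float worst = 0.0f;
    for (int i = 0; i < len; i++) {
        float d = x[i] - y[i];
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

A comparison would convert the float inputs with float_to_bf16, run both routines, and check max_abs_diff against a tolerance. The two results agree exactly only when every input is representable in bfloat16 (for example, small integers); otherwise a difference on the order of the bfloat16 rounding error is expected.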

This patch does not cover the OpenBLAS tests, benchmarks, or LAPACK tests for shgemm.
A complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
CMakeLists.txt Fix threading usage in CMake: s/SMP/USE_THREAD/ 2017-08-19 15:07:42 +10:00
Makefile RFC : Add half precision gemm for bfloat16 in OpenBLAS 2020-04-14 14:55:08 -05:00
gemm.c Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
gemm3m.c Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
gemm3m_level3.c Update gemm3m_level3.c 2019-12-27 18:03:01 +08:00
gemm_thread_m.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_thread_mn.c Minor C code fixes in driver/ 2015-11-09 14:15:49 +05:30
gemm_thread_n.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_thread_variable.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
hemm3m_k.c Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
level3.c RFC : Add half precision gemm for bfloat16 in OpenBLAS 2020-04-14 14:55:08 -05:00
level3_gemm3m_thread.c Update level3_gemm3m_thread.c 2020-01-22 17:40:03 +00:00
level3_syr2k.c remove surplus parentheses to silence clang5 2018-01-01 20:56:26 +01:00
level3_syrk.c remove surplus parentheses to silence clang5 2018-01-01 20:56:26 +01:00
level3_syrk_threaded.c Change _STDC_VERSION__ to __STDC_VERSION__ 2018-05-11 12:15:08 +08:00
level3_thread.c RFC : Add half precision gemm for bfloat16 in OpenBLAS 2020-04-14 14:55:08 -05:00
symm3m_k.c Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
symm_k.c Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
syr2k_k.c Changed a number of inline calls to use __inline. 2015-02-11 11:13:17 -06:00
syr2k_kernel.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
syrk_k.c Changed a number of inline calls to use __inline. 2015-02-11 11:13:17 -06:00
syrk_kernel.c prepared driver/level3 functions for UNROLL values, that are not a power of two 2017-01-09 10:38:15 +01:00
syrk_thread.c Avoid taking the root of a negative number 2018-12-22 22:30:29 +01:00
trmm_L.c Update trmm_L.c 2020-02-05 10:09:41 +08:00
trmm_R.c Update trmm_R.c 2020-02-05 10:15:02 +08:00
trsm_L.c Moved declarations to start of functions to satisfy MSVC C89 implementation. 2015-02-11 11:16:57 -06:00
trsm_R.c Moved declarations to start of functions to satisfy MSVC C89 implementation. 2015-02-11 11:16:57 -06:00
zhemm_k.c Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
zher2k_k.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zher2k_kernel.c prepared driver/level3 functions for UNROLL values, that are not a power of two 2017-01-09 10:38:15 +01:00
zherk_beta.c Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
zherk_k.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zherk_kernel.c prepared driver/level3 functions for UNROLL values, that are not a power of two 2017-01-09 10:38:15 +01:00
zsyrk_beta.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00