OpenBLAS/driver/others
Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
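
The storage scheme described in the commit message (bfloat16 held in a 2-byte unsigned integer) and the idea of comparing a half-precision product against a single-precision one can be sketched in plain C. This is a minimal illustration, not the code from the patch: the names `bf16_t`, `float_to_bf16`, `bf16_to_float` and `ref_bf16_gemm` are hypothetical, the conversion truncates rather than rounds, and the column-major, no-transpose layout is an assumption. The actual shgemm interface is not shown in this listing.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: bfloat16 is stored in a plain 16-bit unsigned integer, as the
   commit message describes for architectures without native support.
   bf16_t is a hypothetical name, not the typedef used by OpenBLAS. */
typedef uint16_t bf16_t;

/* float -> bfloat16 by keeping the upper 16 bits of the IEEE-754 encoding
   (sign, 8 exponent bits, top 7 mantissa bits). Truncates, does not round. */
static bf16_t float_to_bf16(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bf16_t)(bits >> 16);
}

/* bfloat16 -> float by placing the 16 bits in the upper half of a 32-bit
   word; the lower mantissa bits become zero. */
static float bf16_to_float(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Reference C = alpha*A*B + beta*C with bfloat16 inputs and float
   accumulation and output. Assumed layout: column-major, no transpose,
   A is m x k, B is k x n, C is m x n. */
static void ref_bf16_gemm(int m, int n, int k, float alpha,
                          const bf16_t *a, int lda,
                          const bf16_t *b, int ldb,
                          float beta, float *c, int ldc) {
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += bf16_to_float(a[i + p * lda]) *
                       bf16_to_float(b[p + j * ldb]);
            /* As in BLAS, beta == 0 means C is not read on input. */
            c[i + j * ldc] = (beta == 0.0f)
                                 ? alpha * acc
                                 : alpha * acc + beta * c[i + j * ldc];
        }
    }
}
```

A comparison along the lines of the one mentioned above would convert the same float inputs with `float_to_bf16`, run this reference routine next to an ordinary single-precision product, and check the element-wise difference. Because bfloat16 keeps only 7 explicit mantissa bits, results should be compared with a tolerance rather than for equality.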
CMakeLists.txt Fix typo in previous commit for arm dynamic arch 2018-12-07 19:37:33 +01:00
Makefile add in runtime cpu detection for zarch (#2349) 2019-12-31 18:03:27 +01:00
abs.c Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
blas_l1_thread.c modify the blas_l1_thread.c for support multi-threaded for L1 function with return value 2017-01-10 11:47:06 +08:00
blas_server.c Add API to set thread affinity on Linux. 2020-04-08 12:49:35 -07:00
blas_server_omp.c Add build-time option for OMP scheduler; document MULTITHREAD_THRESHOLD range (#1620) 2018-06-15 11:25:05 +02:00
blas_server_win32.c driver: more reasonable thread wait timeout on Windows. 2019-12-13 09:52:33 +01:00
divtable.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dynamic.c Add Intel Goldmont+ cpuid 2019-12-03 08:32:29 +01:00
dynamic_arm64.c Add Neoverse-N1 core 2020-02-29 03:22:04 +00:00
dynamic_power.c Fix DYNAMIC_ARCH build for POWER9 2020-03-03 12:35:10 -06:00
dynamic_zarch.c Merge pull request #2407 from susilehtola/patch-2 2020-02-11 13:04:44 +01:00
init.c Fix errors in cpu enumeration with glibc 2.6 2019-05-07 13:34:52 +02:00
lamc3.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
lamch.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
lsame.c Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
memory.c Add a read barrier in the traversing of the buffer list 2020-04-13 12:34:02 +02:00
memory_qalloc.c Disable the old QCDOC qalloc by default and copy utility functions from memory.c 2019-11-17 19:22:04 +01:00
openblas_env.c Refs #716. Only call getenv at init function. 2016-03-09 12:50:07 -05:00
openblas_error_handle.c Refs #716. Only call getenv at init function. 2016-03-09 12:50:07 -05:00
openblas_get_config.c address minor warnings from gcc7 2019-09-07 10:21:08 +03:00
openblas_get_num_procs.c Introduce openblas_get_num_threads and openblas_get_num_procs 2015-02-03 12:23:41 -05:00
openblas_get_num_threads.c Introduce openblas_get_num_threads and openblas_get_num_procs 2015-02-03 12:23:41 -05:00
openblas_get_parallel.c Update organization info. 2014-11-25 15:28:58 +08:00
openblas_set_num_threads.c Update organization info. 2014-11-25 15:28:58 +08:00
parameter.c RFC : Add half precision gemm for bfloat16 in OpenBLAS 2020-04-14 14:55:08 -05:00
profile.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
xerbla.c Update xerbla.c 2017-01-04 23:16:48 +01:00