OpenBLAS/interface
Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds a matrix multiplication (GEMM) kernel for the bfloat16 data
type.  On architectures that do not support bfloat16 natively, the type is
defined as unsigned short (2 bytes).  The default unroll sizes can be changed
per architecture, as is done for SGEMM; for now, 8 and 4 are used for M and N.
The ncopy/tcopy size can likewise be changed per architecture; for now, size 2
is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested it on powerpc64le and
powerpc64.  For reference, added a small test, compare_sgemm_shgemm.c, that
compares sgemm and shgemm output.

This patch does not cover the OpenBLAS test, benchmark, and LAPACK tests for
shgemm.  A complex type implementation can be discussed and added once this is
approved.
2020-04-14 14:55:08 -05:00
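The commit message above describes bfloat16 being stored as a 2-byte unsigned short on architectures without native support. The sketch below illustrates what such a storage type can look like and how a naive reference GEMM over it might be written for comparison against a float result. It is an illustration only, not the code from this patch: the names `bf16_t`, `float_to_bf16`, `bf16_to_float`, and `ref_gemm_bf16` are hypothetical, the float-to-bfloat16 conversion by truncation is an assumption (the patch may round differently), and the column-major layout and float accumulation are likewise assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: a 2-byte fallback storage type for bfloat16. */
typedef unsigned short bf16_t;

/* Assumption: convert by truncation, keeping the upper 16 bits
 * (sign, 8 exponent bits, 7 mantissa bits) of an IEEE-754 binary32. */
static bf16_t float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bf16_t)(bits >> 16);
}

/* Widen back to float by placing the 16 stored bits in the upper half. */
static float bf16_to_float(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Naive reference C = A * B, column-major, no transposes.
 * A is m x k, B is k x n, C is m x n.
 * Inputs are bf16; products are accumulated in float (an assumption). */
static void ref_gemm_bf16(int m, int n, int k,
                          const bf16_t *a, const bf16_t *b, float *c) {
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++) {
                acc += bf16_to_float(a[i + p * m]) * bf16_to_float(b[p + j * k]);
            }
            c[i + j * m] = acc;
        }
    }
}
```

A comparison in the spirit of compare_sgemm_shgemm.c could convert the same float inputs with `float_to_bf16`, run both a float GEMM and this reference, and check that the outputs agree within a tolerance that reflects bfloat16's 7 stored mantissa bits.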
lapack Merge pull request #2253 from thrasibule/xerbla 2020-01-18 20:39:04 +01:00
netlib ref #80. On P4 CPU with 32-bit Windows XP, Octave crashed with OpenBLAS. Workaround: Use netlib reference gemv instead of own functions. 2012-03-16 20:29:39 +08:00
CMakeLists.txt Misc. typo fixes 2019-04-29 17:03:56 -04:00
Makefile RFC : Add half precision gemm for bfloat16 in OpenBLAS 2020-04-14 14:55:08 -05:00
asum.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
axpby.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
axpy.c Misc. typo fixes 2019-04-29 17:03:56 -04:00
copy.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
create Print the wall time (cycles) with enabling FUNCTION_PROFILE. 2011-06-09 10:40:15 +08:00
dot.c updated some level1 functions that are not thread safe 2017-01-10 14:05:07 +01:00
dsdot.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gbmv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
geadd.c Add ATLAS-style ?geadd function 2015-02-16 13:46:20 +01:00
gemm.c RFC : Add half precision gemm for bfloat16 in OpenBLAS 2020-04-14 14:55:08 -05:00
gemv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
ger.c [z]ger: increase multithread threshold 2016-01-24 10:46:35 +01:00
imatcopy.c Fix cols/rows mixup in omatcopy 2nd step for BlasTrans cases 2017-09-14 19:59:05 +02:00
imax.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
max.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
nrm2.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
omatcopy.c Add cblas_(s/d/c/z)omatcopy in order to have cblas interface for them. 2014-09-08 17:57:44 +02:00
rot.c updated some level1 functions that are not thread safe 2017-01-10 14:05:07 +01:00
rotg.c fabs -> fabsl 2018-08-03 13:00:10 -04:00
rotm.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
rotmg.c Rewrite ROTMG to address cases not covered by the netlib algorithm (#1480) 2018-03-04 17:39:56 +01:00
sbmv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
scal.c Fixed a few more unnecessary calls to num_cpu_avail. 2018-06-11 10:17:16 +01:00
sdsdot.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
spmv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
spr.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
spr2.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
sum.c Add interface for ?sum (derived from ?asum) 2019-03-30 21:59:18 +01:00
swap.c ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8 2018-10-17 10:44:37 -07:00
symm.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
symv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
syr.c Minor C code fixes in interface/ 2015-11-09 14:15:49 +05:30
syr2.c Minor C code fixes in interface/ 2015-11-09 14:15:49 +05:30
syr2k.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
syrk.c Merge pull request #1359 from brada4/develop 2017-11-18 23:47:17 +01:00
tbmv.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
tbsv.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
tpmv.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
tpsv.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trmv.c Allow multithreading TRMV again 2019-02-19 21:03:30 +01:00
trsm.c Correct length of name string in xerbla call 2019-04-27 22:49:04 +02:00
trsv.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
xerbla.c Update xerbla.c 2017-04-26 20:29:30 +02:00
zaxpby.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
zaxpy.c Misc. typo fixes 2019-04-29 17:03:56 -04:00
zdot.c Make return parameter of cblas_Xdotc_sub, cblas_Xdotu_sub a void pointer as well 2017-11-18 20:28:02 +01:00
zgbmv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
zgeadd.c Add ATLAS-style ?geadd function 2015-02-16 13:46:20 +01:00
zgemv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
zger.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
zhbmv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
zhemv.c re-arrange new code for readability 2018-10-20 21:37:53 +03:00
zher.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
zher2.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
zhpmv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
zhpr.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
zhpr2.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
zimatcopy.c Use in-place transform shortcut only if matrix is square 2017-07-21 11:20:15 +02:00
zomatcopy.c Add cblas_(s/d/c/z)omatcopy in order to have cblas interface for them. 2014-09-08 17:57:44 +02:00
zrot.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zrotg.c fabs -> fabsl 2018-08-04 20:14:51 +02:00
zsbmv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
zscal.c Fixed a few more unnecessary calls to num_cpu_avail. 2018-06-11 10:17:16 +01:00
zspmv.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zspr.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zspr2.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zswap.c disable threading in C/ZSWAP copying from S/DSWAP 2018-10-22 23:21:49 +03:00
zsymv.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zsyr.c Fixed cmake bug on Visual Studio. 2015-10-20 14:37:22 -05:00
zsyr2.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztbmv.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
ztbsv.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
ztpmv.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
ztpsv.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
ztrmv.c Allow multithreading TRMV again 2019-02-19 21:03:30 +01:00
ztrsv.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00