15 Commits

Author SHA1 Message Date
Rajalakshmi Srinivasaraghavan
67cc4b9e16 Fix warnings in clang and export symbol 2020-04-15 19:15:23 -05:00
Rajalakshmi Srinivasaraghavan
7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
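
The commit message above says that bfloat16 is stored as an unsigned short (2 bytes) on architectures without native support. As a rough illustration of what that storage format implies, here is a minimal sketch in C. It is not the code from the patch: the names bf16_t, float_to_bf16, bf16_to_float and ref_shgemm_sketch are hypothetical, the conversion truncates instead of rounding, and the reference product assumes row-major matrices with float accumulation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type name; the commit only states that bfloat16 falls back
 * to unsigned short (2 bytes) when the architecture has no native type. */
typedef unsigned short bf16_t;

/* bfloat16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits
 * of an IEEE-754 binary32 value. This sketch truncates the low 16 bits;
 * a real implementation may round to nearest even instead. */
static bf16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);      /* well-defined type punning */
    return (bf16_t)(bits >> 16);
}

/* Widening back to float is exact: the dropped mantissa bits become zero. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Unoptimized reference product C = A * B (assumed layout: row-major,
 * A is m x k, B is k x n, C is m x n). Inputs are bfloat16, the
 * accumulator and the output are float. */
static void ref_shgemm_sketch(int m, int n, int k,
                              const bf16_t *a, const bf16_t *b, float *c)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += bf16_to_float(a[i * k + p]) * bf16_to_float(b[p * n + j]);
            c[i * n + j] = acc;
        }
    }
}
```

Comparing the output of such a reference routine against a plain float product is one way to see the precision lost by keeping only 7 explicit mantissa bits, which is the kind of check the commit's compare_sgemm_shgemm.c test is described as performing.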
Martin Kroeker
5c42287c4f Add declarations for ?sum and cblas_?sum 2019-03-30 21:58:03 +01:00
Martin Koehler
39cc6b21d3 Add ATLAS-style ?geadd function 2015-02-16 13:46:20 +01:00
wernsaar
0a22816e70 Ref #433: removed obsolete lapack entries from common_interface.h 2014-08-15 12:40:10 +02:00
Timothy Gu
6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar
faeab93df0 Ref #51: added blas extensions simatcopy, dimatcopy, cimatcopy, zimatcopy 2014-06-10 16:14:34 +02:00
wernsaar
8c8f596238 Ref #51: added blas extension domatcopy as not optimized reference 2014-06-09 17:11:07 +02:00
wernsaar
faf3ac0aad Ref #285: added axpby kernels 2014-06-08 11:54:24 +02:00
Zhang Xianyi
b1a54a0107 Fixed #141. make f77blas.h compatible with compilers which lack C99 complex number.
Apply the patch from Tony @tonyhill. Thank you.
2012-10-08 12:48:20 +08:00
Zhang Xianyi
9419a43a7f Fixed #142. Added the gesvd and potrs function families to common_interface.h. 2012-09-14 15:15:08 +08:00
Zhang Xianyi
d007cca61d Refs #134. Fixed the building bug on IBM Power. 2012-08-10 11:54:21 +08:00
Zhang Xianyi
422359d09a Export openblas_set_num_threads in shared library. 2012-06-23 11:32:43 +08:00
Xianyi Zhang
b9d89f8aaa Fixed the bug about installation. f77blas.h works OK now. 2011-08-31 18:21:37 +08:00
Xianyi Zhang
342bbc3871 Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00