Martin Kroeker
5dd14e3d48
Make building the bfloat16 functions conditional on option BUILD_HALF ( #2590 )
...
* make building the bfloat16 BLAS functions conditional on BUILD_HALF
* pass the BUILD_HALF option to gensymbol
* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Rajalakshmi Srinivasaraghavan
7eb55504b1
RFC : Add half precision gemm for bfloat16 in OpenBLAS
...
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes). Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.
Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.
This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Guillaume Horel
af9ac0898a
fix Makefile
2019-09-08 11:14:49 -04:00
Guillaume Horel
9b2f0323d6
update Makefile
2019-09-08 11:14:49 -04:00
Martin Kroeker
79cfc24a62
Add interface for ?sum (derived from ?asum)
2019-03-30 21:59:18 +01:00
Martin Kroeker
3d1e36d4cb
Build CBLAS interfaces for I?MIN and I?MAX
2019-03-30 12:38:41 +01:00
Martin Kroeker
9cf22b7d91
Build cblas_iXamin interfaces
2018-06-23 13:27:30 +02:00
Martin Kroeker
7f546f54fa
Add cblas_xerbla
2017-04-26 20:01:34 +02:00
Werner Saar
ae4ac6f984
removed obj-files, that are moved to lapack 3.7.0
2017-01-06 16:14:53 +01:00
Martin Koehler
39cc6b21d3
Add ATLAS-style ?geadd function
2015-02-16 13:46:20 +01:00
wernsaar
9e829ce98f
enabled cblas gemm3m functions
2014-09-20 17:20:02 +02:00
wernsaar
d49fd33885
disabled SYMM3M and HEMM3M functions because segment violations
2014-09-20 15:27:40 +02:00
wernsaar
7aae4a62e7
enabled use of GEMM3M functions
2014-09-20 14:27:10 +02:00
Martin Koehler
a057e5434d
add CBLAS interface for s/d/c/zimatcopy
2014-09-09 09:52:13 +02:00
Martin Köhler
7794766d3c
Add cblas_(s/d/c/z)omatcopy in order to have cblas interface for them.
2014-09-08 17:57:44 +02:00
wernsaar
cedc1f4b14
Ref #410 : disabled optimized potri functions ( single threading bug)
2014-07-10 13:42:32 +02:00
wernsaar
be94db096c
disabled *3M functions for x86_64 platforms
2014-07-01 16:18:05 +02:00
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
...
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar
faeab93df0
Ref #51 : added blas extensions simatcopy, dimatcopy, cimatcopy, zimatcopy
2014-06-10 16:14:34 +02:00
wernsaar
cee257f384
Ref #51 : added blas extensions zomatcopy and comatcopy
2014-06-10 10:34:54 +02:00
wernsaar
7bfb3011e8
Ref #51 : added blas extension somatcopy
2014-06-09 20:21:13 +02:00
wernsaar
8c8f596238
Ref #51 : added blas extension domatcopy as not opimized reference
2014-06-09 17:11:07 +02:00
wernsaar
faf3ac0aad
Ref #285 : added axpby kernels
2014-06-08 11:54:24 +02:00
wernsaar
89da450800
enabled and tested optimized potri lapack functions
2014-05-23 12:14:30 +02:00
wernsaar
c26bbee489
enabled abd tested optimized trtri lapack functions
2014-05-23 10:55:39 +02:00
wernsaar
a748d3a75d
enabled optimized trti2 lapack functions again
2014-05-21 11:02:07 +02:00
wernsaar
a5ab231ad4
enabled optimized complex lauum lapack functions again
2014-05-21 10:35:28 +02:00
wernsaar
dbaeea7b59
enabled lauu2 and lauum lapack functions again
2014-05-21 09:49:18 +02:00
wernsaar
0d75f3b6a2
enabled and tested optimized gesv lapack functions
2014-05-19 14:44:53 +02:00
wernsaar
2ff66e661d
enabled and tested optimized laswp lapack function
2014-05-19 13:35:32 +02:00
wernsaar
ebc95e6f11
enabled and tested optimized potf2 lapack functions
2014-05-18 22:41:43 +02:00
wernsaar
61a2c50e8e
enabled and tested optimized getf2 lapack functions
2014-05-18 22:21:16 +02:00
wernsaar
4f98f8c9b3
enabled and tested optimized potrf lapack functions
2014-05-18 21:42:37 +02:00
wernsaar
536875d463
enabled and tested optimized getrs lapack functions
2014-05-18 21:13:56 +02:00
wernsaar
65f2fba4c3
enabled and tested optimized cgetrf lapack function
2014-05-18 20:32:27 +02:00
wernsaar
eea6f51df9
enabled and tested optimized sgetrf lapack function
2014-05-18 20:01:23 +02:00
wernsaar
6fc4646709
enabled and tested optimized zgetrf lapack function
2014-05-18 19:36:32 +02:00
wernsaar
ac029f81b3
enabled and tested optimized dgetrf function
2014-05-18 19:07:51 +02:00
wernsaar
189ca1bcee
removed lapack objects from interface/Makefile
2014-05-11 12:09:34 +02:00
Zhang Xianyi
c92ae012a6
Refs #279 . Provide ONLY_CBLAS flag. If you only need CBLAS without
...
a fortran compiler, please try make ONLY_CBLAS=1.
This mode only compiler CBLAS without BLAS fortran interface and LAPACK.
2013-08-21 00:03:25 +08:00
Jameson Nash
d0e731e8b8
provide support for passing CFLAGS, FFLAGS, PFLAGS, FPFLAGS to make on the command line
2012-08-21 00:31:12 -04:00
Xianyi Zhang
722dd08703
ref #80 . On P4 CPU with 32-bit Windows XP, Octave crashed with OpenBLAS. Walkaroud: Use netlib reference gemv instead of own funtions.
...
For example, make USE_NETLIB_GEMV=1
2012-03-16 20:29:39 +08:00
Xianyi Zhang
8f1090d32a
Support NO_LAPACK=1 to build the lib without LAPACK functions.
2011-03-04 11:51:32 +08:00
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version codes.
2011-01-24 14:54:24 +00:00