Commit Graph

283 Commits

Author SHA1 Message Date
Wangyang Guo
5dc7c3c8e5 Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrices case 2021-08-02 07:06:54 +00:00
Xianyi Zhang
6022e5629c Refs #2587 fix small matrix c/zgemm bug. 2021-08-02 07:06:54 +00:00
Xianyi Zhang
57ed58cefe Refs #2587 Add small matrix optimization reference kernel for c/zgemm. 2021-08-02 07:06:54 +00:00
Xianyi Zhang
17d32a4a82 Change a1b0 gemm to b0 gemm. 2021-08-02 07:06:54 +00:00
Xianyi Zhang
4271cfcc6f Fix gemm interface bug for small matrix. 2021-08-02 07:06:51 +00:00
Xianyi Zhang
be3349405d Add alpha=1.0 beta=0.0 for small gemm. 2021-08-02 07:01:47 +00:00
Xianyi Zhang
0a2077901c Add small matrix optimization kernel interface.
make SMALL_MATRIX_OPT=1
2021-08-02 07:01:47 +00:00
Martin Kroeker
1dea57ab25 Revert PR #3250 (shortcut without buffer allocation) as it is unsafe on some x86_64 2021-07-14 20:32:57 +02:00
Martin Kroeker
7bb59fceb7 Clean up some warnings 2021-07-11 16:00:29 +02:00
Martin Kroeker
4ed99c2ce3 Merge pull request #3292 from martin-frbg/syrk_limit
Add lower limit for multithreading in xSYRK
2021-07-07 20:46:28 +02:00
Martin Kroeker
8186963d8c Add lower limit for multithreading 2021-07-04 17:00:26 +02:00
Martin Kroeker
726c44242b Add lower threshold for multithreading 2021-07-01 17:41:05 +02:00
Martin Kroeker
1b5620b66e Add lower threshold for multithreading in ?potrf and ?potri 2021-06-26 23:47:41 +02:00
Martin Kroeker
baf03a0937 Merge pull request #3252 from martin-frbg/more_shortcuts
Further shortcuts for (small) cases that do not need buffer allocation
2021-06-15 16:14:20 +02:00
Martin Kroeker
7aab5e826c Merge pull request #3250 from martin-frbg/gemv-shortcut
Add shortcut for small-size S/D GEMV_N with increments of one
2021-06-15 14:50:14 +02:00
Martin Kroeker
f84197c1a7 Add shortcuts for (small) cases that do not need expensive buffer allocation 2021-05-29 22:28:00 +02:00
Martin Kroeker
734bd265a8 revert symv changes for now 2021-05-29 15:40:03 +02:00
Martin Kroeker
1217eb910d Fix copy-paste errors in variables used 2021-05-28 09:38:48 +02:00
Martin Kroeker
d6d7a6685d Add shortcuts for (small) cases that do not need expensive buffer allocation 2021-05-27 22:39:18 +02:00
Martin Kroeker
f0e7345fb8 Add shortcut for small-size gemv_n with increments of one 2021-05-26 22:02:34 +02:00
Martin Kroeker
03297ff9f0 Add fast path for small xSYR with INCX==1 2021-05-22 20:41:18 +02:00
Gordon Fossum
8b599836db Add error message token for SBGEMM in gemm.c 2021-05-04 13:55:02 -05:00
Martin Kroeker
904b221f03 Add cast to prevent overflow of intermediate result 2021-05-01 14:47:22 +02:00
Martin Kroeker
c5fb91f1bc Fix division by zero in the non-x86 codepath 2021-04-29 09:47:18 +02:00
Harmen Stoppels
ec6b354c32 use /usr/bin/env perl 2021-02-24 14:07:20 +01:00
Martin Kroeker
bd906e3410 fix copy-paste error in build rules for cblas_crotg and cblas_zrotg 2021-01-30 16:46:25 +01:00
Alex Henrie
f1bf2603e6 Remove dead assignment to dflag in rotmg functions 2021-01-14 19:40:32 -07:00
Alex Henrie
6f32991eae Don't define the mode variable when not needed in gemm functions 2021-01-14 19:40:31 -07:00
Martin Kroeker
a8f249458d Build CBLAS interfaces for CROTG and ZROTG as well 2021-01-13 00:29:38 +01:00
Martin Kroeker
ac3e2a3fdd Add CBLAS interfaces for csrot and zdrot 2021-01-12 23:22:00 +01:00
Martin Kroeker
857afcc41d Use ifeq instead of ifdef for user-definable build options 2020-11-22 16:31:44 +01:00
Chen, Guobing
a7b1f9b1bb Implementation of BF16 based gemv
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
Martin Kroeker
6a1f3e40af Remove debug printout of object list 2020-10-26 21:37:04 +01:00
Rajalakshmi Srinivasaraghavan
b5d30b390d Fix build issues with bfloat16
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
2020-10-13 11:00:22 -05:00
Martin Kroeker
1e7eb7b7a9 Fix typos in currently unused sections 2020-10-13 09:17:15 +02:00
Martin Kroeker
052f31bc3c Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:02:16 +02:00
Martin Kroeker
0f7d73ff6d Allow supporting only a subset of variable types 2020-10-11 14:53:26 +02:00
Martin Kroeker
b475b4bd0d Support building only a subset of types 2020-09-22 23:25:04 +02:00
Chen, Guobing
deaeb6c5b8 Add bfloat16 based dot and conversion with single/double
1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
     shstobf16 -- convert single float array to bfloat16 array
     shdtobf16 -- convert double float array to bfloat16 array
     sbf16tos  -- convert bfloat16 array to single float array
     dbf16tod  -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16
5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs
6. Fix Cooperlake platform detection/specification issue in dynamic-arch builds
7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-09-04 02:31:25 +08:00
Martin Kroeker
75eeb265d7 [WIP] Refactor the driver code for direct SGEMM (#2782)
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c to use macros for sgemm_direct to support dynamic_arch naming via common_s.h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
2020-08-19 14:51:09 +02:00
Martin Kroeker
fee361ae64 fix another source of NO_CBLAS=0 surprise 2020-08-11 13:27:19 +02:00
Ashwin Sekhar T K
4e1be0e481 ARM64: Add THUNDERX3T110 Target 2020-07-26 23:32:24 -07:00
Martin Kroeker
5dd14e3d48 Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)
* make building the bfloat16 BLAS functions conditional on BUILD_HALF

* pass the BUILD_HALF option to gensymbol

* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Martin Kroeker
2db5178e2d enable cblas interfaces to GEMM3M in CMAKE builds 2020-04-22 11:01:28 +02:00
Rajalakshmi Srinivasaraghavan
7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Martin Kroeker
8229c163b7 Use runtime check for AVX512 (sgemm_direct) capability when using DYNAMIC_ARCH 2020-03-26 21:12:56 +01:00
Martin Kroeker
6a14b34c20 Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX 2020-03-22 14:33:16 +01:00
Martin Kroeker
d65e9a2bbd Merge pull request #2253 from thrasibule/xerbla
fix error messages
2020-01-18 20:39:04 +01:00
Guillaume Horel
2463938879 fix error message 2019-09-11 10:35:25 -04:00
Guillaume Horel
5d6525c87c more bugfix 2019-09-10 17:30:57 -04:00