382 Commits

Author SHA1 Message Date
Martin Kroeker 04cdf5efb4
fix typo and missing declaration 2023-07-14 00:05:00 +02:00
Martin Kroeker 5e1103b8d7
Update rotg.c 2023-07-13 23:35:38 +02:00
Martin Kroeker 7c75c8b2fe
fix truncated edit 2023-07-13 21:40:12 +02:00
Martin Kroeker 0f2ce93904
typo fix 2023-07-13 10:56:59 +02:00
Martin Kroeker e08743d977
Update to use safe scaling algorithm from Reference-LAPACK PR 527 2023-07-12 23:02:36 +02:00
Martin Kroeker 7e93ab1b9e
Fix info code returned for invalid ldb 2023-07-09 17:00:25 +02:00
Martin Kroeker bb862b82d5
Fix integer overflow in multithreading threshold calculation for SYMM/SYRK (#4116)
* Fix potential integer overflow
2023-06-29 23:59:25 +02:00
Martin Kroeker c3a2d407a0
Merge pull request #4048 from imzhuhl/spr_sbgemm_fix
Sapphire Rapids sbgemm fix
2023-06-17 20:47:09 +02:00
Angelika Schwarz 899c3a6f6a Improve input argument checks of gemmt
* Fix return value for invalid info
* Add missing checks for ldA, ldB
* Use reference-LAPACK like checks (ie ld=0,nrows=0 is invalid)
2023-05-26 08:51:27 +02:00
Honglin Zhu 71e4125795 Fix syscall error on non-x86 platform 2023-05-22 21:59:59 +08:00
Honglin Zhu 90f041e348 Invoke the syscall to allow the use of amx tiles 2023-05-19 10:48:18 +08:00
Ken Ho df1b1f6a91 More detailed error message in [z]imatcopy.c. 2023-05-12 09:41:52 -07:00
Ken Ho 7a86c437b5 Change some "if" statements to "else if" following suggestion by @mmuetzel. 2023-05-10 09:13:04 -07:00
Ken Ho 33ab415f68 Bug fix and improvements for [z]imatcopy interface. 2023-05-08 14:43:56 -07:00
Martin Kroeker 1f6f7328eb
remove redundant declaration 2023-04-27 09:14:12 +02:00
Martin Kroeker 7152d6b06d
fix cblas_gemmt 2023-04-27 08:36:20 +02:00
Martin Kroeker 38d7a7b562
Fix ?GEMMT 2023-04-16 00:07:58 +02:00
Martin Kroeker 912d713b52
redo lost edit 2023-03-28 18:31:04 +02:00
Martin Kroeker dc15c18efc
Fix build failures seen with the NO_LAPACK option - cspr/csymv/csyr belong on the LAPACK list 2023-03-28 16:33:09 +02:00
H. Vetinari f2659516ef remove unqualified ifdef's for NO_LAPACK(E) 2023-03-28 19:01:31 +11:00
Martin Kroeker f2d6b1c70e
Add multithreading threshold 2023-03-26 00:25:28 +01:00
Martin Kroeker a495ffc554
Rework multithreading threshold 2023-03-26 00:23:57 +01:00
Martin Kroeker 244147495a
Do not use multithreading for small workloads 2023-03-23 23:13:02 +01:00
Martin Kroeker ab32f832a8
fix stray blank on continuation line 2023-03-21 08:29:05 +01:00
Martin Kroeker e359787e28
restore C/Z SPMV, SPR, SYR,SYMV 2023-03-21 07:43:03 +01:00
Martin Kroeker f10c266b4d
Fix stride in shortcut path for small N 2022-12-08 21:02:01 +01:00
Martin Kroeker 8c99d5d1b6
Merge pull request #3796 from martin-frbg/gemmt
Add a trivial GEMMT implementation based on a looped GEMV
2022-11-12 19:06:05 +01:00
Martin Kroeker e6204d254f
Update CMakeLists.txt 2022-11-08 16:21:11 +01:00
Martin Kroeker 1b77764182
Conditionally leave out bits of LAPACK to be overridden by ReLAPACK 2022-11-08 12:02:59 +01:00
Martin Kroeker c970717157
fix missing t in xgemmt rule
Co-authored-by: Alexis <35051714+amontoison@users.noreply.github.com>
2022-11-01 13:51:20 +01:00
Martin Kroeker e7fd8d21a6
Add GEMMT based on looped GEMV 2022-10-26 15:33:58 +02:00
Martin Kroeker a3e02742f2
Add USE_PERL fallback option for create script used with FUNCTION_PROFILE 2022-05-22 18:32:19 +02:00
Martin Kroeker f1c570a5f1
Add back original PERL-based script under new name 2022-05-22 18:29:01 +02:00
Owen Rafferty 42c7a27e6b
rewrite perl scripts in universal shell 2022-05-18 19:00:15 -05:00
Martin Kroeker 7656aba00e
Merge pull request #3493 from martin-frbg/casts+cleanup
WIP casts and cleanups
2022-02-06 23:55:06 +01:00
Martin Kroeker d2b5fbf80f
Exclude some complex (LAPACK) functions when NO_LAPACK is set 2022-01-27 22:02:08 +01:00
Martin Kroeker 64365c919e
fix function typecasts 2021-12-21 18:47:35 +01:00
gxw 25f99fa9f8 Add cblas_{c/z}srot cblas_{c/z}rotg support 2021-11-01 20:19:13 +08:00
Martin Kroeker 4b3769823a
Revert #3252 2021-10-24 23:57:06 +02:00
Martin Kroeker 2845f54eb8
Remove dangerous optimization from previous #3252 - buffer is never unused here 2021-10-20 10:50:02 +02:00
Martin Kroeker c35739db5e
Add separate entries for BFLOAT16 functions and fix missing cblas_xerbla 2021-09-14 16:15:57 +02:00
Martin Kroeker 1085775bc6
really remove the unused variable 2021-09-11 15:05:55 +02:00
Martin Kroeker 20581bf303
Remove unused variable 2021-09-11 14:36:27 +02:00
Wangyang Guo 4289cf048d sbgemm: avoid falling into SGEMM_KERNEL_DIRECT 2021-09-07 21:30:46 +08:00
Wangyang Guo 2e44ca0136 sbgemm: add missing cblas_sbgemm definition 2021-08-30 17:40:30 +08:00
Wangyang Guo 1d83ca4bca Small Matrix: support BFLOAT16 data type 2021-08-30 17:40:20 +08:00
Wangyang Guo c17d6dacb2 Small Matrix: skip compile in unimplemented data type 2021-08-05 05:46:13 +00:00
Wangyang Guo aa50185647 Small Matrix: better handle with GEMM3M marco 2021-08-05 02:45:53 +00:00
Wangyang Guo 478d1086c1 Small Matrix: support DYNAMIC_ARCH build 2021-08-04 03:12:41 +00:00
Wangyang Guo 5dc7c3c8e5 Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrics case 2021-08-02 07:06:54 +00:00
Xianyi Zhang 6022e5629c Refs #2587 fix small matrix c/zgemm bug. 2021-08-02 07:06:54 +00:00
Xianyi Zhang 57ed58cefe Refs #2587 Add small matrix optimization reference kernel for c/zgemm. 2021-08-02 07:06:54 +00:00
Xianyi Zhang 17d32a4a82 Change a1b0 gemm to b0 gemm. 2021-08-02 07:06:54 +00:00
Xianyi Zhang 4271cfcc6f Fix gemm interface bug for small matrix. 2021-08-02 07:06:51 +00:00
Xianyi Zhang be3349405d Add alpha=1.0 beta=0.0 for small gemm. 2021-08-02 07:01:47 +00:00
Xianyi Zhang 0a2077901c Add small marix optimization kernel interface.
make SMALL_MATRIX_OPT=1
2021-08-02 07:01:47 +00:00
Martin Kroeker 1dea57ab25
Revert PR #3250 (shortcut without buffer allocation) as it is unsafe on some x86_64 2021-07-14 20:32:57 +02:00
Martin Kroeker 7bb59fceb7
Clean up some warnings 2021-07-11 16:00:29 +02:00
Martin Kroeker 4ed99c2ce3
Merge pull request #3292 from martin-frbg/syrk_limit
Add lower limit for multithreading in xSYRK
2021-07-07 20:46:28 +02:00
Martin Kroeker 8186963d8c
Add lower limit for multithreading 2021-07-04 17:00:26 +02:00
Martin Kroeker 726c44242b
Add lower threshold for multithreading 2021-07-01 17:41:05 +02:00
Martin Kroeker 1b5620b66e
Add lower threshold for multithreading in ?potrf and ?potri 2021-06-26 23:47:41 +02:00
Martin Kroeker baf03a0937
Merge pull request #3252 from martin-frbg/more_shortcuts
Further shortcuts for (small) cases that do not need buffer allocation
2021-06-15 16:14:20 +02:00
Martin Kroeker 7aab5e826c
Merge pull request #3250 from martin-frbg/gemv-shortcut
Add shortcut for small-size S/D GEMV_N with increments of one
2021-06-15 14:50:14 +02:00
Martin Kroeker f84197c1a7
Add shortcuts for (small) cases that do not need expensive buffer allocation 2021-05-29 22:28:00 +02:00
Martin Kroeker 734bd265a8
revert symv changes for now 2021-05-29 15:40:03 +02:00
Martin Kroeker 1217eb910d
Fix copy-paste errors in variables used 2021-05-28 09:38:48 +02:00
Martin Kroeker d6d7a6685d
Add shortcuts for (small) cases that do not need expensive buffer allocation 2021-05-27 22:39:18 +02:00
Martin Kroeker f0e7345fb8
Add shortcut for small-size gemv_n with increments of one 2021-05-26 22:02:34 +02:00
Martin Kroeker 03297ff9f0
Add fast path for small xSYR with INCX==1 2021-05-22 20:41:18 +02:00
Gordon Fossum 8b599836db Add error message token for SBGEMM in gemm.c 2021-05-04 13:55:02 -05:00
Martin Kroeker 904b221f03
Add cast to prevent overflow of intermediate result 2021-05-01 14:47:22 +02:00
Martin Kroeker c5fb91f1bc
Fix division by zero in the non-x86 codepath 2021-04-29 09:47:18 +02:00
Harmen Stoppels ec6b354c32 use /usr/bin/env perl 2021-02-24 14:07:20 +01:00
Martin Kroeker bd906e3410
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg 2021-01-30 16:46:25 +01:00
Alex Henrie f1bf2603e6 Remove dead assignment to dflag in rotmg functions 2021-01-14 19:40:32 -07:00
Alex Henrie 6f32991eae Don't define the mode variable when not needed in gemm functions 2021-01-14 19:40:31 -07:00
Martin Kroeker a8f249458d
Build CBLAS interfaces for CROTG and ZROTG as well 2021-01-13 00:29:38 +01:00
Martin Kroeker ac3e2a3fdd
Add CBLAS interfaces for csrot and zdrot 2021-01-12 23:22:00 +01:00
Martin Kroeker 857afcc41d
Use ifeq instead of ifdef for user-definable build options 2020-11-22 16:31:44 +01:00
Chen, Guobing a7b1f9b1bb Implementation of BF16 based gemv
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
Martin Kroeker 6a1f3e40af
Remove debug printout of object list 2020-10-26 21:37:04 +01:00
Rajalakshmi Srinivasaraghavan b5d30b390d Fix build issues with bfloat16
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
2020-10-13 11:00:22 -05:00
Martin Kroeker 1e7eb7b7a9
Fix typos in currently unused sections 2020-10-13 09:17:15 +02:00
Martin Kroeker 052f31bc3c
Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:02:16 +02:00
Martin Kroeker 0f7d73ff6d
Allow supporting only a subset of variable types 2020-10-11 14:53:26 +02:00
Martin Kroeker b475b4bd0d
Support building only a subset of types 2020-09-22 23:25:04 +02:00
Chen, Guobing deaeb6c5b8 Add bfloat16 based dot and conversion with single/double
1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
     shstobf16 -- convert single float array to bfloat16 array
     shdtobf16 -- convert double float array to bfloat16 array
     sbf16tos  -- convert bfloat16 array to single float array
     dbf16tod  -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16
5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs
6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building
7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-09-04 02:31:25 +08:00
Martin Kroeker 75eeb265d7
[WIP] Refactor the driver code for direct SGEMM (#2782)
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add  sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c  to macros for sgemm_direct to support dynamic_arch naming via common_s,h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
2020-08-19 14:51:09 +02:00
Martin Kroeker fee361ae64
fix another source of NO_CBLAS=0 surprise 2020-08-11 13:27:19 +02:00
Ashwin Sekhar T K 4e1be0e481 ARM64: Add THUNDERX3T110 Target 2020-07-26 23:32:24 -07:00
Martin Kroeker 5dd14e3d48
Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)
* make building the bfloat16 BLAS functions conditional on BUILD_HALF

* pass the BUILD_HALF option to gensymbol

* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Martin Kroeker 2db5178e2d
enable cblas interfaces to GEMM3M in CMAKE builds 2020-04-22 11:01:28 +02:00
Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Martin Kroeker 8229c163b7
Use runtime check for AVX512 (sgemm_direct) capability when using DYNAMIC_ARCH 2020-03-26 21:12:56 +01:00
Martin Kroeker 6a14b34c20
Avoid calling DIRECT codepath in DYNAMIC_ARCH on non-SKX 2020-03-22 14:33:16 +01:00
Martin Kroeker d65e9a2bbd
Merge pull request #2253 from thrasibule/xerbla
fix error messages
2020-01-18 20:39:04 +01:00
Guillaume Horel 2463938879 fix error message 2019-09-11 10:35:25 -04:00
Guillaume Horel 5d6525c87c more bugfix 2019-09-10 17:30:57 -04:00
Guillaume Horel 459bb9291d fix error codes 2019-09-10 17:10:33 -04:00