Martin Kroeker
b926e70ebd
Fix typo in build rule of "profiled" sbgemm
2023-09-21 23:07:32 +02:00
Angelika Schwarz
db3a43c8ed
Simplify rotg
...
* The check da != ZERO is no longer necessary since there
is a special case ada == ZERO, where ada = |da|.
* Add the missing check c != ZERO before the division.
Note that with these two changes the long double code
follows the float/double version of the code.
2023-09-20 19:43:00 +02:00
Angelika Schwarz
6876ae0c3b
Fix division by zero in zrotg
...
The cases
  [  c  s ] * [    0   ] = [ |db_i| ]
  [ -s  c ]   [ i*db_i ]   [    0   ]
and
  [  c  s ] * [   0  ] = [ |db_r| ]
  [ -s  c ]   [ db_r ]   [   0    ]
computed s incorrectly. To flip the entries of the vector,
s should be conjg(db)/|db| and not conjg(db) / da,
where da == 0.0.
2023-09-20 19:11:59 +02:00
Martin Kroeker
42909ce57d
Merge branch 'xianyi:develop' into issue4130
2023-09-01 09:05:58 +02:00
Martin Kroeker
a2a184572c
update zrotg
2023-08-31 23:42:12 +02:00
Martin Kroeker
214be14c1d
Correct INFO returned for lda in non-CBLAS s/dgeadd
2023-08-18 22:48:30 +02:00
Martin Kroeker
4cc804c754
Prepare for INCX < 0 in new NRM2 implementation from BLAS 3.10
2023-08-09 16:13:23 +02:00
Martin Kroeker
04cdf5efb4
fix typo and missing declaration
2023-07-14 00:05:00 +02:00
Martin Kroeker
5e1103b8d7
Update rotg.c
2023-07-13 23:35:38 +02:00
Martin Kroeker
7c75c8b2fe
fix truncated edit
2023-07-13 21:40:12 +02:00
Martin Kroeker
0f2ce93904
typo fix
2023-07-13 10:56:59 +02:00
Martin Kroeker
e08743d977
Update to use safe scaling algorithm from Reference-LAPACK PR 527
2023-07-12 23:02:36 +02:00
Martin Kroeker
7e93ab1b9e
Fix info code returned for invalid ldb
2023-07-09 17:00:25 +02:00
Martin Kroeker
bb862b82d5
Fix integer overflow in multithreading threshold calculation for SYMM/SYRK (#4116)
...
* Fix potential integer overflow
2023-06-29 23:59:25 +02:00
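Note on the entry above: an overflow of this kind typically arises when a workload estimate is formed as a product of 32-bit dimensions. The following is a minimal sketch of the usual remedy, widening before multiplying. The function name `use_threads_sketch` and the threshold parameter are hypothetical and are not taken from the OpenBLAS sources.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the estimated work m*n*k exceeds the threshold.
 * Casting each operand to int64_t before multiplying keeps the
 * intermediate product from wrapping in 32-bit int arithmetic. */
static int use_threads_sketch(int m, int n, int k, int64_t threshold) {
    assert(m >= 0 && n >= 0 && k >= 0);
    int64_t work = (int64_t)m * (int64_t)n * (int64_t)k;
    return work > threshold;
}
```

For m = n = k = 4096 the product is 2^36, which does not fit in a 32-bit int but is handled correctly once the operands are widened.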
Martin Kroeker
c3a2d407a0
Merge pull request #4048 from imzhuhl/spr_sbgemm_fix
...
Sapphire Rapids sbgemm fix
2023-06-17 20:47:09 +02:00
Angelika Schwarz
899c3a6f6a
Improve input argument checks of gemmt
...
* Fix return value for invalid info
* Add missing checks for ldA, ldB
* Use reference-LAPACK-like checks (i.e. ld=0, nrows=0 is invalid)
2023-05-26 08:51:27 +02:00
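Note on the entry above: reference-style BLAS/LAPACK argument checks usually require a leading dimension of at least max(1, nrows), which is why ld=0 with nrows=0 is rejected. A minimal sketch, with the hypothetical helper name `ld_is_valid`:

```c
#include <assert.h>

/* Returns 1 if ld is an acceptable leading dimension for a matrix
 * with nrows rows, following the ld >= max(1, nrows) convention. */
static int ld_is_valid(int ld, int nrows) {
    assert(nrows >= 0); /* negative row counts are rejected by a separate check */
    int min_ld = (nrows > 1) ? nrows : 1;
    return ld >= min_ld;
}
```

Under this convention `ld_is_valid(0, 0)` fails while `ld_is_valid(1, 0)` passes.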
Honglin Zhu
71e4125795
Fix syscall error on non-x86 platform
2023-05-22 21:59:59 +08:00
Honglin Zhu
90f041e348
Invoke the syscall to allow the use of amx tiles
2023-05-19 10:48:18 +08:00
Ken Ho
df1b1f6a91
More detailed error message in [z]imatcopy.c.
2023-05-12 09:41:52 -07:00
Ken Ho
7a86c437b5
Change some "if" statements to "else if" following suggestion by @mmuetzel.
2023-05-10 09:13:04 -07:00
Ken Ho
33ab415f68
Bug fix and improvements for [z]imatcopy interface.
2023-05-08 14:43:56 -07:00
Martin Kroeker
1f6f7328eb
remove redundant declaration
2023-04-27 09:14:12 +02:00
Martin Kroeker
7152d6b06d
fix cblas_gemmt
2023-04-27 08:36:20 +02:00
Martin Kroeker
38d7a7b562
Fix ?GEMMT
2023-04-16 00:07:58 +02:00
Martin Kroeker
912d713b52
redo lost edit
2023-03-28 18:31:04 +02:00
Martin Kroeker
dc15c18efc
Fix build failures seen with the NO_LAPACK option - cspr/csymv/csyr belong on the LAPACK list
2023-03-28 16:33:09 +02:00
H. Vetinari
f2659516ef
remove unqualified ifdef's for NO_LAPACK(E)
2023-03-28 19:01:31 +11:00
Martin Kroeker
f2d6b1c70e
Add multithreading threshold
2023-03-26 00:25:28 +01:00
Martin Kroeker
a495ffc554
Rework multithreading threshold
2023-03-26 00:23:57 +01:00
Martin Kroeker
244147495a
Do not use multithreading for small workloads
2023-03-23 23:13:02 +01:00
Martin Kroeker
ab32f832a8
fix stray blank on continuation line
2023-03-21 08:29:05 +01:00
Martin Kroeker
e359787e28
restore C/Z SPMV, SPR, SYR,SYMV
2023-03-21 07:43:03 +01:00
Martin Kroeker
f10c266b4d
Fix stride in shortcut path for small N
2022-12-08 21:02:01 +01:00
Martin Kroeker
8c99d5d1b6
Merge pull request #3796 from martin-frbg/gemmt
...
Add a trivial GEMMT implementation based on a looped GEMV
2022-11-12 19:06:05 +01:00
Martin Kroeker
e6204d254f
Update CMakeLists.txt
2022-11-08 16:21:11 +01:00
Martin Kroeker
1b77764182
Conditionally leave out bits of LAPACK to be overridden by ReLAPACK
2022-11-08 12:02:59 +01:00
Martin Kroeker
c970717157
fix missing t in xgemmt rule
...
Co-authored-by: Alexis <35051714+amontoison@users.noreply.github.com>
2022-11-01 13:51:20 +01:00
Martin Kroeker
e7fd8d21a6
Add GEMMT based on looped GEMV
2022-10-26 15:33:58 +02:00
Martin Kroeker
a3e02742f2
Add USE_PERL fallback option for create script used with FUNCTION_PROFILE
2022-05-22 18:32:19 +02:00
Martin Kroeker
f1c570a5f1
Add back original PERL-based script under new name
2022-05-22 18:29:01 +02:00
Owen Rafferty
42c7a27e6b
rewrite perl scripts in universal shell
2022-05-18 19:00:15 -05:00
Martin Kroeker
7656aba00e
Merge pull request #3493 from martin-frbg/casts+cleanup
...
WIP casts and cleanups
2022-02-06 23:55:06 +01:00
Martin Kroeker
d2b5fbf80f
Exclude some complex (LAPACK) functions when NO_LAPACK is set
2022-01-27 22:02:08 +01:00
Martin Kroeker
64365c919e
fix function typecasts
2021-12-21 18:47:35 +01:00
gxw
25f99fa9f8
Add cblas_{c/z}srot cblas_{c/z}rotg support
2021-11-01 20:19:13 +08:00
Martin Kroeker
4b3769823a
Revert #3252
2021-10-24 23:57:06 +02:00
Martin Kroeker
2845f54eb8
Remove dangerous optimization from previous #3252 - buffer is never unused here
2021-10-20 10:50:02 +02:00
Martin Kroeker
c35739db5e
Add separate entries for BFLOAT16 functions and fix missing cblas_xerbla
2021-09-14 16:15:57 +02:00
Martin Kroeker
1085775bc6
really remove the unused variable
2021-09-11 15:05:55 +02:00
Martin Kroeker
20581bf303
Remove unused variable
2021-09-11 14:36:27 +02:00
Wangyang Guo
4289cf048d
sbgemm: avoid falling into SGEMM_KERNEL_DIRECT
2021-09-07 21:30:46 +08:00
Wangyang Guo
2e44ca0136
sbgemm: add missing cblas_sbgemm definition
2021-08-30 17:40:30 +08:00
Wangyang Guo
1d83ca4bca
Small Matrix: support BFLOAT16 data type
2021-08-30 17:40:20 +08:00
Wangyang Guo
c17d6dacb2
Small Matrix: skip compilation for unimplemented data types
2021-08-05 05:46:13 +00:00
Wangyang Guo
aa50185647
Small Matrix: better handling of the GEMM3M macro
2021-08-05 02:45:53 +00:00
Wangyang Guo
478d1086c1
Small Matrix: support DYNAMIC_ARCH build
2021-08-04 03:12:41 +00:00
Wangyang Guo
5dc7c3c8e5
Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune the small matrix case
2021-08-02 07:06:54 +00:00
Xianyi Zhang
6022e5629c
Refs #2587 fix small matrix c/zgemm bug.
2021-08-02 07:06:54 +00:00
Xianyi Zhang
57ed58cefe
Refs #2587 Add small matrix optimization reference kernel for c/zgemm.
2021-08-02 07:06:54 +00:00
Xianyi Zhang
17d32a4a82
Change a1b0 gemm to b0 gemm.
2021-08-02 07:06:54 +00:00
Xianyi Zhang
4271cfcc6f
Fix gemm interface bug for small matrix.
2021-08-02 07:06:51 +00:00
Xianyi Zhang
be3349405d
Add alpha=1.0 beta=0.0 for small gemm.
2021-08-02 07:01:47 +00:00
Xianyi Zhang
0a2077901c
Add small matrix optimization kernel interface.
...
make SMALL_MATRIX_OPT=1
2021-08-02 07:01:47 +00:00
Martin Kroeker
1dea57ab25
Revert PR #3250 (shortcut without buffer allocation) as it is unsafe on some x86_64
2021-07-14 20:32:57 +02:00
Martin Kroeker
7bb59fceb7
Clean up some warnings
2021-07-11 16:00:29 +02:00
Martin Kroeker
4ed99c2ce3
Merge pull request #3292 from martin-frbg/syrk_limit
...
Add lower limit for multithreading in xSYRK
2021-07-07 20:46:28 +02:00
Martin Kroeker
8186963d8c
Add lower limit for multithreading
2021-07-04 17:00:26 +02:00
Martin Kroeker
726c44242b
Add lower threshold for multithreading
2021-07-01 17:41:05 +02:00
Martin Kroeker
1b5620b66e
Add lower threshold for multithreading in ?potrf and ?potri
2021-06-26 23:47:41 +02:00
Martin Kroeker
baf03a0937
Merge pull request #3252 from martin-frbg/more_shortcuts
...
Further shortcuts for (small) cases that do not need buffer allocation
2021-06-15 16:14:20 +02:00
Martin Kroeker
7aab5e826c
Merge pull request #3250 from martin-frbg/gemv-shortcut
...
Add shortcut for small-size S/D GEMV_N with increments of one
2021-06-15 14:50:14 +02:00
Martin Kroeker
f84197c1a7
Add shortcuts for (small) cases that do not need expensive buffer allocation
2021-05-29 22:28:00 +02:00
Martin Kroeker
734bd265a8
revert symv changes for now
2021-05-29 15:40:03 +02:00
Martin Kroeker
1217eb910d
Fix copy-paste errors in variables used
2021-05-28 09:38:48 +02:00
Martin Kroeker
d6d7a6685d
Add shortcuts for (small) cases that do not need expensive buffer allocation
2021-05-27 22:39:18 +02:00
Martin Kroeker
f0e7345fb8
Add shortcut for small-size gemv_n with increments of one
2021-05-26 22:02:34 +02:00
Martin Kroeker
03297ff9f0
Add fast path for small xSYR with INCX==1
2021-05-22 20:41:18 +02:00
Gordon Fossum
8b599836db
Add error message token for SBGEMM in gemm.c
2021-05-04 13:55:02 -05:00
Martin Kroeker
904b221f03
Add cast to prevent overflow of intermediate result
2021-05-01 14:47:22 +02:00
Martin Kroeker
c5fb91f1bc
Fix division by zero in the non-x86 codepath
2021-04-29 09:47:18 +02:00
Harmen Stoppels
ec6b354c32
use /usr/bin/env perl
2021-02-24 14:07:20 +01:00
Martin Kroeker
bd906e3410
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg
2021-01-30 16:46:25 +01:00
Alex Henrie
f1bf2603e6
Remove dead assignment to dflag in rotmg functions
2021-01-14 19:40:32 -07:00
Alex Henrie
6f32991eae
Don't define the mode variable when not needed in gemm functions
2021-01-14 19:40:31 -07:00
Martin Kroeker
a8f249458d
Build CBLAS interfaces for CROTG and ZROTG as well
2021-01-13 00:29:38 +01:00
Martin Kroeker
ac3e2a3fdd
Add CBLAS interfaces for csrot and zdrot
2021-01-12 23:22:00 +01:00
Martin Kroeker
857afcc41d
Use ifeq instead of ifdef for user-definable build options
2020-11-22 16:31:44 +01:00
Chen, Guobing
a7b1f9b1bb
Implementation of BF16 based gemv
...
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
Martin Kroeker
6a1f3e40af
Remove debug printout of object list
2020-10-26 21:37:04 +01:00
Rajalakshmi Srinivasaraghavan
b5d30b390d
Fix build issues with bfloat16
...
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
2020-10-13 11:00:22 -05:00
Martin Kroeker
1e7eb7b7a9
Fix typos in currently unused sections
2020-10-13 09:17:15 +02:00
Martin Kroeker
052f31bc3c
Change "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-12 00:02:16 +02:00
Martin Kroeker
0f7d73ff6d
Allow supporting only a subset of variable types
2020-10-11 14:53:26 +02:00
Martin Kroeker
b475b4bd0d
Support building only a subset of types
2020-09-22 23:25:04 +02:00
Chen, Guobing
deaeb6c5b8
Add bfloat16 based dot and conversion with single/double
...
1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
shstobf16 -- convert single float array to bfloat16 array
shdtobf16 -- convert double float array to bfloat16 array
sbf16tos -- convert bfloat16 array to single float array
dbf16tod -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16
5. Update level1 threading helper functions and macros to support multi-threading for these new APIs
6. Fix Cooperlake platform detection/specification issue in dynamic-arch builds
7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-09-04 02:31:25 +08:00
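Note on the entry above: the single-precision to bfloat16 conversion can be pictured as keeping the upper 16 bits of the IEEE 754 float. The sketch below is an assumption-laden illustration, not the OpenBLAS generic or Cooperlake kernel. The function names are hypothetical, and round-to-nearest-even is one common rounding choice that the commit message does not specify. Only the `uint16_t` typedef follows item 7 of the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t bfloat16; /* storage type, as in item 7 of the commit above */

/* float -> bfloat16 with round-to-nearest-even (hypothetical helper). */
static bfloat16 float_to_bf16_sketch(float x) {
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &x, sizeof bits);           /* reinterpret without aliasing issues */
    if ((bits & 0x7fffffffu) > 0x7f800000u)   /* NaN: keep it a (quiet) NaN */
        return (bfloat16)((bits >> 16) | 0x0040u);
    bits += 0x7fffu + ((bits >> 16) & 1u);    /* round half to even on bit 16 */
    return (bfloat16)(bits >> 16);
}

/* bfloat16 -> float is exact: the 16 bits become the high half. */
static float bf16_to_float_sketch(bfloat16 h) {
    uint32_t bits = (uint32_t)h << 16;
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Values whose lower 16 mantissa bits are zero, such as 1.0f or -2.0f, round-trip exactly through the two helpers.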
Martin Kroeker
75eeb265d7
[WIP] Refactor the driver code for direct SGEMM (#2782)
...
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c to use macros for sgemm_direct to support dynamic_arch naming via common_s.h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
2020-08-19 14:51:09 +02:00
Martin Kroeker
fee361ae64
fix another source of NO_CBLAS=0 surprise
2020-08-11 13:27:19 +02:00
Ashwin Sekhar T K
4e1be0e481
ARM64: Add THUNDERX3T110 Target
2020-07-26 23:32:24 -07:00
Martin Kroeker
5dd14e3d48
Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)
...
* make building the bfloat16 BLAS functions conditional on BUILD_HALF
* pass the BUILD_HALF option to gensymbol
* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Martin Kroeker
2db5178e2d
enable cblas interfaces to GEMM3M in CMAKE builds
2020-04-22 11:01:28 +02:00