Commit Graph

358 Commits

Author SHA1 Message Date
Martin Kroeker
d4db6a9f16 Separate the interface for SBGEMMT from GEMMT due to differences in GEMV arguments 2024-02-06 22:23:47 +01:00
Martin Kroeker
68d354814f Fix incompatible pointer type in BFLOAT16 mode 2024-02-04 01:14:22 +01:00
Sergei Lewis
3ffd6868d7 Merge branch 'develop' into dev/slewis/merge-from-riscv 2024-02-01 11:29:41 +00:00
Martin Kroeker
47bd064763 Fix names in build rules 2024-01-31 20:49:43 +01:00
Martin Kroeker
a7d004e820 Fix CBLAS prototype 2024-01-31 17:55:42 +01:00
Martin Kroeker
b54cda8490 Unify creation of CBLAS interfaces for ?AMIN/?AMAX and C/ZAXPYC between gmake and cmake builds 2024-01-31 16:00:52 +01:00
Sergei Lewis
1093def0d1 Merge branch 'risc-v' into develop 2024-01-29 11:11:39 +00:00
kseniyazaytseva
f89e0034a4 Fix LAPACK usage from BLAS 2024-01-18 23:22:26 +03:00
Martin Kroeker
f7cf637d7a redo lost edit 2024-01-18 23:22:26 +03:00
Martin Kroeker
85548e66ca Fix build failures seen with the NO_LAPACK option - cspr/csymv/csyr belong on the LAPACK list 2024-01-18 23:22:26 +03:00
Martin Kroeker
f129161453 restore C/Z SPMV, SPR, SYR,SYMV 2024-01-18 23:22:26 +03:00
Martin Kroeker
5b4df851d7 fix stray blank on continuation line 2024-01-18 23:20:15 +03:00
kseniyazaytseva
ff41cf5c49 Fix BLAS, BLAS-like functions and Generic RISC-V kernels
* Fixed gemmt, imatcopy, zimatcopy_cnc functions
* Fixed cblas_cscal testing in ctest
* Removed rotmg unreachable code
* Added zero size checks
2024-01-18 23:19:52 +03:00
Martin Kroeker
d2fc4f3b4d Increase multithreading threshold by a factor of 50 2024-01-17 20:59:24 +01:00
Martin Kroeker
a7ed60bfe9 Add lower limit for multithreading 2023-12-21 20:05:23 +01:00
Angelika Schwarz
5ffbe646e1 Improve matcopy interface
* rows = 0 or cols = 0 is now a legal input and
  takes quick return path
* Follow BLAS/LAPACK convention that the leading
  dimensions must be at least 1.
2023-11-11 11:16:10 +01:00
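
The argument rules described in the commit above (zero rows or columns are legal and return early; leading dimensions must be at least 1) can be sketched as follows. This is only an illustration, not the code from the commit: the helper name `check_matcopy_args`, the column-major layout, the exact minimum values for `lda`/`ldb`, and the order of the checks are all assumptions.

```python
def check_matcopy_args(rows, cols, lda, ldb, transposed=False):
    """Hypothetical validation helper for a matcopy-style routine.

    Assumes column-major storage. Returns True when the call should take
    the quick return path (nothing to copy), False when a copy is needed.
    Raises ValueError for illegal arguments.
    """
    if rows < 0 or cols < 0:
        raise ValueError("rows and cols must be non-negative")

    # BLAS/LAPACK convention: a leading dimension is never smaller than 1,
    # even when the matrix itself is empty.
    min_lda = max(1, rows)
    min_ldb = max(1, cols if transposed else rows)  # assumed output layout
    if lda < min_lda:
        raise ValueError("lda too small")
    if ldb < min_ldb:
        raise ValueError("ldb too small")

    # rows == 0 or cols == 0 is a legal input: there is simply no work to do.
    return rows == 0 or cols == 0
```

With these assumptions, `check_matcopy_args(0, 5, 1, 1)` signals a quick return, while `check_matcopy_args(0, 5, 0, 1)` is rejected because `lda` is below 1.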
Martin Kroeker
cd8eb83bae Fix allocations and compiler warnings in ZROTG (#4289)
* Clean up ZROTG
2023-11-05 18:13:37 +01:00
Martin Kroeker
4a0f86397b Merge pull request #4235 from angsch/develop
Fix division by zero in [z]rotg
2023-10-09 08:43:42 +02:00
Martin Kroeker
13ba4edf43 fix function prototypes (empty parentheses) 2023-09-30 12:53:35 +02:00
Martin Kroeker
b926e70ebd Fix typo in build rule of "profiled" sbgemm 2023-09-21 23:07:32 +02:00
Angelika Schwarz
db3a43c8ed Simplify rotg
* The check da != ZERO is no longer necessary since there
  is a special case ada == ZERO, where ada = |da|.
* Add the missing check c != ZERO before the division.

Note that with these two changes the long double code
follows the float/double version of the code.
2023-09-20 19:43:00 +02:00
Angelika Schwarz
6876ae0c3b Fix division by zero in zrotg
The cases
[ c  s ] * [ 0      ] = [ |db_i| ]
[-s  c ]   [ i*db_i ]   [  0     ]
and
[ c  s ] * [ 0      ] = [ |db_r| ]
[-s  c ]   [ db_r   ]   [  0     ]
computed s incorrectly. To flip the entries of the vector,
s should be conjg(db)/|db| and not conjg(db) / da,
where da == 0.0.
2023-09-20 19:11:59 +02:00
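
The special case in the commit above can be checked numerically. The sketch below covers only the case da == 0 with db != 0, using s = conjg(db)/|db| as stated in the commit message. The function name `rotg_zero_a` is hypothetical, and the choices c = 0 and r = |db| are assumptions inferred from the matrices shown in the message, not taken from the actual source.

```python
def rotg_zero_a(db: complex):
    """Rotation parameters for the special case da == 0, db != 0.

    Returns (c, s, r) such that c*0 + s*db == r == |db|.
    Dividing by |db| instead of da avoids the division by zero.
    """
    abs_db = abs(db)
    c = 0.0                          # assumed: no contribution from the zero entry
    s = db.conjugate() / abs_db      # conjg(db) / |db|, as in the commit message
    r = abs_db                       # assumed: the rotated first entry
    return c, s, r


# Purely imaginary and purely real second entries, as in the two cases listed.
for db in (3j, -2.0 + 0j):
    c, s, r = rotg_zero_a(db)
    first = c * 0 + s * db                 # first row applied to [0, db]
    second = -s.conjugate() * 0 + c * db   # second row applied to [0, db]
    print(db, first, second)
```

In both cases the first entry comes out as |db| and the second as 0, which is the "flip" the commit message describes.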
Martin Kroeker
42909ce57d Merge branch 'xianyi:develop' into issue4130 2023-09-01 09:05:58 +02:00
Martin Kroeker
a2a184572c update zrotg 2023-08-31 23:42:12 +02:00
Martin Kroeker
214be14c1d Correct INFO returned for lda in non-CBLAS s/dgeadd 2023-08-18 22:48:30 +02:00
Martin Kroeker
4cc804c754 Prepare for INCX < 0 in new NRM2 implementation from BLAS 3.10 2023-08-09 16:13:23 +02:00
Martin Kroeker
04cdf5efb4 fix typo and missing declaration 2023-07-14 00:05:00 +02:00
Martin Kroeker
5e1103b8d7 Update rotg.c 2023-07-13 23:35:38 +02:00
Martin Kroeker
7c75c8b2fe fix truncated edit 2023-07-13 21:40:12 +02:00
Martin Kroeker
0f2ce93904 typo fix 2023-07-13 10:56:59 +02:00
Martin Kroeker
e08743d977 Update to use safe scaling algorithm from Reference-LAPACK PR 527 2023-07-12 23:02:36 +02:00
Martin Kroeker
7e93ab1b9e Fix info code returned for invalid ldb 2023-07-09 17:00:25 +02:00
Martin Kroeker
bb862b82d5 Fix integer overflow in multithreading threshold calculation for SYMM/SYRK (#4116)
* Fix potential integer overflow
2023-06-29 23:59:25 +02:00
Martin Kroeker
c3a2d407a0 Merge pull request #4048 from imzhuhl/spr_sbgemm_fix
Sapphire Rapids sbgemm fix
2023-06-17 20:47:09 +02:00
Angelika Schwarz
899c3a6f6a Improve input argument checks of gemmt
* Fix return value for invalid info
* Add missing checks for ldA, ldB
* Use reference-LAPACK like checks (ie ld=0,nrows=0 is invalid)
2023-05-26 08:51:27 +02:00
Honglin Zhu
71e4125795 Fix syscall error on non-x86 platform 2023-05-22 21:59:59 +08:00
Honglin Zhu
90f041e348 Invoke the syscall to allow the use of amx tiles 2023-05-19 10:48:18 +08:00
Ken Ho
df1b1f6a91 More detailed error message in [z]imatcopy.c. 2023-05-12 09:41:52 -07:00
Ken Ho
7a86c437b5 Change some "if" statements to "else if" following suggestion by @mmuetzel. 2023-05-10 09:13:04 -07:00
Ken Ho
33ab415f68 Bug fix and improvements for [z]imatcopy interface. 2023-05-08 14:43:56 -07:00
Martin Kroeker
1f6f7328eb remove redundant declaration 2023-04-27 09:14:12 +02:00
Martin Kroeker
7152d6b06d fix cblas_gemmt 2023-04-27 08:36:20 +02:00
Martin Kroeker
38d7a7b562 Fix ?GEMMT 2023-04-16 00:07:58 +02:00
Martin Kroeker
912d713b52 redo lost edit 2023-03-28 18:31:04 +02:00
Martin Kroeker
dc15c18efc Fix build failures seen with the NO_LAPACK option - cspr/csymv/csyr belong on the LAPACK list 2023-03-28 16:33:09 +02:00
H. Vetinari
f2659516ef remove unqualified ifdef's for NO_LAPACK(E) 2023-03-28 19:01:31 +11:00
Martin Kroeker
f2d6b1c70e Add multithreading threshold 2023-03-26 00:25:28 +01:00
Martin Kroeker
a495ffc554 Rework multithreading threshold 2023-03-26 00:23:57 +01:00
Martin Kroeker
244147495a Do not use multithreading for small workloads 2023-03-23 23:13:02 +01:00
Martin Kroeker
ab32f832a8 fix stray blank on continuation line 2023-03-21 08:29:05 +01:00