Martin Kroeker
0cf656fd3e
Add copies of GEMMT under its new name GEMMTR
2024-10-30 12:55:14 +01:00
Chris Daley
cb48505251
optimize gemv forwarding on ARM64 systems
2024-10-24 21:05:26 -07:00
Chip Kerchner
36bd3eeddf
Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power).
2024-10-13 13:46:11 -05:00
Chip Kerchner
1d51ca5798
Change multi-threading logic for SBGEMV to be the same as SGEMV.
2024-10-11 16:08:48 -05:00
Martin Kroeker
9762464718
Fix CBLAS interface filling in the wrong triangle for Row-Major
2024-10-09 18:06:39 +02:00
gxw
48698b2b1d
LoongArch64: Rename core
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Martin Kroeker
7878976236
disable forwarding from SBGEMM to SBGEMV for now
2024-08-08 18:03:38 +02:00
Chris Sidebottom
b26424c6a2
Allow opt into GEMM -> GEMV forwarding
2024-07-31 13:09:14 +01:00
Chris Sidebottom
90eb863d4b
Re-add accidental removal
2024-07-31 13:09:14 +01:00
Chris Sidebottom
28b5334f22
Complete implementation of GEMV forwarding
2024-07-31 13:09:14 +01:00
Martin Kroeker
3db5dbc88e
forward to GEMV when one argument is actually a vector
2024-07-31 13:09:14 +01:00
gxw
f3cebb3ca3
x86: Fixed numpy CI failure when the target is ZEN.
2024-07-12 16:09:30 +08:00
Martin Kroeker
2f12a47405
fix build options for CAXPYC/ZAXPYC
2024-06-09 20:32:10 +02:00
Martin Kroeker
db9f7bc552
fix float array types to include bfloat16
2024-06-03 00:22:16 +02:00
Martin Kroeker
076766df4e
Update CMakeLists.txt
2024-05-31 18:23:18 +02:00
Martin Kroeker
ff6670cb83
don't generate non-cblas files for gemm_batch
2024-05-30 18:26:02 +02:00
Martin Kroeker
362a063396
remove return value
2024-05-29 23:16:58 +02:00
Martin Kroeker
89c7bbcba6
add cblas_?gemm_batch
2024-05-29 15:47:02 +02:00
Martin Kroeker
2957281275
Introduce a lower limit for multithreading
2024-05-14 18:59:21 +02:00
Martin Kroeker
5fd871d7ea
Introduce a lower limit for multithreading
2024-05-14 18:48:03 +02:00
gxw
637c650f4f
loongarch64: Add buffer offset for target LOONGSON3R5
2024-05-10 11:42:53 +08:00
Martin Kroeker
93d975d8fd
Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
loongarch: Optimizing the performance of the GEMM on servers
2024-04-10 14:23:31 +02:00
gxw
d8c4ea8793
loongarch: Optimizing the performance of the GEMM on servers
2024-04-09 09:03:34 -04:00
Martin Kroeker
d277c6d15b
Merge pull request #4585 from martin-frbg/issue1881
Cap the number of parallel threads for GEMM, GETRF and POTRF to ensure sensible workloads on big systems
2024-04-03 18:35:16 +02:00
Igor Zhuravlov
22d305e2df
fix dtrtrs_ and ztrtrs_ to accept case-insensitive parameters uplo and diag
Changes to be committed:
modified: interface/lapack/trtrs.c
modified: interface/lapack/ztrtrs.c
2024-04-03 19:01:38 +10:00
Martin Kroeker
68ab5185d0
Update potrf.c
2024-03-27 22:10:01 +01:00
Martin Kroeker
19b29b3448
Update getrf.c
2024-03-27 22:09:30 +01:00
Martin Kroeker
a3354a7630
Cap the number of parallel threads
2024-03-27 22:00:30 +01:00
Martin Kroeker
5da4c93ef2
Cap the number of parallel threads
2024-03-27 20:34:55 +01:00
Martin Kroeker
496106642f
Cap the number of parallel threads
2024-03-27 20:32:11 +01:00
Martin Kroeker
cb8131cfd9
Merge pull request #4499 from kseniyazaytseva/new-tests
Tests for BLAS-like and BLAS API
2024-02-25 22:40:59 +01:00
Martin Kroeker
baf88564bc
Fix potential buffer overflow
2024-02-25 19:23:41 +01:00
kseniyazaytseva
7e9b1c0807
fix uninitialized data usage
2024-02-10 00:49:42 +03:00
kseniyazaytseva
c6f30fd414
check for zero inc
2024-02-10 00:48:07 +03:00
kseniyazaytseva
5e9ead09ac
fix info return
2024-02-10 00:47:25 +03:00
Martin Kroeker
500ac4de5e
fix incompatible pointer types
2024-02-08 13:18:34 +01:00
Martin Kroeker
d4db6a9f16
Separate the interface for SBGEMMT from GEMMT due to differences in GEMV arguments
2024-02-06 22:23:47 +01:00
Martin Kroeker
68d354814f
Fix incompatible pointer type in BFLOAT16 mode
2024-02-04 01:14:22 +01:00
Sergei Lewis
3ffd6868d7
Merge branch 'develop' into dev/slewis/merge-from-riscv
2024-02-01 11:29:41 +00:00
Martin Kroeker
47bd064763
Fix names in build rules
2024-01-31 20:49:43 +01:00
Martin Kroeker
a7d004e820
Fix CBLAS prototype
2024-01-31 17:55:42 +01:00
Martin Kroeker
b54cda8490
Unify creation of CBLAS interfaces for ?AMIN/?AMAX and C/ZAXPYC between gmake and cmake builds
2024-01-31 16:00:52 +01:00
Sergei Lewis
1093def0d1
Merge branch 'risc-v' into develop
2024-01-29 11:11:39 +00:00
kseniyazaytseva
f89e0034a4
Fix LAPACK usage from BLAS
2024-01-18 23:22:26 +03:00
Martin Kroeker
f7cf637d7a
redo lost edit
2024-01-18 23:22:26 +03:00
Martin Kroeker
85548e66ca
Fix build failures seen with the NO_LAPACK option - cspr/csymv/csyr belong on the LAPACK list
2024-01-18 23:22:26 +03:00
Martin Kroeker
f129161453
restore C/Z SPMV, SPR, SYR,SYMV
2024-01-18 23:22:26 +03:00
Martin Kroeker
5b4df851d7
fix stray blank on continuation line
2024-01-18 23:20:15 +03:00
kseniyazaytseva
ff41cf5c49
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
* Fixed gemmt, imatcopy, zimatcopy_cnc functions
* Fixed cblas_cscal testing in ctest
* Removed unreachable rotmg code
* Added zero size checks
2024-01-18 23:19:52 +03:00
Martin Kroeker
d2fc4f3b4d
Increase multithreading threshold by a factor of 50
2024-01-17 20:59:24 +01:00