Commit Graph

394 Commits

Author SHA1 Message Date
Martin Kroeker 0cf656fd3e Add copies of GEMMT under its new name GEMMTR 2024-10-30 12:55:14 +01:00
Chris Daley cb48505251 optimize gemv forwarding on ARM64 systems 2024-10-24 21:05:26 -07:00
Chip Kerchner 36bd3eeddf Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power). 2024-10-13 13:46:11 -05:00
Chip Kerchner 1d51ca5798 Change multi-threading logic for SBGEMV to be the same as SGEMV. 2024-10-11 16:08:48 -05:00
Martin Kroeker 9762464718 Fix CBLAS interface filling in the wrong triangle for Row-Major 2024-10-09 18:06:39 +02:00
gxw 48698b2b1d LoongArch64: Rename core
Use microarchitecture name instead of meaningless strings to name the core,
the legacy core is still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Martin Kroeker 7878976236 disable forwarding from SBGEMM to SBGEMV for now 2024-08-08 18:03:38 +02:00
Chris Sidebottom b26424c6a2 Allow opt into GEMM -> GEMV forwarding 2024-07-31 13:09:14 +01:00
Chris Sidebottom 90eb863d4b Re-add accidental removal 2024-07-31 13:09:14 +01:00
Chris Sidebottom 28b5334f22 Complete implementation of GEMV forwarding 2024-07-31 13:09:14 +01:00
Martin Kroeker 3db5dbc88e forward to GEMV when one argument is actually a vector 2024-07-31 13:09:14 +01:00
gxw f3cebb3ca3 x86: Fixed numpy CI failure when the target is ZEN. 2024-07-12 16:09:30 +08:00
Martin Kroeker 2f12a47405 fix build options for CAXPYC/ZAXPYC 2024-06-09 20:32:10 +02:00
Martin Kroeker db9f7bc552 fix float array types to include bfloat16 2024-06-03 00:22:16 +02:00
Martin Kroeker 076766df4e Update CMakeLists.txt 2024-05-31 18:23:18 +02:00
Martin Kroeker ff6670cb83 don't generate non-cblas files for gemm_batch 2024-05-30 18:26:02 +02:00
Martin Kroeker 362a063396 remove return value 2024-05-29 23:16:58 +02:00
Martin Kroeker 89c7bbcba6 add cblas_?gemm_batch 2024-05-29 15:47:02 +02:00
Martin Kroeker 2957281275 Introduce a lower limit for multithreading 2024-05-14 18:59:21 +02:00
Martin Kroeker 5fd871d7ea Introduce a lower limit for multithreading 2024-05-14 18:48:03 +02:00
gxw 637c650f4f loongarch64: Add buffer offset for target LOONGSON3R5 2024-05-10 11:42:53 +08:00
Martin Kroeker 93d975d8fd Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
loongarch: Optimizing the performance of the GEMM on servers
2024-04-10 14:23:31 +02:00
gxw d8c4ea8793 loongarch: Optimizing the performance of the GEMM on servers 2024-04-09 09:03:34 -04:00
Martin Kroeker d277c6d15b Merge pull request #4585 from martin-frbg/issue1881
Cap the number of parallel threads for GEMM;GETRF and POTRF to ensure sensible workloads on big systems
2024-04-03 18:35:16 +02:00
Igor Zhuravlov 22d305e2df fix dtrtrs_ and ztrtrs_ to accept case-insensitive parameters uplo and diag
Changes to be committed:
	modified:   interface/lapack/trtrs.c
	modified:   interface/lapack/ztrtrs.c
2024-04-03 19:01:38 +10:00
Martin Kroeker 68ab5185d0 Update potrf.c 2024-03-27 22:10:01 +01:00
Martin Kroeker 19b29b3448 Update getrf.c 2024-03-27 22:09:30 +01:00
Martin Kroeker a3354a7630 Cap the number of parallel threads 2024-03-27 22:00:30 +01:00
Martin Kroeker 5da4c93ef2 Cap the number of parallel threads 2024-03-27 20:34:55 +01:00
Martin Kroeker 496106642f Cap the number of parallel threads 2024-03-27 20:32:11 +01:00
Martin Kroeker cb8131cfd9 Merge pull request #4499 from kseniyazaytseva/new-tests
Tests for BLAS-like and BLAS API
2024-02-25 22:40:59 +01:00
Martin Kroeker baf88564bc Fix potential buffer overflow 2024-02-25 19:23:41 +01:00
kseniyazaytseva 7e9b1c0807 fix uninitialized data usage 2024-02-10 00:49:42 +03:00
kseniyazaytseva c6f30fd414 check for zero inc 2024-02-10 00:48:07 +03:00
kseniyazaytseva 5e9ead09ac fix info return 2024-02-10 00:47:25 +03:00
Martin Kroeker 500ac4de5e fix incompatible pointer types 2024-02-08 13:18:34 +01:00
Martin Kroeker d4db6a9f16 Separate the interface for SBGEMMT from GEMMT due to differences in GEMV arguments 2024-02-06 22:23:47 +01:00
Martin Kroeker 68d354814f Fix incompatible pointer type in BFLOAT16 mode 2024-02-04 01:14:22 +01:00
Sergei Lewis 3ffd6868d7 Merge branch 'develop' into dev/slewis/merge-from-riscv 2024-02-01 11:29:41 +00:00
Martin Kroeker 47bd064763 Fix names in build rules 2024-01-31 20:49:43 +01:00
Martin Kroeker a7d004e820 Fix CBLAS prototype 2024-01-31 17:55:42 +01:00
Martin Kroeker b54cda8490 Unify creation of CBLAS interfaces for ?AMIN/?AMAX and C/ZAXPYC between gmake and cmake builds 2024-01-31 16:00:52 +01:00
Sergei Lewis 1093def0d1 Merge branch 'risc-v' into develop 2024-01-29 11:11:39 +00:00
kseniyazaytseva f89e0034a4 Fix LAPACK usage from BLAS 2024-01-18 23:22:26 +03:00
Martin Kroeker f7cf637d7a redo lost edit 2024-01-18 23:22:26 +03:00
Martin Kroeker 85548e66ca Fix build failures seen with the NO_LAPACK option - cspr/csymv/csyr belong on the LAPACK list 2024-01-18 23:22:26 +03:00
Martin Kroeker f129161453 restore C/Z SPMV, SPR, SYR,SYMV 2024-01-18 23:22:26 +03:00
Martin Kroeker 5b4df851d7 fix stray blank on continuation line 2024-01-18 23:20:15 +03:00
kseniyazaytseva ff41cf5c49 Fix BLAS, BLAS-like functions and Generic RISC-V kernels
* Fixed gemmt, imatcopy, zimatcopy_cnc functions
* Fixed cblas_cscal testing in ctest
* Removed rotmg unreachable code
* Added zero size checks
2024-01-18 23:19:52 +03:00
Martin Kroeker d2fc4f3b4d Increase multithreading threshold by a factor of 50 2024-01-17 20:59:24 +01:00