2343 Commits

Author SHA1 Message Date
gxw
969601a1dc X86_64: Fixed bug in zscal
Fixed handling of NAN and INF arguments when
inc is greater than 1.
2024-01-31 11:23:59 +08:00
Martin Kroeker
98c9ff3194 Merge pull request #4464 from XiWeiGu/loongarch64-zscal
LoongArch64: Handle NAN and INF
2024-01-30 22:53:29 +01:00
Chip Kerchner
09bb48d1b9 Vectorize in-copy packing/copying for SGEMM - 4X faster. 2024-01-30 09:13:16 -06:00
gxw
83ce97a4ca LoongArch64: Handle NAN and INF 2024-01-30 17:17:30 +08:00
gxw
a79d117405 LoogArch64: Fixed bug for {s/d}amin 2024-01-30 11:32:57 +08:00
Sergei Lewis
1093def0d1 Merge branch 'risc-v' into develop 2024-01-29 11:11:39 +00:00
Martin Kroeker
889c5d026a Merge pull request #4456 from kseniyazaytseva/riscv-rvv10
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrincics
2024-01-26 13:31:09 +01:00
Martin Kroeker
4e2a32ff51 Merge pull request #4454 from kseniyazaytseva/riscv-rvv07
Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
2024-01-26 11:40:46 +01:00
gxw
276e3ebf9e LoongArch64: Add dzamax and dzamin opt 2024-01-26 10:03:50 +08:00
Martin Kroeker
a21b2fa5e4 Merge pull request #4452 from kseniyazaytseva/riscv-generic
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
2024-01-24 17:52:25 +01:00
Andrey Sokolov
9c49a81d54 Resolve conflicts 2024-01-23 19:08:53 +03:00
kseniyazaytseva
e1afb23811 Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
* Fixed bugs in dgemm, [a]min\max, asum kernels
* Added zero checks for BLAS kernels
* Added dsdot implementation for RVV 0.7.1
* Fixed bugs in _vector files for C910V and RISCV64_ZVL256B targets
* Added additional definitions for RISCV64_ZVL256B target
2024-01-23 19:01:31 +03:00
Octavian Maghiar
deecfb1a39 Merge branch 'risc-v' into img-riscv64-zvl128b 2024-01-19 12:26:38 +00:00
kseniyazaytseva
5222b5fc18 Added axpby kernels for GENERIC RISC-V target 2024-01-18 23:22:26 +03:00
kseniyazaytseva
ff41cf5c49 Fix BLAS, BLAS-like functions and Generic RISC-V kernels
* Fixed gemmt, imatcopy, zimatcopy_cnc functions
* Fixed cblas_cscal testing in ctest
* Removed rotmg unreacheble code
* Added zero size checks
2024-01-18 23:19:52 +03:00
kseniyazaytseva
b193ea3d7b Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrincics
* Update intrincics API to 0.12.0 version (Stride Segment Loads/Stores)
* Fixed nrm2, axpby, ncopy, zgemv and scal kernels
* Added zero size checks
2024-01-18 22:14:32 +03:00
Martin Kroeker
88e994116c Merge pull request #4354 from imaginationtech/img-rvv-kernel-generator
[RISC-V] Improve RVV kernel generator LMUL usage
2024-01-17 15:19:37 +01:00
Dirreke
ec89466e14 Add CSKY support 2024-01-16 23:45:06 +08:00
Sergei Lewis
9edb805e64 fix builds with t-head toolchains that use old versions of the intrinsics spec 2024-01-16 14:33:08 +00:00
Martin Kroeker
0d2e486edf Handle NAN and INF 2024-01-15 11:18:59 +01:00
Martin Kroeker
5f5b7c4f45 Merge pull request #4423 from martin-frbg/issue4422
Check compiler support for AVX512BF16 and base COL/SPR kernel choice on that
2024-01-12 16:30:50 +01:00
Martin Kroeker
f31bea07dd Merge pull request #4419 from martin-frbg/issue4413
[WIP] Add fixes and utests for ZSCAL with NaN or Inf arguments
2024-01-12 14:27:08 +01:00
Martin Kroeker
20413ee6ec Update zscal.c 2024-01-12 13:11:13 +01:00
Martin Kroeker
b57627c27f Handle NAN and INF 2024-01-12 12:03:08 +01:00
Martin Kroeker
995a990e24 Make AVX512 BFLOAT16 kernels conditional on compiler capability 2024-01-12 00:12:46 +01:00
Martin Kroeker
7df363e1e2 temporarily disable the MSA C/ZSCAL kernels 2024-01-12 00:08:52 +01:00
Chip-Kerchner
058dd2a4cb Replace two vector loads with one vector pair load and fix endianess of stores - DGEMM versions. 2024-01-08 14:16:09 -06:00
Martin Kroeker
1c31f56e5a Handle NAN 2024-01-08 16:11:25 +01:00
Martin Kroeker
7ee1ee38e2 Handle NaN in input 2024-01-08 14:20:07 +01:00
Martin Kroeker
f637e12713 Handle INF and NAN 2024-01-08 09:52:38 +01:00
Martin Kroeker
25b0c48082 Update zscal.c 2024-01-08 09:49:18 +01:00
Martin Kroeker
5e7f714e93 Update zscal.c 2024-01-08 08:17:40 +01:00
Martin Kroeker
cf8b03ae8b Use NAN rather than SNAN for portability 2024-01-07 23:09:57 +01:00
Martin Kroeker
f0808d856b Handle NAN in input 2024-01-07 20:27:29 +01:00
Martin Kroeker
acf17a825d Handle NAN in input 2024-01-07 20:26:16 +01:00
Martin Kroeker
c9df62e883 Fix handling of NAN 2024-01-07 17:49:40 +01:00
Martin Kroeker
def4996170 Fix handling of NAN and INF arguments 2024-01-07 15:29:42 +01:00
Martin Kroeker
519b40fad9 Merge pull request #4398 from yinshiyou/la-dev
Add Optimizations for LoongArch.
2023-12-30 19:51:08 +01:00
pengxu
a5d0d21378 loongarch64: Add zgemm and cgemm optimization 2023-12-29 18:06:26 +08:00
gxw
546f13558c loongarch64: Add {c/z}swap and {c/z}sum optimization 2023-12-29 17:30:57 +08:00
Hao Chen
edabb93668 loongarch64: Refine axpby optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen
1ec5dded43 loongarch64: Add c/zrot optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
3c53ded315 loongarch64: Add c/znrm2 optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen
fbd612f8c4 loongarch64: Add ic/zamin optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen
d97272cb35 loongarch64: Add c/zdot optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen
65a0aeb128 loongarch64: Add c/zcopy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
2a34fb4b80 loongarch64: Add and refine scal optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
8785e948b5 loongarch64: Add camin optimization function. 2023-12-29 17:30:57 +08:00
Hao Chen
0753848e03 loongarch64: Refine and add axpy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
06fd5b5995 loongarch64: Add and Refine asum optimization functions. 2023-12-29 17:30:57 +08:00