Commit Graph

2170 Commits

Author SHA1 Message Date
Martin Kroeker 8e872a91a9
Fix erroneous mapping of SUM kernels to ASUM 2024-02-27 11:28:50 +01:00
Martin Kroeker 6699227d45
Merge pull request #4525 from XiWeiGu/loongarch64_fixed_kernel_regress_skx_avx
LoongArch64: Fixed utest kernel_regress:skx_avx
2024-02-26 09:49:34 +01:00
gxw 8dea25ffff LoongArch64: Fixed utest kernel_regress:skx_avx 2024-02-26 02:04:37 -05:00
Martin Kroeker 7d506984fa
fix assignment of default CSUM kernel 2024-02-25 17:57:11 +01:00
Martin Kroeker 12787775d9
add csum/zsum kernels (trivially derived from the asum ones)s) 2024-02-25 17:55:36 +01:00
Martin Kroeker 8f8ef3492a
Add CSUM and ZSUM kernels (trivially derived from their existing ASUM counterparts) 2024-02-24 23:57:50 +01:00
Martin Kroeker be5e18c6f9
Add kernel definitions for CSUM and ZSUM 2024-02-24 23:55:43 +01:00
gxw 990507e3b8 LoongArch64: Opt zgemv with LASX 2024-02-22 11:58:02 +08:00
gxw d51ffec3a2 LoongArch64: Opt cgemv with LASX 2024-02-22 11:56:04 +08:00
pengxu 4787a55c64 Optimized cgemm kernel 16x4 LASX for LoongArch 2024-02-21 15:28:47 +08:00
Sergei Lewis ba17758c02 fix axpy implementations where y has a stride of 0 2024-02-16 16:00:38 +00:00
Dmitry Mikushin d0f5dc763b Adding USE_GEMM3M macro to kernel targets, so that the *gemm3m functions and parameters can be included into the gotoblas structure. Fixes #4500 2024-02-12 02:29:58 +01:00
Sergei Lewis ff1523163f Fix axpy test hangs when n==0. Reenable zaxpy_vector kernel for C910V. 2024-02-09 12:59:14 +00:00
pengxu fe3da43b7d Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch 2024-02-06 11:49:01 +08:00
Martin Kroeker e5d2725e5a
Merge pull request #4185 from XiWeiGu/mips_enable_msa
MIPS: Enable MSA
2024-02-05 15:50:16 +01:00
Martin Kroeker b537528feb
Merge pull request #4480 from XiWeiGu/loongarch64-fixed-{s/d}amin-lsx
LoongArch64: Fixed {s/d}amin LSX optimization
2024-02-05 06:24:50 +01:00
Martin Kroeker 6d8a273cca
Handle zero increment(s) in C910V ?AXPBY (#4483)
* Handle zero increment(s)
2024-02-04 22:07:51 +01:00
Martin Kroeker dbcf4f8b7d
Merge pull request #4479 from XiWeiGu/loongarch-opt-axpby
Loongarch opt axpby
2024-02-04 19:50:28 +01:00
Martin Kroeker dc802dd637
Merge pull request #4474 from ChipKerchner/sgemmIncopy_PR
Vectorize in-copy packing/copying for SGEMM - up to 4X faster.
2024-02-04 18:51:09 +01:00
gxw adde725321 LoongArch64: Fixed {s/d}amin LSX optimization 2024-02-04 14:44:47 +08:00
gxw 7bc93d95a1 LoongArch64: Opt {c/z}axpby 2024-02-04 11:23:31 +08:00
gxw 1e1f487dc7 LoongArch64: Fixed {s/d}axpby 2024-02-04 09:41:37 +08:00
Martin Kroeker 4d8dee508c
temporarily disable the CAXPY/ZAXPY kernels 2024-02-04 01:05:03 +01:00
Chip Kerchner 2bb7ea64a1 Only vectorize 64-bit version for Power8. 2024-02-01 08:11:43 -06:00
Sergei Lewis 3ffd6868d7 Merge branch 'develop' into dev/slewis/merge-from-riscv 2024-02-01 11:29:41 +00:00
Sergei Lewis a3b0ef6596 Restore riscv64 fixes from develop branch: dot product double precision accumulation, zscal NaN handling 2024-02-01 10:32:00 +00:00
Martin Kroeker d1343302bd
Merge pull request #4465 from XiWeiGu/utest-zscal
utest: Add tests for zscal
2024-01-31 14:19:19 +01:00
gxw 969601a1dc X86_64: Fixed bug in zscal
Fixed handling of NAN and INF arguments when
inc is greater than 1.
2024-01-31 11:23:59 +08:00
Martin Kroeker 98c9ff3194
Merge pull request #4464 from XiWeiGu/loongarch64-zscal
LoongArch64: Handle NAN and INF
2024-01-30 22:53:29 +01:00
Chip Kerchner 09bb48d1b9 Vectorize in-copy packing/copying for SGEMM - 4X faster. 2024-01-30 09:13:16 -06:00
gxw 83ce97a4ca LoongArch64: Handle NAN and INF 2024-01-30 17:17:30 +08:00
gxw a79d117405 LoogArch64: Fixed bug for {s/d}amin 2024-01-30 11:32:57 +08:00
Sergei Lewis 1093def0d1 Merge branch 'risc-v' into develop 2024-01-29 11:11:39 +00:00
Martin Kroeker 889c5d026a
Merge pull request #4456 from kseniyazaytseva/riscv-rvv10
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrincics
2024-01-26 13:31:09 +01:00
Martin Kroeker 4e2a32ff51
Merge pull request #4454 from kseniyazaytseva/riscv-rvv07
Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
2024-01-26 11:40:46 +01:00
gxw 276e3ebf9e LoongArch64: Add dzamax and dzamin opt 2024-01-26 10:03:50 +08:00
Martin Kroeker a21b2fa5e4
Merge pull request #4452 from kseniyazaytseva/riscv-generic
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
2024-01-24 17:52:25 +01:00
Andrey Sokolov 9c49a81d54 Resolve conflicts 2024-01-23 19:08:53 +03:00
kseniyazaytseva e1afb23811 Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
* Fixed bugs in dgemm, [a]min\max, asum kernels
* Added zero checks for BLAS kernels
* Added dsdot implementation for RVV 0.7.1
* Fixed bugs in _vector files for C910V and RISCV64_ZVL256B targets
* Added additional definitions for RISCV64_ZVL256B target
2024-01-23 19:01:31 +03:00
Octavian Maghiar deecfb1a39 Merge branch 'risc-v' into img-riscv64-zvl128b 2024-01-19 12:26:38 +00:00
kseniyazaytseva 5222b5fc18 Added axpby kernels for GENERIC RISC-V target 2024-01-18 23:22:26 +03:00
kseniyazaytseva ff41cf5c49 Fix BLAS, BLAS-like functions and Generic RISC-V kernels
* Fixed gemmt, imatcopy, zimatcopy_cnc functions
* Fixed cblas_cscal testing in ctest
* Removed rotmg unreacheble code
* Added zero size checks
2024-01-18 23:19:52 +03:00
kseniyazaytseva b193ea3d7b Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrincics
* Update intrincics API to 0.12.0 version (Stride Segment Loads/Stores)
* Fixed nrm2, axpby, ncopy, zgemv and scal kernels
* Added zero size checks
2024-01-18 22:14:32 +03:00
Martin Kroeker 88e994116c
Merge pull request #4354 from imaginationtech/img-rvv-kernel-generator
[RISC-V] Improve RVV kernel generator LMUL usage
2024-01-17 15:19:37 +01:00
Dirreke ec89466e14 Add CSKY support 2024-01-16 23:45:06 +08:00
Sergei Lewis 9edb805e64 fix builds with t-head toolchains that use old versions of the intrinsics spec 2024-01-16 14:33:08 +00:00
Martin Kroeker 0d2e486edf
Handle NAN and INF 2024-01-15 11:18:59 +01:00
Martin Kroeker 5f5b7c4f45
Merge pull request #4423 from martin-frbg/issue4422
Check compiler support for AVX512BF16 and base COL/SPR kernel choice on that
2024-01-12 16:30:50 +01:00
Martin Kroeker f31bea07dd
Merge pull request #4419 from martin-frbg/issue4413
[WIP] Add fixes and utests for ZSCAL with NaN or Inf arguments
2024-01-12 14:27:08 +01:00
Martin Kroeker 20413ee6ec
Update zscal.c 2024-01-12 13:11:13 +01:00