2200 Commits

Author SHA1 Message Date
Martin Kroeker cf80bd8500
Update nrm2_rvv.c 2024-03-13 13:07:26 +01:00
Martin Kroeker 9baa757905
Update nrm2_vector.c 2024-03-13 11:40:14 +01:00
Martin Kroeker 18a6db6862
Update nrm2_vector.c 2024-03-13 11:10:26 +01:00
Martin Kroeker 3752e73919
handle incx < 0 2024-03-12 20:44:01 +01:00
Martin Kroeker db70c7f7fb
handle incx < 0 2024-03-12 20:42:11 +01:00
Martin Kroeker dee8557d58
handle incx < 0 2024-03-12 20:40:29 +01:00
Martin Kroeker d9dff17aec
handle incx < 0 2024-03-12 20:38:23 +01:00
Martin Kroeker 552c521353
remove another early exit for incx < 0 2024-03-12 18:49:27 +01:00
Martin Kroeker ed532dc75b
remove another early exit for incx < 0 2024-03-12 18:47:00 +01:00
Martin Kroeker 6b89e1f1d7
fix loop condition for incx < 0 2024-03-12 15:49:41 +01:00
Martin Kroeker 20016a0096
fix loop condition for incx < 0 2024-03-12 15:48:55 +01:00
Martin Kroeker 09e84bd29a
fix loop condition for incx < 0 2024-03-12 15:48:00 +01:00
Martin Kroeker f747aedb52
fix loop condition for incx < 0 2024-03-12 15:47:17 +01:00
Martin Kroeker 23796f8d31
fix loop condition for incx < 0 2024-03-12 15:46:23 +01:00
Martin Kroeker bf93459746
fix loop condition for incx < 0 2024-03-12 15:45:23 +01:00
Martin Kroeker e41d01bad9
remove early exit on negative inc_x 2024-03-11 22:53:54 +01:00
Martin Kroeker 02a025f9c1
remove early exit on negative inc_x 2024-03-11 22:52:18 +01:00
pengxu 680a77fafc Optimized ssymv and dsymv kernel LSX for LoongArch 2024-03-05 20:36:59 +08:00
pengxu 6546600342 Optimized ssymv and dsymv kernel LASX for LoongArch 2024-03-04 16:18:39 +08:00
Chip-Kerchner 99384933ff Revert "Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code"
This reverts commit accea15551, reversing
changes made to b925353006.
2024-03-01 07:57:39 -06:00
Martin Kroeker 577d480c62
Merge pull request #4529 from ErnstPeng/feature-branch
Optimized sgemv and dgemv kernel LSX for LoongArch
2024-02-28 13:49:54 +01:00
pengxu b2db064285 Optimized sgemv and dgemv kernel LSX for LoongArch 2024-02-28 18:07:27 +08:00
Martin Kroeker cfbb701497
Merge pull request #4536 from XiWeiGu/loongarch64-cgemv-zgemv-opt
Loongarch64 cgemv zgemv opt
2024-02-28 10:15:34 +01:00
gxw 8e05c053be LoongArch64: Fixed the failed test cases test_{c/z}gemv_n in test_extensions 2024-02-27 22:19:26 -05:00
gxw 3f22fc2233 LoongArch64: Add zgemv LSX opt 2024-02-27 22:19:04 -05:00
gxw c508a10cf2 LoongArch64: Add cgemv LSX opt 2024-02-27 22:17:30 -05:00
Martin Kroeker accea15551
Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code
Cgemm zgemm c code
2024-02-27 22:07:07 +01:00
Martin Kroeker 8e872a91a9
Fix erroneous mapping of SUM kernels to ASUM 2024-02-27 11:28:50 +01:00
Martin Kroeker 6699227d45
Merge pull request #4525 from XiWeiGu/loongarch64_fixed_kernel_regress_skx_avx
LoongArch64: Fixed utest kernel_regress:skx_avx
2024-02-26 09:49:34 +01:00
gxw 8dea25ffff LoongArch64: Fixed utest kernel_regress:skx_avx 2024-02-26 02:04:37 -05:00
Martin Kroeker 7d506984fa
fix assignment of default CSUM kernel 2024-02-25 17:57:11 +01:00
Martin Kroeker 12787775d9
add csum/zsum kernels (trivially derived from the asum ones) 2024-02-25 17:55:36 +01:00
Martin Kroeker 8f8ef3492a
Add CSUM and ZSUM kernels (trivially derived from their existing ASUM counterparts) 2024-02-24 23:57:50 +01:00
Martin Kroeker be5e18c6f9
Add kernel definitions for CSUM and ZSUM 2024-02-24 23:55:43 +01:00
gxw 990507e3b8 LoongArch64: Opt zgemv with LASX 2024-02-22 11:58:02 +08:00
gxw d51ffec3a2 LoongArch64: Opt cgemv with LASX 2024-02-22 11:56:04 +08:00
pengxu 4787a55c64 Optimized cgemm kernel 16x4 LASX for LoongArch 2024-02-21 15:28:47 +08:00
Sergei Lewis ba17758c02 fix axpy implementations where y has a stride of 0 2024-02-16 16:00:38 +00:00
Dmitry Mikushin d0f5dc763b Adding USE_GEMM3M macro to kernel targets, so that the *gemm3m functions and parameters can be included into the gotoblas structure. Fixes #4500 2024-02-12 02:29:58 +01:00
Sergei Lewis ff1523163f Fix axpy test hangs when n==0. Reenable zaxpy_vector kernel for C910V. 2024-02-09 12:59:14 +00:00
pengxu fe3da43b7d Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch 2024-02-06 11:49:01 +08:00
Martin Kroeker e5d2725e5a
Merge pull request #4185 from XiWeiGu/mips_enable_msa
MIPS: Enable MSA
2024-02-05 15:50:16 +01:00
Martin Kroeker b537528feb
Merge pull request #4480 from XiWeiGu/loongarch64-fixed-{s/d}amin-lsx
LoongArch64: Fixed {s/d}amin LSX optimization
2024-02-05 06:24:50 +01:00
Martin Kroeker 6d8a273cca
Handle zero increment(s) in C910V ?AXPBY (#4483)
* Handle zero increment(s)
2024-02-04 22:07:51 +01:00
Martin Kroeker dbcf4f8b7d
Merge pull request #4479 from XiWeiGu/loongarch-opt-axpby
Loongarch opt axpby
2024-02-04 19:50:28 +01:00
Martin Kroeker dc802dd637
Merge pull request #4474 from ChipKerchner/sgemmIncopy_PR
Vectorize in-copy packing/copying for SGEMM - up to 4X faster.
2024-02-04 18:51:09 +01:00
gxw adde725321 LoongArch64: Fixed {s/d}amin LSX optimization 2024-02-04 14:44:47 +08:00
gxw 7bc93d95a1 LoongArch64: Opt {c/z}axpby 2024-02-04 11:23:31 +08:00
gxw 1e1f487dc7 LoongArch64: Fixed {s/d}axpby 2024-02-04 09:41:37 +08:00
Martin Kroeker 4d8dee508c
temporarily disable the CAXPY/ZAXPY kernels 2024-02-04 01:05:03 +01:00