Commit Graph

58 Commits

Author SHA1 Message Date
pengxu a5d0d21378 loongarch64: Add zgemm and cgemm optimization 2023-12-29 18:06:26 +08:00
gxw 546f13558c loongarch64: Add {c/z}swap and {c/z}sum optimization 2023-12-29 17:30:57 +08:00
Hao Chen edabb93668 loongarch64: Refine axpby optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen 1ec5dded43 loongarch64: Add c/zrot optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 3c53ded315 loongarch64: Add c/znrm2 optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen fbd612f8c4 loongarch64: Add ic/zamin optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen d97272cb35 loongarch64: Add c/zdot optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen 65a0aeb128 loongarch64: Add c/zcopy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 2a34fb4b80 loongarch64: Add and refine scal optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 8785e948b5 loongarch64: Add camin optimization function. 2023-12-29 17:30:57 +08:00
Hao Chen 0753848e03 loongarch64: Refine and add axpy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 06fd5b5995 loongarch64: Add and Refine asum optimization functions. 2023-12-29 17:30:57 +08:00
guxiwei e771be185e Optimize copy functions with lsx.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 179ed51d3b Add dgemm_kernel_8x4.S file. 2023-12-29 17:30:57 +08:00
Hao Chen 173a65d4e6 loongarch64: Add and refine iamax optimization functions. 2023-12-29 17:30:57 +08:00
zhoupeng ea70e165c7 loongarch64: Refine rot optimization. 2023-12-29 17:30:57 +08:00
zhoupeng 116aee7527 loongarch64: Refine imin optimization. 2023-12-29 17:30:57 +08:00
zhoupeng 8be2654193 loongarch64: Refine imax optimization. 2023-12-29 17:30:57 +08:00
zhoupeng 154baad454 loongarch64: Refine iamin optimization. 2023-12-29 17:30:57 +08:00
Shiyou Yin 36c12c4971 loongarch64: Refine copy,swap,nrm2,sum optimization. 2023-12-29 17:30:57 +08:00
Shiyou Yin c6996a80e9 loongarch64: Refine amax,amin,max,min optimization. 2023-12-29 17:30:57 +08:00
yancheng d32f38fb37 loongarch64: Add optimizations for nrm2. 2023-12-07 14:36:26 +08:00
yancheng f9b468990e loongarch64: Add optimizations for rot. 2023-12-07 14:36:26 +08:00
yancheng c80e7e27d1 loongarch64: Add optimizations for sum and asum. 2023-12-07 14:36:26 +08:00
yancheng d4c96a35a8 loongarch64: Add optimizations for axpy and axpby. 2023-12-07 14:36:26 +08:00
yancheng 360acc0a41 loongarch64: Add optimizations for swap. 2023-12-07 14:36:26 +08:00
yancheng 174c25766b loongarch64: Add optimizations for copy. 2023-12-07 14:36:26 +08:00
yancheng 49829b2b7d loongarch64: Add optimizations for iamin. 2023-12-07 14:36:07 +08:00
yancheng be83f5e4e0 loongarch64: Add optimizations for iamax. 2023-12-07 14:36:07 +08:00
yancheng e3fb2b5afa loongarch64: Add optimizations for imin. 2023-12-07 14:36:07 +08:00
yancheng e46b48e372 loongarch64: Add optimizations for imax. 2023-12-07 14:36:07 +08:00
yancheng 702fc1d56d loongarch64: Add optimization for min. 2023-12-07 14:36:07 +08:00
yancheng 346b384d1c loongarch64: Add optimization for max. 2023-12-07 14:36:07 +08:00
yancheng ff2ecc6cda loongarch64: Add optimization for amin. 2023-12-07 14:36:07 +08:00
yancheng 265b5f2e80 loongarch64: Add optimizations for amax. 2023-12-07 14:36:07 +08:00
yancheng 993ede7c70 loongarch64: Add optimizations for scal. 2023-12-07 14:36:07 +08:00
Shiyou Yin 9fe07d82fd loongarch: Add LSX optimization for dot. 2023-11-28 20:24:18 +08:00
Shiyou Yin 13b8c44b44 loongarch: Add optimization for dsdot kernel. 2023-11-28 20:24:16 +08:00
Shiyou Yin 3def6a8143 loongarch: Add LASX optimization for dot. 2023-11-28 20:24:14 +08:00
gxw d15e0a055c LoongArch64: Fixed compilation issues when enable DYNAMIC_ARCH 2023-09-27 10:05:27 +08:00
gxw 4670eb1462 LoongArch64: Add dtrsm kernel 2023-09-26 15:45:14 +08:00
gxw f2cf929374 LoongArch64: Add sgemv kernel 2023-09-04 14:28:37 +08:00
gxw 394a1fd1bf LoongArch64: Compatible with early internal toolchain
__loongarch_grlen and __loongarch_frlen were introduced in gcc version 8.3.0
(Loongnix 8.3.0-6.lnd.vec.31) internally within Loongson to standardize the
general and floating-point register widths. However, previous versions did
not have them, requiring additional checks to be added.
2023-08-31 16:55:29 +08:00
gxw 553cc1372f LoongArch64: Add sgemm_kernel 2023-08-23 16:08:43 +08:00
Martin Kroeker d15ffb7fdf Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:50:44 +02:00
Martin Kroeker afdc56a421 Merge pull request #4158 from XiWeiGu/loongarch64_update_dgemm_kernel
LoongArch64: Update dgemm kernel
2023-08-07 12:44:09 +02:00
gxw e8b571d245 LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S V2 2023-08-07 11:20:42 +08:00
gxw 71fcee6eef LoongArch64: Update dgemm kernel 2023-08-07 11:06:52 +08:00
Martin Kroeker 41c31bc1d4 Revert "LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S" 2023-08-06 16:00:03 +02:00
Martin Kroeker f8ee309402 Merge pull request #4153 from XiWeiGu/dgemv
LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S
2023-08-06 08:49:16 +02:00