pengxu
a5d0d21378
loongarch64: Add zgemm and cgemm optimization
2023-12-29 18:06:26 +08:00
gxw
546f13558c
loongarch64: Add {c/z}swap and {c/z}sum optimization
2023-12-29 17:30:57 +08:00
Hao Chen
edabb93668
loongarch64: Refine axpby optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
1ec5dded43
loongarch64: Add c/zrot optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
3c53ded315
loongarch64: Add c/znrm2 optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
fbd612f8c4
loongarch64: Add ic/zamin optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
d97272cb35
loongarch64: Add c/zdot optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
65a0aeb128
loongarch64: Add c/zcopy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
2a34fb4b80
loongarch64: Add and refine scal optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
8785e948b5
loongarch64: Add camin optimization function.
2023-12-29 17:30:57 +08:00
Hao Chen
0753848e03
loongarch64: Refine and add axpy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
06fd5b5995
loongarch64: Add and Refine asum optimization functions.
2023-12-29 17:30:57 +08:00
guxiwei
e771be185e
Optimize copy functions with lsx.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
179ed51d3b
Add dgemm_kernel_8x4.S file.
2023-12-29 17:30:57 +08:00
Hao Chen
173a65d4e6
loongarch64: Add and refine iamax optimization functions.
2023-12-29 17:30:57 +08:00
zhoupeng
ea70e165c7
loongarch64: Refine rot optimization.
2023-12-29 17:30:57 +08:00
zhoupeng
116aee7527
loongarch64: Refine imin optimization.
2023-12-29 17:30:57 +08:00
zhoupeng
8be2654193
loongarch64: Refine imax optimization.
2023-12-29 17:30:57 +08:00
zhoupeng
154baad454
loongarch64: Refine iamin optimization.
2023-12-29 17:30:57 +08:00
Shiyou Yin
36c12c4971
loongarch64: Refine copy,swap,nrm2,sum optimization.
2023-12-29 17:30:57 +08:00
Shiyou Yin
c6996a80e9
loongarch64: Refine amax,amin,max,min optimization.
2023-12-29 17:30:57 +08:00
yancheng
d32f38fb37
loongarch64: Add optimizations for nrm2.
2023-12-07 14:36:26 +08:00
yancheng
f9b468990e
loongarch64: Add optimizations for rot.
2023-12-07 14:36:26 +08:00
yancheng
c80e7e27d1
loongarch64: Add optimizations for sum and asum.
2023-12-07 14:36:26 +08:00
yancheng
d4c96a35a8
loongarch64: Add optimizations for axpy and axpby.
2023-12-07 14:36:26 +08:00
yancheng
360acc0a41
loongarch64: Add optimizations for swap.
2023-12-07 14:36:26 +08:00
yancheng
174c25766b
loongarch64: Add optimizations for copy.
2023-12-07 14:36:26 +08:00
yancheng
49829b2b7d
loongarch64: Add optimizations for iamin.
2023-12-07 14:36:07 +08:00
yancheng
be83f5e4e0
loongarch64: Add optimizations for iamax.
2023-12-07 14:36:07 +08:00
yancheng
e3fb2b5afa
loongarch64: Add optimizations for imin.
2023-12-07 14:36:07 +08:00
yancheng
e46b48e372
loongarch64: Add optimizations for imax.
2023-12-07 14:36:07 +08:00
yancheng
702fc1d56d
loongarch64: Add optimization for min.
2023-12-07 14:36:07 +08:00
yancheng
346b384d1c
loongarch64: Add optimization for max.
2023-12-07 14:36:07 +08:00
yancheng
ff2ecc6cda
loongarch64: Add optimization for amin.
2023-12-07 14:36:07 +08:00
yancheng
265b5f2e80
loongarch64: Add optimizations for amax.
2023-12-07 14:36:07 +08:00
yancheng
993ede7c70
loongarch64: Add optimizations for scal.
2023-12-07 14:36:07 +08:00
Shiyou Yin
9fe07d82fd
loongarch: Add LSX optimization for dot.
2023-11-28 20:24:18 +08:00
Shiyou Yin
13b8c44b44
loongarch: Add optimization for dsdot kernel.
2023-11-28 20:24:16 +08:00
Shiyou Yin
3def6a8143
loongarch: Add LASX optimization for dot.
2023-11-28 20:24:14 +08:00
gxw
d15e0a055c
LoongArch64: Fixed compilation issues when enabling DYNAMIC_ARCH
2023-09-27 10:05:27 +08:00
gxw
4670eb1462
LoongArch64: Add dtrsm kernel
2023-09-26 15:45:14 +08:00
gxw
f2cf929374
LoongArch64: Add sgemv kernel
2023-09-04 14:28:37 +08:00
gxw
394a1fd1bf
LoongArch64: Compatible with early internal toolchain
__loongarch_grlen and __loongarch_frlen were introduced in Loongson's internal
gcc version 8.3.0 (Loongnix 8.3.0-6.lnd.vec.31) to standardize how the general
and floating-point register widths are reported. Earlier versions do not
define them, so additional checks are required.
2023-08-31 16:55:29 +08:00
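The compatibility note in 394a1fd1bf describes guarding against toolchains that do not define __loongarch_grlen and __loongarch_frlen. A minimal sketch of such a guard follows. It is not the code from the commit: the SKETCH_* macro names, the helper functions, the use of __loongarch64 as the fallback test, and the fallback width of 64 are all assumptions made for illustration.

```c
/* Sketch only: the names and fallback values below are assumptions,
 * not the checks actually added by the commit. */

/* General-purpose register width in bits. */
#if defined(__loongarch_grlen)
#  define SKETCH_GRLEN __loongarch_grlen  /* toolchain reports the width */
#elif defined(__loongarch64)
#  define SKETCH_GRLEN 64  /* assumed default for an early 64-bit toolchain */
#else
#  define SKETCH_GRLEN 0   /* not a LoongArch target, or width unknown */
#endif

/* Floating-point register width in bits. */
#if defined(__loongarch_frlen)
#  define SKETCH_FRLEN __loongarch_frlen  /* toolchain reports the width */
#elif defined(__loongarch64)
#  define SKETCH_FRLEN 64  /* assumed default for an early 64-bit toolchain */
#else
#  define SKETCH_FRLEN 0   /* not a LoongArch target, or width unknown */
#endif

/* Hypothetical helpers exposing the resolved widths at run time. */
int sketch_grlen(void) { return SKETCH_GRLEN; }
int sketch_frlen(void) { return SKETCH_FRLEN; }
```

Code that depends on register width can then test SKETCH_GRLEN and SKETCH_FRLEN instead of the raw macros, so the same source builds with toolchains that predate them.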
gxw
553cc1372f
LoongArch64: Add sgemm_kernel
2023-08-23 16:08:43 +08:00
Martin Kroeker
d15ffb7fdf
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:50:44 +02:00
Martin Kroeker
afdc56a421
Merge pull request #4158 from XiWeiGu/loongarch64_update_dgemm_kernel
LoongArch64: Update dgemm kernel
2023-08-07 12:44:09 +02:00
gxw
e8b571d245
LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S V2
2023-08-07 11:20:42 +08:00
gxw
71fcee6eef
LoongArch64: Update dgemm kernel
2023-08-07 11:06:52 +08:00
Martin Kroeker
41c31bc1d4
Revert "LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S"
2023-08-06 16:00:03 +02:00
Martin Kroeker
f8ee309402
Merge pull request #4153 from XiWeiGu/dgemv
LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S
2023-08-06 08:49:16 +02:00