gxw
|
7bc93d95a1
|
LoongArch64: Opt {c/z}axpby
|
2024-02-04 11:23:31 +08:00 |
gxw
|
1e1f487dc7
|
LoongArch64: Fixed {s/d}axpby
|
2024-02-04 09:41:37 +08:00 |
Martin Kroeker
|
d1343302bd
|
Merge pull request #4465 from XiWeiGu/utest-zscal
utest: Add tests for zscal
|
2024-01-31 14:19:19 +01:00 |
gxw
|
969601a1dc
|
X86_64: Fixed bug in zscal
Fixed handling of NAN and INF arguments when
inc is greater than 1.
|
2024-01-31 11:23:59 +08:00 |
Martin Kroeker
|
98c9ff3194
|
Merge pull request #4464 from XiWeiGu/loongarch64-zscal
LoongArch64: Handle NAN and INF
|
2024-01-30 22:53:29 +01:00 |
gxw
|
83ce97a4ca
|
LoongArch64: Handle NAN and INF
|
2024-01-30 17:17:30 +08:00 |
gxw
|
a79d117405
|
LoongArch64: Fixed bug for {s/d}amin
|
2024-01-30 11:32:57 +08:00 |
gxw
|
276e3ebf9e
|
LoongArch64: Add dzamax and dzamin opt
|
2024-01-26 10:03:50 +08:00 |
Dirreke
|
ec89466e14
|
Add CSKY support
|
2024-01-16 23:45:06 +08:00 |
Martin Kroeker
|
0d2e486edf
|
Handle NAN and INF
|
2024-01-15 11:18:59 +01:00 |
Martin Kroeker
|
5f5b7c4f45
|
Merge pull request #4423 from martin-frbg/issue4422
Check compiler support for AVX512BF16 and base COL/SPR kernel choice on that
|
2024-01-12 16:30:50 +01:00 |
Martin Kroeker
|
f31bea07dd
|
Merge pull request #4419 from martin-frbg/issue4413
[WIP] Add fixes and utests for ZSCAL with NaN or Inf arguments
|
2024-01-12 14:27:08 +01:00 |
Martin Kroeker
|
20413ee6ec
|
Update zscal.c
|
2024-01-12 13:11:13 +01:00 |
Martin Kroeker
|
b57627c27f
|
Handle NAN and INF
|
2024-01-12 12:03:08 +01:00 |
Martin Kroeker
|
995a990e24
|
Make AVX512 BFLOAT16 kernels conditional on compiler capability
|
2024-01-12 00:12:46 +01:00 |
Martin Kroeker
|
7df363e1e2
|
temporarily disable the MSA C/ZSCAL kernels
|
2024-01-12 00:08:52 +01:00 |
Chip-Kerchner
|
058dd2a4cb
|
Replace two vector loads with one vector pair load and fix endianness of stores - DGEMM versions.
|
2024-01-08 14:16:09 -06:00 |
Martin Kroeker
|
1c31f56e5a
|
Handle NAN
|
2024-01-08 16:11:25 +01:00 |
Martin Kroeker
|
7ee1ee38e2
|
Handle NaN in input
|
2024-01-08 14:20:07 +01:00 |
Martin Kroeker
|
f637e12713
|
Handle INF and NAN
|
2024-01-08 09:52:38 +01:00 |
Martin Kroeker
|
25b0c48082
|
Update zscal.c
|
2024-01-08 09:49:18 +01:00 |
Martin Kroeker
|
5e7f714e93
|
Update zscal.c
|
2024-01-08 08:17:40 +01:00 |
Martin Kroeker
|
cf8b03ae8b
|
Use NAN rather than SNAN for portability
|
2024-01-07 23:09:57 +01:00 |
Martin Kroeker
|
f0808d856b
|
Handle NAN in input
|
2024-01-07 20:27:29 +01:00 |
Martin Kroeker
|
acf17a825d
|
Handle NAN in input
|
2024-01-07 20:26:16 +01:00 |
Martin Kroeker
|
c9df62e883
|
Fix handling of NAN
|
2024-01-07 17:49:40 +01:00 |
Martin Kroeker
|
def4996170
|
Fix handling of NAN and INF arguments
|
2024-01-07 15:29:42 +01:00 |
Martin Kroeker
|
519b40fad9
|
Merge pull request #4398 from yinshiyou/la-dev
Add Optimizations for LoongArch.
|
2023-12-30 19:51:08 +01:00 |
pengxu
|
a5d0d21378
|
loongarch64: Add zgemm and cgemm optimization
|
2023-12-29 18:06:26 +08:00 |
gxw
|
546f13558c
|
loongarch64: Add {c/z}swap and {c/z}sum optimization
|
2023-12-29 17:30:57 +08:00 |
Hao Chen
|
edabb93668
|
loongarch64: Refine axpby optimization functions.
|
2023-12-29 17:30:57 +08:00 |
Hao Chen
|
1ec5dded43
|
loongarch64: Add c/zrot optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
|
2023-12-29 17:30:57 +08:00 |
Hao Chen
|
3c53ded315
|
loongarch64: Add c/znrm2 optimization functions.
|
2023-12-29 17:30:57 +08:00 |
Hao Chen
|
fbd612f8c4
|
loongarch64: Add ic/zamin optimization functions.
|
2023-12-29 17:30:57 +08:00 |
Hao Chen
|
d97272cb35
|
loongarch64: Add c/zdot optimization functions.
|
2023-12-29 17:30:57 +08:00 |
Hao Chen
|
65a0aeb128
|
loongarch64: Add c/zcopy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
|
2023-12-29 17:30:57 +08:00 |
Hao Chen
|
2a34fb4b80
|
loongarch64: Add and refine scal optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
|
2023-12-29 17:30:57 +08:00 |
Hao Chen
|
8785e948b5
|
loongarch64: Add camin optimization function.
|
2023-12-29 17:30:57 +08:00 |
Hao Chen
|
0753848e03
|
loongarch64: Refine and add axpy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
|
2023-12-29 17:30:57 +08:00 |
Hao Chen
|
06fd5b5995
|
loongarch64: Add and Refine asum optimization functions.
|
2023-12-29 17:30:57 +08:00 |
guxiwei
|
e771be185e
|
Optimize copy functions with lsx.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
|
2023-12-29 17:30:57 +08:00 |
Hao Chen
|
179ed51d3b
|
Add dgemm_kernel_8x4.S file.
|
2023-12-29 17:30:57 +08:00 |
Hao Chen
|
173a65d4e6
|
loongarch64: Add and refine iamax optimization functions.
|
2023-12-29 17:30:57 +08:00 |
zhoupeng
|
ea70e165c7
|
loongarch64: Refine rot optimization.
|
2023-12-29 17:30:57 +08:00 |
zhoupeng
|
116aee7527
|
loongarch64: Refine imin optimization.
|
2023-12-29 17:30:57 +08:00 |
zhoupeng
|
8be2654193
|
loongarch64: Refine imax optimization.
|
2023-12-29 17:30:57 +08:00 |
zhoupeng
|
154baad454
|
loongarch64: Refine iamin optimization.
|
2023-12-29 17:30:57 +08:00 |
Shiyou Yin
|
36c12c4971
|
loongarch64: Refine copy,swap,nrm2,sum optimization.
|
2023-12-29 17:30:57 +08:00 |
Shiyou Yin
|
c6996a80e9
|
loongarch64: Refine amax,amin,max,min optimization.
|
2023-12-29 17:30:57 +08:00 |
Chris Sidebottom
|
ecae1389df
|
Reduce duplication in kernel definitions
These files are exactly the same, so I believe we can reduce these files
down. Other files require a slightly more complex unpicking.
|
2023-12-23 12:39:53 +00:00 |