Martin Kroeker
0d2e486edf
Handle NAN and INF
2024-01-15 11:18:59 +01:00
Martin Kroeker
5f5b7c4f45
Merge pull request #4423 from martin-frbg/issue4422
...
Check compiler support for AVX512BF16 and base COL/SPR kernel choice on that
2024-01-12 16:30:50 +01:00
Martin Kroeker
f31bea07dd
Merge pull request #4419 from martin-frbg/issue4413
...
[WIP] Add fixes and utests for ZSCAL with NaN or Inf arguments
2024-01-12 14:27:08 +01:00
Martin Kroeker
20413ee6ec
Update zscal.c
2024-01-12 13:11:13 +01:00
Martin Kroeker
b57627c27f
Handle NAN and INF
2024-01-12 12:03:08 +01:00
Martin Kroeker
995a990e24
Make AVX512 BFLOAT16 kernels conditional on compiler capability
2024-01-12 00:12:46 +01:00
Martin Kroeker
7df363e1e2
temporarily disable the MSA C/ZSCAL kernels
2024-01-12 00:08:52 +01:00
Chip-Kerchner
058dd2a4cb
Replace two vector loads with one vector pair load and fix endianess of stores - DGEMM versions.
2024-01-08 14:16:09 -06:00
Martin Kroeker
1c31f56e5a
Handle NAN
2024-01-08 16:11:25 +01:00
Martin Kroeker
7ee1ee38e2
Handle NaN in input
2024-01-08 14:20:07 +01:00
Martin Kroeker
f637e12713
Handle INF and NAN
2024-01-08 09:52:38 +01:00
Martin Kroeker
25b0c48082
Update zscal.c
2024-01-08 09:49:18 +01:00
Martin Kroeker
5e7f714e93
Update zscal.c
2024-01-08 08:17:40 +01:00
Martin Kroeker
cf8b03ae8b
Use NAN rather than SNAN for portability
2024-01-07 23:09:57 +01:00
Martin Kroeker
f0808d856b
Handle NAN in input
2024-01-07 20:27:29 +01:00
Martin Kroeker
acf17a825d
Handle NAN in input
2024-01-07 20:26:16 +01:00
Martin Kroeker
c9df62e883
Fix handling of NAN
2024-01-07 17:49:40 +01:00
Martin Kroeker
def4996170
Fix handling of NAN and INF arguments
2024-01-07 15:29:42 +01:00
Martin Kroeker
519b40fad9
Merge pull request #4398 from yinshiyou/la-dev
...
Add Optimizations for LoongArch.
2023-12-30 19:51:08 +01:00
pengxu
a5d0d21378
loongarch64: Add zgemm and cgemm optimization
2023-12-29 18:06:26 +08:00
gxw
546f13558c
loongarch64: Add {c/z}swap and {c/z}sum optimization
2023-12-29 17:30:57 +08:00
Hao Chen
edabb93668
loongarch64: Refine axpby optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
1ec5dded43
loongarch64: Add c/zrot optimization functions.
...
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
3c53ded315
loongarch64: Add c/znrm2 optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
fbd612f8c4
loongarch64: Add ic/zamin optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
d97272cb35
loongarch64: Add c/zdot optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
65a0aeb128
loongarch64: Add c/zcopy optimization functions.
...
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
2a34fb4b80
loongarch64: Add and refine scal optimization functions.
...
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
8785e948b5
loongarch64: Add camin optimization function.
2023-12-29 17:30:57 +08:00
Hao Chen
0753848e03
loongarch64: Refine and add axpy optimization functions.
...
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
06fd5b5995
loongarch64: Add and Refine asum optimization functions.
2023-12-29 17:30:57 +08:00
guxiwei
e771be185e
Optimize copy functions with lsx.
...
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
179ed51d3b
Add dgemm_kernel_8x4.S file.
2023-12-29 17:30:57 +08:00
Hao Chen
173a65d4e6
loongarch64: Add and refine iamax optimization functions.
2023-12-29 17:30:57 +08:00
zhoupeng
ea70e165c7
loongarch64: Refine rot optimization.
2023-12-29 17:30:57 +08:00
zhoupeng
116aee7527
loongarch64: Refine imin optimization.
2023-12-29 17:30:57 +08:00
zhoupeng
8be2654193
loongarch64: Refine imax optimization.
2023-12-29 17:30:57 +08:00
zhoupeng
154baad454
loongarch64: Refine iamin optimization.
2023-12-29 17:30:57 +08:00
Shiyou Yin
36c12c4971
loongarch64: Refine copy,swap,nrm2,sum optimization.
2023-12-29 17:30:57 +08:00
Shiyou Yin
c6996a80e9
loongarch64: Refine amax,amin,max,min optimization.
2023-12-29 17:30:57 +08:00
Chris Sidebottom
ecae1389df
Reduce duplication in kernel definitions
...
These files are exactly the same, so I believe we can reduce these files
down. Other files require a slightly more complex unpicking.
2023-12-23 12:39:53 +00:00
Chris Sidebottom
60e66725e4
Use numeric labels to allow repeated inlining
2023-12-19 13:11:06 +00:00
Chris Sidebottom
7a4fef4f60
Tweak SVE dot kernel
...
This changes the SVE dot kernel to only predicate when necessary as well
as streamlining the assembly a bit. The benchmarks seem to indicate this
can improve performance by ~33%.
2023-12-19 12:08:54 +00:00
Martin Kroeker
f06b535566
Use C kernel for dgemv_t due to limitations of the old assembly one
2023-12-15 09:58:44 +01:00
barracuda156
d9653af018
KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing
...
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366
2023-12-14 12:00:11 +08:00
Chip-Kerchner
93747fb377
Merge remote-tracking branch 'origin/develop' into power10Copies
2023-12-12 09:32:49 -06:00
Chip-Kerchner
4e738e561a
Replace two vector loads with one vector pair load and fix endianess of stores.
2023-12-08 12:36:08 -06:00
yancheng
d32f38fb37
loongarch64: Add optimizations for nrm2.
2023-12-07 14:36:26 +08:00
yancheng
f9b468990e
loongarch64: Add optimizations for rot.
2023-12-07 14:36:26 +08:00
yancheng
c80e7e27d1
loongarch64: Add optimizations for sum and asum.
2023-12-07 14:36:26 +08:00