Commit Graph

2089 Commits

Author SHA1 Message Date
Martin Kroeker cf8b03ae8b Use NAN rather than SNAN for portability 2024-01-07 23:09:57 +01:00
Martin Kroeker f0808d856b Handle NAN in input 2024-01-07 20:27:29 +01:00
Martin Kroeker acf17a825d Handle NAN in input 2024-01-07 20:26:16 +01:00
Martin Kroeker c9df62e883 Fix handling of NAN 2024-01-07 17:49:40 +01:00
Martin Kroeker def4996170 Fix handling of NAN and INF arguments 2024-01-07 15:29:42 +01:00
Martin Kroeker 519b40fad9 Merge pull request #4398 from yinshiyou/la-dev
Add Optimizations for LoongArch.
2023-12-30 19:51:08 +01:00
pengxu a5d0d21378 loongarch64: Add zgemm and cgemm optimization 2023-12-29 18:06:26 +08:00
gxw 546f13558c loongarch64: Add {c/z}swap and {c/z}sum optimization 2023-12-29 17:30:57 +08:00
Hao Chen edabb93668 loongarch64: Refine axpby optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen 1ec5dded43 loongarch64: Add c/zrot optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 3c53ded315 loongarch64: Add c/znrm2 optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen fbd612f8c4 loongarch64: Add ic/zamin optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen d97272cb35 loongarch64: Add c/zdot optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen 65a0aeb128 loongarch64: Add c/zcopy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 2a34fb4b80 loongarch64: Add and refine scal optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 8785e948b5 loongarch64: Add camin optimization function. 2023-12-29 17:30:57 +08:00
Hao Chen 0753848e03 loongarch64: Refine and add axpy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 06fd5b5995 loongarch64: Add and Refine asum optimization functions. 2023-12-29 17:30:57 +08:00
guxiwei e771be185e Optimize copy functions with lsx.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 179ed51d3b Add dgemm_kernel_8x4.S file. 2023-12-29 17:30:57 +08:00
Hao Chen 173a65d4e6 loongarch64: Add and refine iamax optimization functions. 2023-12-29 17:30:57 +08:00
zhoupeng ea70e165c7 loongarch64: Refine rot optimization. 2023-12-29 17:30:57 +08:00
zhoupeng 116aee7527 loongarch64: Refine imin optimization. 2023-12-29 17:30:57 +08:00
zhoupeng 8be2654193 loongarch64: Refine imax optimization. 2023-12-29 17:30:57 +08:00
zhoupeng 154baad454 loongarch64: Refine iamin optimization. 2023-12-29 17:30:57 +08:00
Shiyou Yin 36c12c4971 loongarch64: Refine copy,swap,nrm2,sum optimization. 2023-12-29 17:30:57 +08:00
Shiyou Yin c6996a80e9 loongarch64: Refine amax,amin,max,min optimization. 2023-12-29 17:30:57 +08:00
Chris Sidebottom ecae1389df Reduce duplication in kernel definitions
These files are exactly the same, so I believe we can reduce these files
down. Other files require a slightly more complex unpicking.
2023-12-23 12:39:53 +00:00
Chris Sidebottom 60e66725e4 Use numeric labels to allow repeated inlining 2023-12-19 13:11:06 +00:00
Chris Sidebottom 7a4fef4f60 Tweak SVE dot kernel
This changes the SVE dot kernel to only predicate when necessary as well
as streamlining the assembly a bit. The benchmarks seem to indicate this
can improve performance by ~33%.
2023-12-19 12:08:54 +00:00
Martin Kroeker f06b535566 Use C kernel for dgemv_t due to limitations of the old assembly one 2023-12-15 09:58:44 +01:00
barracuda156 d9653af018 KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366
2023-12-14 12:00:11 +08:00
Chip-Kerchner 93747fb377 Merge remote-tracking branch 'origin/develop' into power10Copies 2023-12-12 09:32:49 -06:00
Chip-Kerchner 4e738e561a Replace two vector loads with one vector pair load and fix endianess of stores. 2023-12-08 12:36:08 -06:00
yancheng d32f38fb37 loongarch64: Add optimizations for nrm2. 2023-12-07 14:36:26 +08:00
yancheng f9b468990e loongarch64: Add optimizations for rot. 2023-12-07 14:36:26 +08:00
yancheng c80e7e27d1 loongarch64: Add optimizations for sum and asum. 2023-12-07 14:36:26 +08:00
yancheng d4c96a35a8 loongarch64: Add optimizations for axpy and axpby. 2023-12-07 14:36:26 +08:00
yancheng 360acc0a41 loongarch64: Add optimizations for swap. 2023-12-07 14:36:26 +08:00
yancheng 174c25766b loongarch64: Add optimizations for copy. 2023-12-07 14:36:26 +08:00
yancheng 49829b2b7d loongarch64: Add optimizations for iamin. 2023-12-07 14:36:07 +08:00
yancheng be83f5e4e0 loongarch64: Add optimizations for iamax. 2023-12-07 14:36:07 +08:00
yancheng e3fb2b5afa loongarch64: Add optimizations for imin. 2023-12-07 14:36:07 +08:00
yancheng e46b48e372 loongarch64: Add optimizations for imax. 2023-12-07 14:36:07 +08:00
yancheng 702fc1d56d loongarch64: Add optimization for min. 2023-12-07 14:36:07 +08:00
yancheng 346b384d1c loongarch64: Add optimization for max. 2023-12-07 14:36:07 +08:00
yancheng ff2ecc6cda loongarch64: Add optimization for amin. 2023-12-07 14:36:07 +08:00
yancheng 265b5f2e80 loongarch64: Add optimizations for amax. 2023-12-07 14:36:07 +08:00
yancheng 993ede7c70 loongarch64: Add optimizations for scal. 2023-12-07 14:36:07 +08:00
Martin Kroeker 39bf8ece20 Merge pull request #4340 from yinshiyou/la-dev
Add some refines and optimizations for LoongArch.
2023-11-29 08:22:25 +01:00