OpenBLAS/kernel
Amrita H S 87b3d9054f Fix SAXPY regression when compiled with the OpenXL compiler.
SAXPY built with the OpenXL compiler is slower than SAXPY
built with gcc. OpenXL does not know that the SAXPY inner
kernel assembly is a 64-element loop, so it treats the
remainder loop as the main loop. It vectorizes and interleaves
the remainder loop into a loop that processes 48 elements per
iteration. With at most 63 remainder iterations, the 48-element
loop usually does not execute at all, so the 1-element scalar
loop that handles whatever is left after it probably does
most of the work.
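
The arithmetic behind this can be checked with a small sketch. It is not part of the commit; the helper names are hypothetical, and it assumes for illustration that every remainder length from 0 to 63 is equally likely. It counts how many elements end up in the 1-element scalar loop for a given vectorized loop width.

```c
#include <assert.h>

/* Elements left for the scalar loop when a remainder of r elements is
 * first processed by a vectorized loop that consumes `width` elements
 * per iteration. Hypothetical helper, for illustration only. */
static unsigned scalar_tail(unsigned r, unsigned width)
{
    assert(width > 0);
    return r % width;
}

/* Sum of scalar-loop elements over every possible remainder 0..63,
 * i.e. everything the 64-element assembly kernel can leave behind. */
unsigned total_scalar_tail(unsigned width)
{
    unsigned total = 0;
    for (unsigned r = 0; r < 64; r++)
        total += scalar_tail(r, width);
    return total;
}
```

Under that assumption, `total_scalar_tail(48)` is 1248 (an average of 19.5 scalar elements per call) and `total_scalar_tail(8)` is 224 (an average of 3.5), which is consistent with the scalar loop doing most of the remainder work in the 48-element case.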

This can be fixed by adding a pragma, loop interleave_count(2),
to the remainder loop, which results in an 8-element loop.
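
A minimal sketch of where such a pragma could sit, not the actual patch: the function names are hypothetical, the plain C `main_kernel_stand_in` only stands in for the 64-element assembly kernel, and the `#pragma clang loop` spelling is assumed on the basis that OpenXL is clang-based.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 64 /* elements handled per pass of the main kernel */

/* Hypothetical stand-in for the 64-element assembly inner kernel.
 * n64 must be a multiple of BLOCK. */
static void main_kernel_stand_in(size_t n64, float da,
                                 const float *x, float *y)
{
    for (size_t i = 0; i < n64; i++)
        y[i] += da * x[i];
}

/* y := da * x + y for unit-stride vectors (illustrative sketch). */
void saxpy_sketch(size_t n, float da, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    /* Round n down to a multiple of BLOCK for the main kernel. */
    size_t n1 = n & ~(size_t)(BLOCK - 1);
    if (n1 > 0)
        main_kernel_stand_in(n1, da, x, y);

    /* Remainder loop: at most BLOCK - 1 = 63 iterations. The hint asks
     * the vectorizer to interleave only two vector iterations, so with
     * 4 floats per vector the vectorized loop handles 8 elements. */
    size_t i = n1;
#if defined(__clang__)
#pragma clang loop interleave_count(2)
#endif
    while (i < n) {
        y[i] += da * x[i];
        i++;
    }
}
```

Guarding the pragma with `__clang__` keeps other compilers such as gcc from warning about an unknown pragma; whether the real change uses the same guard is an assumption here.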

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2024-05-12 23:27:55 -05:00
..
alpha alpha: Remove include of version.h 2022-08-11 15:02:58 +01:00
arm fix loop condition for incx < 0 2024-03-12 15:46:23 +01:00
arm64 revert the C/Z NRM2 kernels to the base NEON kernel as well 2024-04-12 15:34:04 +02:00
csky Add CSKY support 2024-01-16 23:45:06 +08:00
e2k Add default KERNEL file for Elbrus E2K arch 2022-01-22 18:59:36 +01:00
generic loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6 2024-05-08 10:10:26 +08:00
ia64 Add ia64 implementation of ?sum 2019-03-30 22:18:03 +01:00
loongarch64 Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6 2024-05-10 11:29:12 +02:00
mips mips64: Fixed MSA optimization bugs for zgemv and cgemv 2024-04-15 15:17:29 +08:00
mips64 Fix SICORTEX ASUM/ZASUM and SUM/ZSUM for INCX <=0 (#4640) 2024-04-14 15:39:11 +02:00
power Fix SAXPY regression when compiled with the OpenXL compiler. 2024-05-12 23:27:55 -05:00
riscv64 Update nrm2_rvv.c 2024-03-13 13:07:26 +01:00
simd fix the CI failure of lack the head 2020-11-12 17:35:17 +08:00
sparc Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:58:57 +02:00
x86 Handle NAN 2024-01-08 16:11:25 +01:00
x86_64 Add forgotten conditional uses of PREFETCH 2024-04-19 10:52:28 +02:00
zarch Fix erroneous mapping of SUM kernels to ASUM 2024-02-27 11:28:50 +01:00
CMakeLists.txt Adding USE_GEMM3M macro to kernel targets, so that the *gemm3m functions and parameters can be included into the gotoblas structure. Fixes #4500 2024-02-12 02:29:58 +01:00
Makefile powerpc: Fix build errors with Open XL C 2023-10-04 14:04:03 -05:00
Makefile.L1 Conditionally add -mfma to compiler options where needed 2020-12-17 11:34:05 +01:00
Makefile.L2 make SSYMV available to BUILD_DOUBLE-only builds 2023-02-22 00:30:20 +01:00
Makefile.L3 (Re)apply fixes for supporting only a subset of precision types from PR 3915 2023-11-04 23:48:59 +01:00
Makefile.LA Support NO_LAPACK=1 to build the lib without LAPACK functions. 2011-03-04 11:51:32 +08:00
setparam-ref.c Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset 2024-04-10 14:23:31 +02:00