OpenBLAS/kernel
Bart Oldeman c34e2cf380 Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum
for skylake kernels. This is the same method as used in [sd]asum.
_mm_set1_epi64x was commented out for zasum, but it has the advantage
of avoiding possible undefined behaviour (use of an uninitialized
variable) that NVHPC and icx optimized away. The new code works
fine with those compilers.

For GCC 12.3 the generated code is identical: whichever method
you use, the compiler folds the mask into a compile-time
constant, so using _mm_cmpeq_epi8 brings no performance benefit;
the corresponding instruction (VPCMPEQB) is never actually
generated.
2023-11-19 21:28:35 +00:00
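A minimal sketch of the mask technique the commit describes, assuming unit stride and plain SSE2 rather than the wider vectors the Skylake kernels use. The function name zasum_sketch is hypothetical; this is not the OpenBLAS kernel. The mask has every bit set except each lane's sign bit, so AND-ing a vector with it yields absolute values. Because the mask is built by _mm_set1_epi64x from a literal, no uninitialized variable is read.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_set1_epi64x, _mm_castsi128_pd, _mm_and_pd, ... */
#endif

/* Hypothetical sketch of a zasum-style kernel: returns
   sum(|Re(x[i])| + |Im(x[i])|) for n complex doubles stored
   interleaved (re, im, re, im, ...) with unit stride. */
static double zasum_sketch(const double *x, size_t n)
{
    size_t len = 2 * n; /* two doubles per complex element */
    size_t i = 0;
    double sum = 0.0;

#if defined(__SSE2__)
    /* All bits set except the sign bit of each 64-bit lane, built from a
       constant, so nothing uninitialized is read. A casum-style (float)
       kernel would use _mm_set1_epi32(0x7fffffff) instead. */
    const __m128d abs_mask =
        _mm_castsi128_pd(_mm_set1_epi64x(0x7fffffffffffffffLL));
    __m128d acc = _mm_setzero_pd();

    for (; i + 2 <= len; i += 2) {
        __m128d v = _mm_loadu_pd(x + i);
        /* Clearing the sign bits gives |v| lane by lane. */
        acc = _mm_add_pd(acc, _mm_and_pd(v, abs_mask));
    }

    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    sum = lanes[0] + lanes[1];
#endif

    /* Scalar loop: handles everything when SSE2 is unavailable. */
    for (; i < len; i++)
        sum += x[i] < 0.0 ? -x[i] : x[i];
    return sum;
}
```

The float variant differs only in lane width (__m128 with _mm_castsi128_ps and _mm_and_ps). Since the mask is a literal, a compiler is free to fold it into a constant load, which matches what the commit reports for GCC 12.3.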
alpha alpha: Remove include of version.h 2022-08-11 15:02:58 +01:00
arm Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:49:05 +02:00
arm64 Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE 2023-11-03 14:55:31 +01:00
e2k Add default KERNEL file for Elbrus E2K arch 2022-01-22 18:59:36 +01:00
generic Fix file permissions (issue 4095) 2023-07-23 20:37:07 +02:00
ia64 Add ia64 implementation of ?sum 2019-03-30 22:18:03 +01:00
loongarch64 LoongArch64: Fixed compilation issues when DYNAMIC_ARCH is enabled 2023-09-27 10:05:27 +08:00
mips Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:52:09 +02:00
mips64 Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:53:33 +02:00
power POWER: AIX: Make use of power10 optimization 2023-10-19 18:48:19 -05:00
riscv64 Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:56:52 +02:00
simd fix the CI failure caused by a missing header 2020-11-12 17:35:17 +08:00
sparc Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:58:57 +02:00
x86 Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 17:00:18 +02:00
x86_64 Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum 2023-11-19 21:28:35 +00:00
zarch s390x: fix cscal and zscal implementations 2020-09-21 13:10:05 +02:00
CMakeLists.txt Fix dependencies in builds with specified subsets of precision types 2023-02-23 23:12:06 +01:00
Makefile powerpc: Fix build errors with Open XL C 2023-10-04 14:04:03 -05:00
Makefile.L1 Conditionally add -mfma to compiler options where needed 2020-12-17 11:34:05 +01:00
Makefile.L2 make SSYMV available to BUILD_DOUBLE-only builds 2023-02-22 00:30:20 +01:00
Makefile.L3 (Re)apply fixes for supporting only a subset of precision types from PR 3915 2023-11-04 23:48:59 +01:00
Makefile.LA Support NO_LAPACK=1 to build the lib without LAPACK functions. 2011-03-04 11:51:32 +08:00
setparam-ref.c Invoke the syscall to allow the use of AMX tiles 2023-05-19 10:48:18 +08:00