OpenBLAS/kernel
Octavian Maghiar 4a12cf53ec [RISC-V] Improve RVV kernel generator LMUL usage
The RVV kernel generation script uses the provided LMUL to increase the number of accumulator registers.
Since the effect of the LMUL is to group together the vector registers into larger ones, it actually should be used as a multiplier in the calculation of vlenmax.
Until now, no matter what LMUL was provided, the generated kernels only set the maximum number of vector elements to VLEN/SEW.
This commit changes how LMUL is used so that vlenmax is adjusted properly. Note that increasing LMUL decreases the number of effective vector registers.
2023-12-04 11:13:35 +00:00
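The trade-off described in the commit message can be illustrated with a small sketch. This is not the generator script's code: the function names `vlenmax` and `register_groups` are hypothetical, only integer LMUL values are considered, and the figure of 32 architectural vector registers comes from the RVV specification rather than from the commit.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# taken from the OpenBLAS RVV kernel generator.

def vlenmax(vlen_bits: int, sew_bits: int, lmul: int = 1) -> int:
    """Maximum number of elements in one vector register group.

    LMUL groups `lmul` registers together, so it multiplies VLEN/SEW.
    """
    return (vlen_bits // sew_bits) * lmul


def register_groups(lmul: int, total_registers: int = 32) -> int:
    """Number of usable register groups once registers are grouped by LMUL."""
    return total_registers // lmul


if __name__ == "__main__":
    # Assumed example configuration: VLEN = 128 bits, SEW = 64 bits (double).
    for lmul in (1, 2, 4, 8):
        print(f"LMUL={lmul}: vlenmax={vlenmax(128, 64, lmul)}, "
              f"register groups={register_groups(lmul)}")
```

With the assumed VLEN of 128 and SEW of 64, raising LMUL from 1 to 4 increases vlenmax from 2 to 8 elements while reducing the available register groups from 32 to 8, which is why a larger LMUL leaves fewer registers for accumulators.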
alpha alpha: Remove include of version.h 2022-08-11 15:02:58 +01:00
arm Typo fix 2021-02-23 13:14:35 +01:00
arm64 Wrap SVE header with __has_include check 2022-12-01 12:07:55 +00:00
e2k Add default KERNEL file for Elbrus E2K arch 2022-01-22 18:59:36 +01:00
generic * update intrinsics to match latest spec at https://github.com/riscv-non-isa/rvv-intrinsic-doc (in particular, __riscv_ prefixes for rvv intrinsics) 2023-02-24 10:45:03 +00:00
ia64 Add ia64 implementation of ?sum 2019-03-30 22:18:03 +01:00
loongarch64 LoongArch64: Add DYNAMIC_ARCH support 2022-07-28 14:28:45 +08:00
mips MIPS64: Fixed failed utest dsdot:dsdot_n_1 when TARGET=I6500 2022-09-17 16:43:22 +08:00
mips64 fix copyobj declarations to work with DYNAMIC_ARCH 2022-09-29 08:47:14 +02:00
power change line endings from CRLF to LF 2022-11-16 22:24:01 +01:00
riscv64 [RISC-V] Improve RVV kernel generator LMUL usage 2023-12-04 11:13:35 +00:00
simd fix the CI failure of lack the head 2020-11-12 17:35:17 +08:00
sparc fix DNRM2 returning INF instead of zero due to intermediate overflow 2022-07-19 10:19:27 +02:00
x86 initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
x86_64 change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
zarch s390x: fix cscal and zscal implementations 2020-09-21 13:10:05 +02:00
CMakeLists.txt Fix generator rules for ?laswp_ncopy and ?neg_tcopy 2022-04-30 15:28:38 +02:00
Makefile Add -mfma to -mavx2 for Apple clang, and set AVX2 options for Zen as well 2022-09-13 22:39:27 +02:00
Makefile.L1 Conditionally add -mfma to compiler options where needed 2020-12-17 11:34:05 +01:00
Makefile.L2 Empirical workaround for numpy SVD NaN problem from issue 3318 2021-07-18 22:19:19 +02:00
Makefile.L3 Add Elbrus e2k architecture support 2022-01-22 18:55:10 +01:00
Makefile.LA Support NO_LAPACK=1 to build the lib without LAPACK functions. 2011-03-04 11:51:32 +08:00
setparam-ref.c Remove excess initializer (leftover from rework of PR 3793) 2022-10-31 16:57:03 +01:00