The following openblas_utest tests fail when the RISCV64_ZVL128B is
enabled.
TEST 89/103 axpby:zaxpby_inc_0 [FAIL]
TEST 92/103 axpby:caxpby_inc_0 [FAIL]
TEST 95/103 axpby:daxpby_inc_0 [FAIL]
TEST 98/103 axpby:saxpby_inc_0 [FAIL]
The issue is that the vectorized kernels do not work when inc_y == 0.
This patch updates the kernels to fall back to the scalar algorithms
when inc_y == 0, fixing the failing tests.
Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
* Fixed bugs in dgemm, [a]min\max, asum kernels
* Added zero checks for BLAS kernels
* Added dsdot implementation for RVV 0.7.1
* Fixed bugs in _vector files for C910V and RISCV64_ZVL256B targets
* Added additional definitions for RISCV64_ZVL256B target
The RVV kernel generation script uses the provided LMUL to increase the number of accumulator registers.
Since the effect of the LMUL is to group together the vector registers into larger ones, it actually should be used as a multiplier in the calculation of vlenmax.
At the moment, no matter what LMUL is provided, the generated kernels would only set the maximum number of vector elements equal to VLEN/SEW.
Commit changes the use of LMUL to properly adjust vlenmax. Note that an increase in LMUL results in a decrease in the number of effective vector registers.
Current RVV x280 target depends on vlen=512-bits for Level 3 operations.
Commit adds generic target that supports vlen=128-bits.
New target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations.
Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.
During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and tail policy is agnostic.
Commit changes intrinsics tail policy to undistrubed.
During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and tail policy is agnostic.
Commit changes intrinsics tail policy to undistrubed.