Fix SAXPY regression when compiled with the OpenXL compiler.

SAXPY built with OpenXL regresses compared to SAXPY
built with gcc. The OpenXL compiler doesn't know that the
SAXPY inner kernel assembly is a 64-element loop, so it
treats the remainder loop as the main loop. It vectorizes
and interleaves the remainder into a loop that processes
48 elements per iteration. Since the remainder has at most
63 iterations, the 48-element loop is rarely executed, and
the 1-element scalar loop that follows it probably ends up
doing most of the work.
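
To make the arithmetic concrete, here is a small sketch of how a remainder splits between the vectorized loop and the scalar loop. The helper names are hypothetical and not part of OpenBLAS, and the model assumes a single vectorized loop followed by a purely scalar epilogue, which is a simplification of what the compiler may actually emit.

```c
/* Hypothetical helpers, not part of OpenBLAS. They model a remainder of
 * `rem` elements (0..63 after the 64-element kernel) being split between
 * a vectorized loop that handles `width` elements per iteration and a
 * 1-element scalar loop that handles whatever is left. */
static int vector_iterations(int rem, int width)
{
    return rem / width;   /* full passes through the vectorized loop */
}

static int scalar_iterations(int rem, int width)
{
    return rem % width;   /* elements left for the scalar loop */
}
```

Under this model, a remainder of 47 with a 48-element loop gives 0 vector iterations and 47 scalar iterations, while the same remainder with an 8-element loop gives 5 vector iterations and 7 scalar iterations.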

This can be fixed by adding the pragma
#pragma clang loop interleave_count(2) to the remainder
loop, which results in an 8-element loop.
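
A minimal stand-alone sketch of the pattern follows. The function name saxpy_tail and its signature are illustrative assumptions, not the actual OpenBLAS code. The guard keeps the pragma away from compilers that do not define __clang__.

```c
#include <stddef.h>

/* Hypothetical stand-alone version of the remainder loop; the name and
 * signature are illustrative only. Computes y[i] += da * x[i] for every
 * index from i up to, but not including, n. */
static void saxpy_tail(size_t i, size_t n, float da, const float *x, float *y)
{
    /* Hint for clang-based compilers: interleave only 2 vector
     * iterations, so the vectorized body stays small enough to be
     * useful on a short remainder. Other compilers never see it. */
#if defined(__clang__)
#pragma clang loop interleave_count(2)
#endif
    while (i < n) {
        y[i] += da * x[i];
        i++;
    }
}
```

The pragma is only a hint and does not change the result of the loop, so the function behaves the same whether or not the compiler honors it.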

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
Amrita H S 2024-05-07 11:31:36 -05:00
parent f0560f906f
commit 87b3d9054f
1 changed files with 3 additions and 0 deletions

@@ -76,6 +76,9 @@ int CNAME(BLASLONG n, BLASLONG dummy0, BLASLONG dummy1, FLOAT da, FLOAT *x, BLAS
saxpy_kernel_64(n1, &x[i], &y[i], da);
i += n1;
#if defined(__clang__)
#pragma clang loop interleave_count(2)
#endif
while(i < n)
{