Fix SAXPY regression when compiled with the OpenXL compiler.
SAXPY built with OpenXL is slower than SAXPY built with gcc. The OpenXL compiler cannot tell that the SAXPY inner kernel, which is written in assembly, is a 64-element loop, so it treats the remainder loop as the main loop. It vectorizes and interleaves the remainder loop into a loop that processes 48 elements per iteration. Because the remainder loop runs at most 63 iterations, the 48-element loop is rarely executed, and most of the work probably falls to the 1-element scalar loop that handles whatever is left after it. This can be fixed by adding the pragma "#pragma clang loop interleave_count(2)", which results in an 8-element loop instead.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
parent a1d124094c
commit d82494f484
@@ -76,6 +76,9 @@ int CNAME(BLASLONG n, BLASLONG dummy0, BLASLONG dummy1, FLOAT da, FLOAT *x, BLAS
 saxpy_kernel_64(n1, &x[i], &y[i], da);
 
 i += n1;
+#if defined(__clang__)
+#pragma clang loop interleave_count(2)
+#endif
 while(i < n)
 {
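
For readers who want to try the pattern outside the tree, here is a minimal, self-contained sketch of the main-kernel-plus-remainder structure that the hunk touches. It is not the file's actual code: saxpy_kernel_64_sketch is a hypothetical plain-C stand-in for the assembly kernel, saxpy_unit_stride and scalar_tail are names invented for this illustration, and only the unit-stride case is shown. scalar_tail uses a simplified model (a loop that consumes "width" elements per iteration leaves rem % width elements for scalar cleanup) to make the arithmetic in the commit message concrete; the code a given compiler actually emits may differ.

```c
#include <stddef.h>

/* Hypothetical stand-in for the hand-written assembly kernel.
 * Assumption: n1 is a multiple of 64, as in the commit message. */
static void saxpy_kernel_64_sketch(size_t n1, const float *x, float *y, float da)
{
    for (size_t i = 0; i < n1; i++)
        y[i] += da * x[i];
}

/* Computes y += da * x for unit strides (illustrative name). */
static void saxpy_unit_stride(size_t n, float da, const float *x, float *y)
{
    size_t n1 = n & ~(size_t)63;   /* largest multiple of 64 that is <= n */
    size_t i = 0;

    if (n1)
        saxpy_kernel_64_sketch(n1, x, y, da);
    i += n1;

    /* At most 63 elements remain. The hint asks clang-based compilers
     * to interleave only two vector iterations in this short loop;
     * other compilers never see the pragma. */
#if defined(__clang__)
#pragma clang loop interleave_count(2)
#endif
    while (i < n) {
        y[i] += da * x[i];
        i++;
    }
}

/* Simplified model: elements left for scalar cleanup when a remainder
 * of rem elements is consumed in chunks of width elements. */
static size_t scalar_tail(size_t rem, size_t width)
{
    return rem % width;
}
```

Under this model, a remainder of 47 never enters a 48-element loop, so all 47 elements run through scalar code, while an 8-element loop leaves at most 7. The guard on __clang__ keeps builds with gcc unchanged.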