Fix SAXPY regression when compiled with the OpenXL compiler.
SAXPY built with OpenXL regresses when compared to SAXPY built with gcc. The OpenXL compiler does not know that the SAXPY inner kernel assembly is a 64-element loop, so it treats the remainder loop as the main loop. It vectorizes and interleaves the remainder loop into a loop that processes 48 elements per iteration. Because the remainder loop runs for at most 63 iterations, the 48-element loop will mostly not be executed, so the 1-element scalar loop that handles what is left after it is probably what mostly gets executed. This can be fixed by adding a pragma, #pragma clang loop interleave_count(2), which results in an 8-element loop.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
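The arithmetic behind this argument can be checked with a small sketch. It assumes a simplified model in which a vector loop of a given width handles as many full chunks of the remainder as it can and a scalar loop handles the rest; the code the compiler actually emits may differ (for example, it may add a vectorized epilogue). The helper names are hypothetical and are not part of OpenBLAS.

```c
/* Simplified model (an assumption, not the compiler's actual code layout):
   a remainder of r elements (0..63) is processed by a vector loop that
   consumes `width` elements per iteration, then by a 1-element scalar loop. */

/* Elements left over for the scalar loop for one remainder size r. */
static int scalar_elems(int r, int width)
{
    return r % width;
}

/* Scalar-loop elements summed over every possible remainder 0..63. */
static int total_scalar_elems(int width)
{
    int total = 0;
    for (int r = 0; r < 64; r++)
        total += scalar_elems(r, width);
    return total;
}

/* How many of the 64 possible remainders run the vector loop at least once. */
static int remainders_reaching_vector_loop(int width)
{
    int count = 0;
    for (int r = 0; r < 64; r++)
        if (r >= width)
            count++;
    return count;
}
```

Under this model, a 48-element loop is entered for only 16 of the 64 possible remainders and leaves 1248 elements in total to the scalar loop, whereas an 8-element loop is entered for 56 remainders and leaves 224.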
parent f0560f906f
commit 87b3d9054f
@@ -76,6 +76,9 @@ int CNAME(BLASLONG n, BLASLONG dummy0, BLASLONG dummy1, FLOAT da, FLOAT *x, BLAS
 saxpy_kernel_64(n1, &x[i], &y[i], da);
 
 i += n1;
+#if defined(__clang__)
+#pragma clang loop interleave_count(2)
+#endif
 while(i < n)
 {
 
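For readers who want to try the pragma in isolation, here is a minimal sketch of a scalar tail loop guarded the same way as in the hunk above. The function name saxpy_tail and its signature are hypothetical stand-ins, not the OpenBLAS routine. The guard relies on clang-based compilers (which include OpenXL) defining __clang__, so other compilers never see the pragma.

```c
#include <stddef.h>

/* Hypothetical stand-in for the remainder loop: y[i] += da * x[i]
   for every i in [start, n). Not the actual OpenBLAS function. */
static void saxpy_tail(size_t start, size_t n, float da,
                       const float *x, float *y)
{
    size_t i = start;

    /* Hint to clang-based compilers: interleave only 2 vector iterations
       when vectorizing this short loop. Hidden from other compilers. */
#if defined(__clang__)
#pragma clang loop interleave_count(2)
#endif
    while (i < n) {
        y[i] += da * x[i];
        i++;
    }
}
```

The pragma is only a hint to the optimizer and does not change the result of the loop. Whether the generated loop handles 8 elements per iteration depends on the target's vector width, so inspecting the emitted assembly is the reliable way to confirm it.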