Commit Graph

2232 Commits

Author SHA1 Message Date
Martin Kroeker aa259b141d
Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix
Fix regression SAXPY when compiler with OpenXL compiler.
2024-05-18 19:11:25 +02:00
Chip Kerchner 3a1417671a POWER: Fixing endianness issue in cswap/zswap kernel for AIX 2024-05-15 19:36:46 -05:00
Amrita H S 87b3d9054f Fix regression SAXPY when compiler with OpenXL compiler.
SAXPY built with OpenXL regresses when compared to SAXPY
built with gcc. OpenXL compiler doesn't know that the
SAXPY inner kernel assembly is a 64 element loop and
to it the remainder loop is the main loop. It vectorizes
and interleaves the remainder to be a 48 elements per
iteration loop. With a max of 63 iterations, a 48 element
loop is mostly not going to get executed, so the 1 element
scalar loop that is the remainder after that is probably
mostly what gets executed.

This can be fixed by adding a pragma, loop interleave_count(2)
which will result in 8 element loop.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2024-05-12 23:27:55 -05:00
Martin Kroeker 8da6f7e5f2
Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
Loongarch64: Improving the Performance and Stability of dgemm
2024-05-10 11:29:12 +02:00
gxw f9a26240a7 loongarch64: Fixed icamax_lsx 2024-05-10 14:16:40 +08:00
gxw cb0f707409 loongarch64: Fixed utest fork:safety 2024-05-10 14:16:36 +08:00
Martin Kroeker b45d8e1ab2
remove stray comma 2024-05-09 12:33:19 +02:00
gxw 6017ad7146 loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6 2024-05-08 10:10:26 +08:00
Martin Kroeker 992b71fea2
remove stray comma 2024-04-23 21:52:26 +02:00
Martin Kroeker d421dec278
Merge pull request #4656 from zboszor/fix-x86-64-build-v2
Add forgotten conditional uses of PREFETCH
2024-04-23 21:05:08 +02:00
Martin Kroeker ae695d4ca0
Merge pull request #4642 from XiWeiGu/loongarch64_clang
CI: Add clang test for loongarch64
2024-04-23 18:25:49 +02:00
gxw 7cd438a5ac loongarch64: Fixed clang compilation issues 2024-04-23 19:19:11 +08:00
Zoltán Böszörményi ca64861ce8 Add forgotten conditional uses of PREFETCH
This fixes a (cross-)compilation/linker error for PRESCOTT
on Yocto.

Signed-off-by: Zoltán Böszörményi <zoltan.boszormenyi@xenial.com>
2024-04-19 10:52:28 +02:00
gxw 9c39e969f5 mips64: Fixed MSA optimization bugs for zgemv and cgemv 2024-04-15 15:17:29 +08:00
Martin Kroeker 4c03ed437f
Fix SICORTEX ASUM/ZASUM and SUM/ZSUM for INCX <=0 (#4640)
* Exit early if INCX <= 0
2024-04-14 15:39:11 +02:00
Martin Kroeker 7cfd433d0c
revert the C/Z NRM2 kernels to the base NEON kernel as well 2024-04-12 15:34:04 +02:00
Martin Kroeker 93d975d8fd
Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
loongarch: Optimizing the performance of the GEMM on servers
2024-04-10 14:23:31 +02:00
gxw d8c4ea8793 loongarch: Optimizing the performance of the GEMM on servers 2024-04-09 09:03:34 -04:00
Chen Yu 8e39c05efd Get the l2 cache size via environment variable on confidential VM
The CPUID(leaf:2 or leaf:0x80000006) is not supported on some confidential
VMs. As a result the get_l2_size() returns the default 512M which brings
performance issues.

Introduce the environment variable OPENBLAS_L2_SIZE provided by the user
to get the l2 cache size.

Suggested-by: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
2024-04-05 11:39:01 +08:00
Martin Kroeker 441c81026e
Add support for Cortex-A76 2024-04-02 19:41:44 +02:00
Martin Kroeker 9ead81bd39
Revert S/DNRM2 to the base NEON kernel to fix precision loss 2024-04-02 15:59:20 +02:00
gxw 96607cbb98 loongarch: Fixed dzamax
Initialize the registers to prevent sporadic errors.
2024-03-25 23:17:53 -04:00
gxw 50869f6ca8 loongarch: Fixed zrot LSX opt 2024-03-19 10:08:11 +08:00
gxw b5eb9d6bac loongarch: Fixed {sc/dz}amax LSX opt 2024-03-19 09:56:11 +08:00
gxw ad13e04669 loongarch: Fixed {s/d/sc/dz}amin LSX opt 2024-03-19 09:18:44 +08:00
gxw bbf82cb624 loongarch: Fixed {s/d}axpby LSX opt 2024-03-18 17:51:42 +08:00
gxw ac460eb42a loongarch: Fixed i{c/z}amin LSX opt 2024-03-18 17:15:58 +08:00
gxw 60e251a1f8 loongarch: Fixed {sc/dz}amax LASX opt 2024-03-16 14:52:17 +08:00
gxw a10dde5554 loongarch: Fixed {s/d/sc/dz}amin LASX opt 2024-03-16 14:52:14 +08:00
gxw 6534d378b7 loongarch: Fixed {s/d/c/z}sum LASX opt 2024-03-16 14:52:10 +08:00
gxw 6159cffc58 loongarch: Fixed i{s/c/z}amin LASX opt 2024-03-16 14:52:06 +08:00
gxw 7d755912b9 loongarch: Fixed {s/d/c/z}axpby LASX opt 2024-03-16 14:51:56 +08:00
Martin Kroeker cf80bd8500
Update nrm2_rvv.c 2024-03-13 13:07:26 +01:00
Martin Kroeker 9baa757905
Update nrm2_vector.c 2024-03-13 11:40:14 +01:00
Martin Kroeker 18a6db6862
Update nrm2_vector.c 2024-03-13 11:10:26 +01:00
Martin Kroeker 3752e73919
handle incx < 0 2024-03-12 20:44:01 +01:00
Martin Kroeker db70c7f7fb
handle incx < 0 2024-03-12 20:42:11 +01:00
Martin Kroeker dee8557d58
handle incx < 0 2024-03-12 20:40:29 +01:00
Martin Kroeker d9dff17aec
handle incx < 0 2024-03-12 20:38:23 +01:00
Martin Kroeker 552c521353
remove another early exit for incx < 0 2024-03-12 18:49:27 +01:00
Martin Kroeker ed532dc75b
remove another early exit for incx < 0 2024-03-12 18:47:00 +01:00
Martin Kroeker 6b89e1f1d7
fix loop condition for incx < 0 2024-03-12 15:49:41 +01:00
Martin Kroeker 20016a0096
fix loop condition for incx < 0 2024-03-12 15:48:55 +01:00
Martin Kroeker 09e84bd29a
fix loop condition for incx < 0 2024-03-12 15:48:00 +01:00
Martin Kroeker f747aedb52
fix loop condition for incx < 0 2024-03-12 15:47:17 +01:00
Martin Kroeker 23796f8d31
fix loop condition for incx < 0 2024-03-12 15:46:23 +01:00
Martin Kroeker bf93459746
fix loop condition for incx < 0 2024-03-12 15:45:23 +01:00
Martin Kroeker e41d01bad9
remove early exit on negative inc_x 2024-03-11 22:53:54 +01:00
Martin Kroeker 02a025f9c1
remove early exit on negative inc_x 2024-03-11 22:52:18 +01:00
pengxu 680a77fafc Optimized ssymv and dsymv kernel LSX for LoongArch 2024-03-05 20:36:59 +08:00