Martin Kroeker
aa259b141d
Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix
...
Fix SAXPY regression when compiled with the OpenXL compiler.
2024-05-18 19:11:25 +02:00
Chip Kerchner
3a1417671a
POWER: Fixing endianness issue in cswap/zswap kernel for AIX
2024-05-15 19:36:46 -05:00
Amrita H S
87b3d9054f
Fix SAXPY regression when compiled with the OpenXL compiler.
...
SAXPY built with OpenXL regresses when compared to SAXPY
built with gcc. The OpenXL compiler does not know that the
SAXPY inner kernel assembly is a 64-element loop, so it
treats the remainder loop as the main loop. It vectorizes
and interleaves the remainder into a loop that processes
48 elements per iteration. With a maximum of 63 iterations,
the 48-element loop is mostly skipped, so the 1-element
scalar loop that handles whatever is left after it is
probably what runs most of the time.
This can be fixed by adding a pragma, loop interleave_count(2),
which results in an 8-element loop.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2024-05-12 23:27:55 -05:00
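The remainder-loop hint described in the commit above can be sketched in plain C. This is an illustration only, not the actual kernel: `saxpy_block64` is a hypothetical stand-in for the hand-written 64-element assembly kernel, `saxpy_sketch` is an invented name, and the `#pragma clang loop` spelling is an assumption about how the pragma is written for a clang-based compiler such as OpenXL.

```c
#include <stddef.h>

/* Hypothetical stand-in for the 64-element assembly kernel.
 * n64 is expected to be a multiple of 64. */
static void saxpy_block64(size_t n64, float alpha, const float *x, float *y)
{
    for (size_t i = 0; i < n64; i++)
        y[i] += alpha * x[i];
}

/* y := alpha * x + y for contiguous vectors (illustrative name). */
void saxpy_sketch(size_t n, float alpha, const float *x, float *y)
{
    size_t n64 = n & ~(size_t)63; /* largest multiple of 64 that is <= n */

    if (n64 > 0)
        saxpy_block64(n64, alpha, x, y);

    /* Remainder: at most 63 elements. The hint asks the vectorizer to
     * interleave only two vector iterations instead of choosing a much
     * wider body that a short remainder would rarely enter. */
#if defined(__clang__)
#pragma clang loop interleave_count(2)
#endif
    for (size_t i = n64; i < n; i++)
        y[i] += alpha * x[i];
}
```

The pragma is guarded so that other compilers simply ignore it; the result of the computation is the same with or without the hint, only the generated loop shape changes.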
Martin Kroeker
8da6f7e5f2
Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
...
Loongarch64: Improving the Performance and Stability of dgemm
2024-05-10 11:29:12 +02:00
gxw
f9a26240a7
loongarch64: Fixed icamax_lsx
2024-05-10 14:16:40 +08:00
gxw
cb0f707409
loongarch64: Fixed utest fork:safety
2024-05-10 14:16:36 +08:00
Martin Kroeker
b45d8e1ab2
remove stray comma
2024-05-09 12:33:19 +02:00
gxw
6017ad7146
loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6
2024-05-08 10:10:26 +08:00
Martin Kroeker
992b71fea2
remove stray comma
2024-04-23 21:52:26 +02:00
Martin Kroeker
d421dec278
Merge pull request #4656 from zboszor/fix-x86-64-build-v2
...
Add forgotten conditional uses of PREFETCH
2024-04-23 21:05:08 +02:00
Martin Kroeker
ae695d4ca0
Merge pull request #4642 from XiWeiGu/loongarch64_clang
...
CI: Add clang test for loongarch64
2024-04-23 18:25:49 +02:00
gxw
7cd438a5ac
loongarch64: Fixed clang compilation issues
2024-04-23 19:19:11 +08:00
Zoltán Böszörményi
ca64861ce8
Add forgotten conditional uses of PREFETCH
...
This fixes a (cross-)compilation/linker error for PRESCOTT
on Yocto.
Signed-off-by: Zoltán Böszörményi <zoltan.boszormenyi@xenial.com>
2024-04-19 10:52:28 +02:00
gxw
9c39e969f5
mips64: Fixed MSA optimization bugs for zgemv and cgemv
2024-04-15 15:17:29 +08:00
Martin Kroeker
4c03ed437f
Fix SICORTEX ASUM/ZASUM and SUM/ZSUM for INCX <= 0 (#4640)
...
* Exit early if INCX <= 0
2024-04-14 15:39:11 +02:00
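The early exit mentioned in the commit above can be illustrated with a small reference-style ASUM in C. This is a sketch under assumptions, not the SICORTEX kernel: the function name `asum_sketch` and its signature are invented for illustration, and the behaviour shown (return zero when `n <= 0` or `inc_x <= 0`) follows the convention of the reference BLAS ASUM routines.

```c
/* Sum of absolute values of n elements of x, taken with stride inc_x.
 * Illustrative only; name and signature are not from OpenBLAS. */
double asum_sketch(long n, const double *x, long inc_x)
{
    /* Exit early: nothing to sum for an empty vector or a
     * non-positive increment. */
    if (n <= 0 || inc_x <= 0)
        return 0.0;

    double sum = 0.0;
    for (long i = 0; i < n; i++) {
        double v = x[i * inc_x];
        sum += (v < 0.0) ? -v : v; /* absolute value without <math.h> */
    }
    return sum;
}
```

With the check in place, the loop below it never has to reason about a zero or negative stride.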
Martin Kroeker
7cfd433d0c
revert the C/Z NRM2 kernels to the base NEON kernel as well
2024-04-12 15:34:04 +02:00
Martin Kroeker
93d975d8fd
Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
...
loongarch: Optimizing the performance of the GEMM on servers
2024-04-10 14:23:31 +02:00
gxw
d8c4ea8793
loongarch: Optimizing the performance of the GEMM on servers
2024-04-09 09:03:34 -04:00
Chen Yu
8e39c05efd
Get the l2 cache size via environment variable on confidential VM
...
CPUID (leaf 2 or leaf 0x80000006) is not supported on some confidential
VMs. As a result, get_l2_size() returns the default 512M, which causes
performance issues.
Introduce the environment variable OPENBLAS_L2_SIZE, which lets the user
supply the L2 cache size.
Suggested-by: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
2024-04-05 11:39:01 +08:00
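A minimal sketch of the environment-variable fallback described in the commit above, written in C. Only the variable name OPENBLAS_L2_SIZE comes from the commit message. The helper names, the fallback constant, the assumption that the value is a plain positive decimal integer in the same units as the fallback, and the choice to consult the variable before anything else are all illustrative and may differ from the real get_l2_size().

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical fallback, in the same (unspecified) units as the override. */
#define L2_SIZE_FALLBACK 512

/* Parse a user-supplied size. Returns fallback when text is NULL,
 * empty, not a plain decimal integer, non-positive, or out of range. */
int parse_l2_size(const char *text, int fallback)
{
    if (text == NULL || *text == '\0')
        return fallback;

    char *end = NULL;
    errno = 0;
    long v = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0' || v <= 0 || v > INT_MAX)
        return fallback;
    return (int)v;
}

/* Illustrative lookup: use the user's value when it is valid,
 * otherwise fall back to the default. */
int l2_size_sketch(void)
{
    return parse_l2_size(getenv("OPENBLAS_L2_SIZE"), L2_SIZE_FALLBACK);
}
```

Keeping the parsing in its own function makes the invalid-input cases easy to check without touching the process environment.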
Martin Kroeker
441c81026e
Add support for Cortex-A76
2024-04-02 19:41:44 +02:00
Martin Kroeker
9ead81bd39
Revert S/DNRM2 to the base NEON kernel to fix precision loss
2024-04-02 15:59:20 +02:00
gxw
96607cbb98
loongarch: Fixed dzamax
...
Initialize the registers to prevent sporadic errors.
2024-03-25 23:17:53 -04:00
gxw
50869f6ca8
loongarch: Fixed zrot LSX opt
2024-03-19 10:08:11 +08:00
gxw
b5eb9d6bac
loongarch: Fixed {sc/dz}amax LSX opt
2024-03-19 09:56:11 +08:00
gxw
ad13e04669
loongarch: Fixed {s/d/sc/dz}amin LSX opt
2024-03-19 09:18:44 +08:00
gxw
bbf82cb624
loongarch: Fixed {s/d}axpby LSX opt
2024-03-18 17:51:42 +08:00
gxw
ac460eb42a
loongarch: Fixed i{c/z}amin LSX opt
2024-03-18 17:15:58 +08:00
gxw
60e251a1f8
loongarch: Fixed {sc/dz}amax LASX opt
2024-03-16 14:52:17 +08:00
gxw
a10dde5554
loongarch: Fixed {s/d/sc/dz}amin LASX opt
2024-03-16 14:52:14 +08:00
gxw
6534d378b7
loongarch: Fixed {s/d/c/z}sum LASX opt
2024-03-16 14:52:10 +08:00
gxw
6159cffc58
loongarch: Fixed i{s/c/z}amin LASX opt
2024-03-16 14:52:06 +08:00
gxw
7d755912b9
loongarch: Fixed {s/d/c/z}axpby LASX opt
2024-03-16 14:51:56 +08:00
Martin Kroeker
cf80bd8500
Update nrm2_rvv.c
2024-03-13 13:07:26 +01:00
Martin Kroeker
9baa757905
Update nrm2_vector.c
2024-03-13 11:40:14 +01:00
Martin Kroeker
18a6db6862
Update nrm2_vector.c
2024-03-13 11:10:26 +01:00
Martin Kroeker
3752e73919
handle incx < 0
2024-03-12 20:44:01 +01:00
Martin Kroeker
db70c7f7fb
handle incx < 0
2024-03-12 20:42:11 +01:00
Martin Kroeker
dee8557d58
handle incx < 0
2024-03-12 20:40:29 +01:00
Martin Kroeker
d9dff17aec
handle incx < 0
2024-03-12 20:38:23 +01:00
Martin Kroeker
552c521353
remove another early exit for incx < 0
2024-03-12 18:49:27 +01:00
Martin Kroeker
ed532dc75b
remove another early exit for incx < 0
2024-03-12 18:47:00 +01:00
Martin Kroeker
6b89e1f1d7
fix loop condition for incx < 0
2024-03-12 15:49:41 +01:00
Martin Kroeker
20016a0096
fix loop condition for incx < 0
2024-03-12 15:48:55 +01:00
Martin Kroeker
09e84bd29a
fix loop condition for incx < 0
2024-03-12 15:48:00 +01:00
Martin Kroeker
f747aedb52
fix loop condition for incx < 0
2024-03-12 15:47:17 +01:00
Martin Kroeker
23796f8d31
fix loop condition for incx < 0
2024-03-12 15:46:23 +01:00
Martin Kroeker
bf93459746
fix loop condition for incx < 0
2024-03-12 15:45:23 +01:00
Martin Kroeker
e41d01bad9
remove early exit on negative inc_x
2024-03-11 22:53:54 +01:00
Martin Kroeker
02a025f9c1
remove early exit on negative inc_x
2024-03-11 22:52:18 +01:00
pengxu
680a77fafc
Optimized ssymv and dsymv kernel LSX for LoongArch
2024-03-05 20:36:59 +08:00