Commit Graph

2248 Commits

Author SHA1 Message Date
Martin Kroeker
3ec59922b6 Add a clobber list to fix utest errors seen with gcc13 on Apple M 2024-06-20 16:19:32 +02:00
Martin Kroeker
3d8054fb16 add clobber list 2024-06-14 22:07:44 +02:00
Martin Kroeker
dd7efcf9ef Avoid exceeding the configured thread count in x86_64 TOBF16 (#4748)
* avoid setting nthreads higher than available
2024-06-14 14:21:13 +02:00
Martin Kroeker
442dec28df Merge pull request #4738 from martin-frbg/issue4737
Disable GEMM3M for generic targets (not implemented)
2024-06-06 17:22:38 +02:00
Martin Kroeker
2787c9f8e4 Disable GEMM3M for generic targets (not implemented) 2024-06-06 14:39:50 +02:00
gxw
af73ae6208 LoongArch: Fixed issue 4728 2024-06-06 16:43:09 +08:00
gxw
8ab2e9ec65 LoongArch: DGEMM small matrix opt 2024-06-04 16:52:45 +08:00
Martin Kroeker
83bc8d5dd8 Merge pull request #4712 from RajalakshmiSR/zscalp10
POWER: Fix issues in zscal to address lapack failures
2024-06-01 11:22:08 +02:00
Martin Kroeker
8c05765a5a fix other corner cases where x=INF 2024-05-31 18:06:36 +02:00
Martin Kroeker
516743f7dc fix other instances of mishandling INF 2024-05-31 16:02:12 +02:00
Martin Kroeker
9ff4e9714e additional fixes for handling INF arguments 2024-05-31 15:44:07 +02:00
Martin Kroeker
ce130f11d2 Update zscal.c 2024-05-31 15:09:03 +02:00
Martin Kroeker
ab13cfef93 more fixes for infinite x 2024-05-31 14:34:49 +02:00
Martin Kroeker
ad2b5c67c8 fix another corner case involving infinity 2024-05-31 01:06:58 +02:00
Bart Oldeman
62f7b244ff Replace use of FLT_MAX in x86_64 zscal.c by isinf()
Commit def4996 fixed issues with inf and nan values in zscal,
but used FLT_MAX, where DBL_MAX or isinf() is more appropriate,
as FLT_MAX is for single precision only.
Using FLT_MAX caused test case failures in the LAPACK tests.

isinf() is consistent with the later fix 969601a1
2024-05-24 17:20:27 +00:00
Rajalakshmi Srinivasaraghavan
e112191b54 POWER: Fix issues in zscal to address lapack failures
This patch fixes following lapack failures with clang compiler on POWER.
zed.out: ZVX:   18 out of  5190 tests failed to pass the threshold
zgd.out: ZGV drivers:     25 out of   1092 tests failed to pass the threshold
zgd.out: ZGV drivers:      6 out of   1092 tests failed to pass the threshold
2024-05-22 08:00:06 -05:00
Martin Kroeker
aa259b141d Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix
Fix regression SAXPY when compiler with OpenXL compiler.
2024-05-18 19:11:25 +02:00
Chip Kerchner
3a1417671a POWER: Fixing endianness issue in cswap/zswap kernel for AIX 2024-05-15 19:36:46 -05:00
Amrita H S
87b3d9054f Fix regression SAXPY when compiler with OpenXL compiler.
SAXPY built with OpenXL regresses when compared to SAXPY
built with gcc. OpenXL compiler doesn't know that the
SAXPY inner kernel assembly is a 64 element loop and
to it the remainder loop is the main loop. It vectorizes
and interleaves the remainder to be a 48 elements per
iteration loop. With a max of 63 iterations, a 48 element
loop is mostly not going to get executed, so the 1 element
scalar loop that is the remainder after that is probably
mostly what gets executed.

This can be fixed by adding a pragma, loop interleave_count(2)
which will result in 8 element loop.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2024-05-12 23:27:55 -05:00
Martin Kroeker
8da6f7e5f2 Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
Loongarch64: Improving the Performance and Stability of dgemm
2024-05-10 11:29:12 +02:00
gxw
f9a26240a7 loongarch64: Fixed icamax_lsx 2024-05-10 14:16:40 +08:00
gxw
cb0f707409 loongarch64: Fixed utest fork:safety 2024-05-10 14:16:36 +08:00
Martin Kroeker
b45d8e1ab2 remove stray comma 2024-05-09 12:33:19 +02:00
gxw
6017ad7146 loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6 2024-05-08 10:10:26 +08:00
Martin Kroeker
992b71fea2 remove stray comma 2024-04-23 21:52:26 +02:00
Martin Kroeker
d421dec278 Merge pull request #4656 from zboszor/fix-x86-64-build-v2
Add forgotten conditional uses of PREFETCH
2024-04-23 21:05:08 +02:00
Martin Kroeker
ae695d4ca0 Merge pull request #4642 from XiWeiGu/loongarch64_clang
CI: Add clang test for loongarch64
2024-04-23 18:25:49 +02:00
gxw
7cd438a5ac loongarch64: Fixed clang compilation issues 2024-04-23 19:19:11 +08:00
Zoltán Böszörményi
ca64861ce8 Add forgotten conditional uses of PREFETCH
This fixes a (cross-)compilation/linker error for PRESCOTT
on Yocto.

Signed-off-by: Zoltán Böszörményi <zoltan.boszormenyi@xenial.com>
2024-04-19 10:52:28 +02:00
gxw
9c39e969f5 mips64: Fixed MSA optimization bugs for zgemv and cgemv 2024-04-15 15:17:29 +08:00
Martin Kroeker
4c03ed437f Fix SICORTEX ASUM/ZASUM and SUM/ZSUM for INCX <=0 (#4640)
* Exit early if INCX <= 0
2024-04-14 15:39:11 +02:00
Martin Kroeker
7cfd433d0c revert the C/Z NRM2 kernels to the base NEON kernel as well 2024-04-12 15:34:04 +02:00
Martin Kroeker
93d975d8fd Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
loongarch: Optimizing the performance of the GEMM on servers
2024-04-10 14:23:31 +02:00
gxw
d8c4ea8793 loongarch: Optimizing the performance of the GEMM on servers 2024-04-09 09:03:34 -04:00
Chen Yu
8e39c05efd Get the l2 cache size via environment variable on confidential VM
The CPUID(leaf:2 or leaf:0x80000006) is not supported on some confidential
VMs. As a result the get_l2_size() returns the default 512M which brings
performance issues.

Introduce the environment variable OPENBLAS_L2_SIZE provided by the user
to get the l2 cache size.

Suggested-by: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
2024-04-05 11:39:01 +08:00
Martin Kroeker
441c81026e Add support for Cortex-A76 2024-04-02 19:41:44 +02:00
Martin Kroeker
9ead81bd39 Revert S/DNRM2 to the base NEON kernel to fix precision loss 2024-04-02 15:59:20 +02:00
gxw
96607cbb98 loongarch: Fixed dzamax
Initialize the registers to prevent sporadic errors.
2024-03-25 23:17:53 -04:00
gxw
50869f6ca8 loongarch: Fixed zrot LSX opt 2024-03-19 10:08:11 +08:00
gxw
b5eb9d6bac loongarch: Fixed {sc/dz}amax LSX opt 2024-03-19 09:56:11 +08:00
gxw
ad13e04669 loongarch: Fixed {s/d/sc/dz}amin LSX opt 2024-03-19 09:18:44 +08:00
gxw
bbf82cb624 loongarch: Fixed {s/d}axpby LSX opt 2024-03-18 17:51:42 +08:00
gxw
ac460eb42a loongarch: Fixed i{c/z}amin LSX opt 2024-03-18 17:15:58 +08:00
gxw
60e251a1f8 loongarch: Fixed {sc/dz}amax LASX opt 2024-03-16 14:52:17 +08:00
gxw
a10dde5554 loongarch: Fixed {s/d/sc/dz}amin LASX opt 2024-03-16 14:52:14 +08:00
gxw
6534d378b7 loongarch: Fixed {s/d/c/z}sum LASX opt 2024-03-16 14:52:10 +08:00
gxw
6159cffc58 loongarch: Fixed i{s/c/z}amin LASX opt 2024-03-16 14:52:06 +08:00
gxw
7d755912b9 loongarch: Fixed {s/d/c/z}axpby LASX opt 2024-03-16 14:51:56 +08:00
Martin Kroeker
cf80bd8500 Update nrm2_rvv.c 2024-03-13 13:07:26 +01:00
Martin Kroeker
9baa757905 Update nrm2_vector.c 2024-03-13 11:40:14 +01:00