277 Commits

Author SHA1 Message Date
Ayappan Perumal
020cce1068 Fix build issues with gcc compiler as well 2024-10-23 04:24:06 -05:00
Ayappan Perumal
b6ec73e77c Fix AIX build 2024-10-21 07:38:03 -05:00
Chip Kerchner
ab71a1edf2 Better VSX. 2024-10-17 08:25:02 -05:00
Chip Kerchner
36bd3eeddf Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power). 2024-10-13 13:46:11 -05:00
Martin Kroeker
e52d9b4cf1 Merge pull request #4928 from austinpagan/czgemm_in_c
CGEMM & ZGEMM using C code, Power only, P10 only.
2024-10-09 20:26:21 +02:00
Gordon Fossum
0b7fb5c791 CGEMM & ZGEMM using C code. 2024-10-09 09:42:23 -05:00
Martin Kroeker
c9e92348a6 Handle inf/nan if dummy2 flag is set 2024-10-06 19:57:17 +02:00
Martin Kroeker
d714013ab9 change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds 2024-10-03 22:04:20 +02:00
Chip Kerchner
1a7b8c650d Merge branch 'develop' into betterPowerGEMVTail 2024-08-01 14:59:12 -05:00
Martin Kroeker
f5d04318e3 Merge branch 'OpenMathLib:develop' into scalfixes 2024-07-21 13:43:43 +02:00
Martin Kroeker
73f8866ffb make NAN handling depend on DUMMY2 parameter 2024-07-21 13:42:47 +02:00
Hong Bo Peng
db98f8753f Try to fix LAPACK testing failures on P7.
1. Remove the FADD insn from the GEMV Transpose code.
  2. Remove the FADD insn from GEMM and ZGEMM code.
  3. Reorder the compution of the Imaginary part in ZGEMM code.
2024-07-19 02:08:19 -04:00
Martin Kroeker
b9bfc8ce09 make NAN handling depend on dummy2 parameter 2024-07-17 23:29:50 +02:00
Chip Kerchner
ba47c7f4f3 Vectorize reduction stage of sgemv_t. 2024-07-16 15:57:24 -05:00
Chip Kerchner
cb154832f8 Vectorize SBGEMM incopy - 4x faster. 2024-07-09 13:10:03 -05:00
Martin Kroeker
2a5fe97e3b temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN 2024-06-27 16:21:57 +02:00
Martin Kroeker
7f8f037a36 handle INF and NAN in input 2024-06-22 16:03:30 +02:00
Martin Kroeker
f1248b849d handle INF and NAN in input 2024-06-22 15:55:29 +02:00
Rajalakshmi Srinivasaraghavan
e112191b54 POWER: Fix issues in zscal to address lapack failures
This patch fixes following lapack failures with clang compiler on POWER.
zed.out: ZVX:   18 out of  5190 tests failed to pass the threshold
zgd.out: ZGV drivers:     25 out of   1092 tests failed to pass the threshold
zgd.out: ZGV drivers:      6 out of   1092 tests failed to pass the threshold
2024-05-22 08:00:06 -05:00
Martin Kroeker
aa259b141d Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix
Fix regression SAXPY when compiler with OpenXL compiler.
2024-05-18 19:11:25 +02:00
Chip Kerchner
3a1417671a POWER: Fixing endianness issue in cswap/zswap kernel for AIX 2024-05-15 19:36:46 -05:00
Amrita H S
87b3d9054f Fix regression SAXPY when compiler with OpenXL compiler.
SAXPY built with OpenXL regresses when compared to SAXPY
built with gcc. OpenXL compiler doesn't know that the
SAXPY inner kernel assembly is a 64 element loop and
to it the remainder loop is the main loop. It vectorizes
and interleaves the remainder to be a 48 elements per
iteration loop. With a max of 63 iterations, a 48 element
loop is mostly not going to get executed, so the 1 element
scalar loop that is the remainder after that is probably
mostly what gets executed.

This can be fixed by adding a pragma, loop interleave_count(2)
which will result in 8 element loop.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2024-05-12 23:27:55 -05:00
Chip-Kerchner
99384933ff Revert "Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code"
This reverts commit accea15551, reversing
changes made to b925353006.
2024-03-01 07:57:39 -06:00
Martin Kroeker
accea15551 Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code
Cgemm zgemm c code
2024-02-27 22:07:07 +01:00
austinpagan
87ba528d8b Changed C files to straighten out indentation. Removed commented lines from other file. 2024-02-01 18:46:07 -06:00
austinpagan
ddac75e0ef Adding .C versions of CGEMM and ZGEMM 2024-02-01 12:24:25 -06:00
Chip Kerchner
2bb7ea64a1 Only vectorize 64-bit version for Power8. 2024-02-01 08:11:43 -06:00
Chip Kerchner
09bb48d1b9 Vectorize in-copy packing/copying for SGEMM - 4X faster. 2024-01-30 09:13:16 -06:00
Chip-Kerchner
058dd2a4cb Replace two vector loads with one vector pair load and fix endianess of stores - DGEMM versions. 2024-01-08 14:16:09 -06:00
barracuda156
d9653af018 KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366
2023-12-14 12:00:11 +08:00
Chip-Kerchner
4e738e561a Replace two vector loads with one vector pair load and fix endianess of stores. 2023-12-08 12:36:08 -06:00
Rajalakshmi Srinivasaraghavan
980f702f72 POWER: AIX: Make use of power10 optimization
POWER10 optimizations are disabled when using default AIX assembler.
As we have fixed many issues recently, enabling optimization path
for default assembler.
2023-10-19 18:48:19 -05:00
Rajalakshmi Srinivasaraghavan
82fc29a57a POWER10: Fallback to POWER8 functions
As cgemm and zgemm kernels are not optimized for big endian falling
back to POWER8 versions.  Tested on AIX using gcc and Open XL C.
2023-10-11 17:04:42 -05:00
Martin Kroeker
8e6d93359d Merge pull request #4196 from TiborGY/obsolete_inlines
Modernize obsolete inline order
2023-09-03 14:12:42 +02:00
Ian McInerney
79c15db348 Fix power10 gcc intrinsic check
__builtin_vsx_assemble_pair was only in GCC 10-11.2 and was replaced by
__builtin_vsx_build_pair thereafter.
2023-08-17 15:05:29 +01:00
TGY
b5ba95a6c0 Modernize obsolete inline order 2023-08-16 00:48:40 +02:00
Martin Kroeker
54d3246fc6 Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:55:17 +02:00
Manjul Mohan
58b88aa5f0 POWER10: Fix compiler warnings
This patch removes the warning messages related to unused variables in
sbgemm_kernel_power10.c.

Signed-off-by: Manjul Mohan <manjul@linux.vnet.ibm.com>
2023-06-12 01:08:59 -04:00
Martin Kroeker
1688c7da43 change line endings from CRLF to LF 2022-11-16 22:24:01 +01:00
Martin Kroeker
6c118b7977 Fix DNRM2 returning INF instead of zero due to intermediate overflow 2022-07-24 17:42:31 +02:00
Martin Kroeker
c43ec53bdd Merge pull request #3690 from RajalakshmiSR/cdotp10
POWER: Fix complex dot function failures
2022-07-19 13:59:16 +02:00
Rajalakshmi Srinivasaraghavan
a612e78a97 POWER: Fix complex dot function failures
There are some test failures in complex dot functions when compiling with gcc12.
The machine constraints used now do not update all the four elements in the
expected result array. Fixing this with a reduced level of optimization.
This is not changing any performance numbers but will be converted to C code in future.
2022-07-18 14:48:43 -05:00
Rajalakshmi Srinivasaraghavan
432fd99445 POWER10: dgemv builtin rename
Add check to use correct builtin name for older versions
of gcc10 compilers.
2022-07-18 09:48:01 -05:00
VFerrari
cac634fce3 POWER10: Fix multithreading check when USE_THREAD=0
This patch fixes an issue when OpenBLAS is compiled for TARGET=POWER10
and the flag USE_THREAD is set to 0.

The function `num_cpu_avail` is only available when USE_THREAD=1,
so SMP is defined.
2022-06-25 03:46:46 -03:00
Martin Kroeker
9283c7c0b5 Merge pull request #3655 from RajalakshmiSR/zgemmasmp10
POWER10: Fix ZGEMM testcase failures
2022-06-18 20:52:26 +02:00
Rajalakshmi Srinivasaraghavan
f191bc652b POWER10: Fix ZGEMM testcase failures
This patch fixes storing and restoring non volatile registers
in zgemm POWER10 kernel.
2022-06-17 08:18:08 -05:00
Rajalakshmi Srinivasaraghavan
8419d538ff POWER10: convert dgemv inline assembly
This patch makes use of compiler builtins and matches with assembly
performance. Tested with clang14 and gcc12.
2022-06-09 10:42:57 -05:00
Rajalakshmi Srinivasaraghavan
b62173c5a0 POWER10: Changing store instructions for Level1 functions
This patch changes 32 bytes stores to two 16 bytes stores
to fix a recent degradation due to 32 bytes stores.
2022-05-12 11:17:33 -05:00
Martin Kroeker
05dcfa176e fix undefined prefetchsizes 2022-04-16 10:04:27 +02:00
Martin Kroeker
2bbb9f05c7 fix undefined prefetchsize 2022-04-16 10:00:10 +02:00