Commit Graph

296 Commits

Author SHA1 Message Date
Chip Kerchner c8f53b85ce Merge remote-tracking branch 'origin/develop' into vectorizeBF16GEMV 2024-10-11 11:10:20 -05:00
Martin Kroeker e52d9b4cf1
Merge pull request #4928 from austinpagan/czgemm_in_c
CGEMM & ZGEMM using C code, Power only, P10 only.
2024-10-09 20:26:21 +02:00
Gordon Fossum 0b7fb5c791 CGEMM & ZGEMM using C code. 2024-10-09 09:42:23 -05:00
Chip Kerchner d6bb8dcfd1 Common code. 2024-10-06 14:13:43 -05:00
Martin Kroeker c9e92348a6 Handle inf/nan if dummy2 flag is set 2024-10-06 19:57:17 +02:00
Chip Kerchner 9ac0fb0111 Merge branch 'develop' into vectorizeBF16GEMV 2024-10-04 06:49:53 -05:00
Martin Kroeker d714013ab9 change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds 2024-10-03 22:04:20 +02:00
Chip Kerchner 915a6d6e44 Add casting. 2024-10-03 14:08:21 -05:00
Chip Kerchner 7ec3c16d82 Remove beta from optimized functions. 2024-10-03 13:27:33 -05:00
Chip Kerchner 7cc00f68c9 Remove more duplicate. 2024-10-01 11:23:32 -05:00
Chip Kerchner e238a68c03 Remove duplicate. 2024-10-01 11:06:23 -05:00
Chip Kerchner 32095b0cbb Remove parameter. 2024-10-01 09:32:42 -05:00
Chip Kerchner c8788208c8 Fixing block issue with transpose version. 2024-09-27 13:27:03 -05:00
Chip Kerchner d7c0d87cd1 Small changes. 2024-09-26 15:21:29 -05:00
Chip Kerchner eb6f3a05ef Common MMA code. 2024-09-26 09:28:56 -05:00
Chip Kerchner fb287d17fc Common code. 2024-09-25 16:31:36 -05:00
Chip Kerchner 8ab6245771 Small change. 2024-09-24 16:50:21 -05:00
Chip Kerchner df19375560 Almost final code for MMA. 2024-09-24 16:30:01 -05:00
Chip Kerchner 05aa63e738 More MMA BF16 GEMV code. 2024-09-24 12:54:02 -05:00
Chip Kerchner c9ce37d527 Force vector pairs in clang. 2024-09-23 08:43:58 -05:00
Chip Kerchner 89a12fa083 MMA BF16 GEMV code. 2024-09-23 06:32:14 -05:00
Chip Kerchner 7947970f9d Move common code. 2024-09-13 06:22:13 -05:00
Chip Kerchner 72216d28c2 Fix bug with inc_y adding results twice. 2024-09-11 08:47:32 -05:00
Chip Kerchner 2f142ee857 More common code. 2024-09-09 14:41:55 -05:00
Chip Kerchner 39fd29f1de Minor improvement and turn off BF16 GEMV forwarding by default. 2024-09-08 18:28:31 -05:00
Chip Kerchner 8541b25e1d Special case beta is one. 2024-09-06 14:48:48 -05:00
Chip Kerchner 76227e2948 Initial commit for vectorized BF16 GEMV. Added GEMM_GEMV_FORWARD_BF16 to enable using BF16 GEMV for one-dimensional matrices. Updated unit test to support inc_x != 1 or inc_y != 1 for GEMV. 2024-09-06 14:03:31 -05:00
Chip Kerchner 1a7b8c650d Merge branch 'develop' into betterPowerGEMVTail 2024-08-01 14:59:12 -05:00
Martin Kroeker f5d04318e3 Merge branch 'OpenMathLib:develop' into scalfixes 2024-07-21 13:43:43 +02:00
Martin Kroeker 73f8866ffb make NAN handling depend on DUMMY2 parameter 2024-07-21 13:42:47 +02:00
Hong Bo Peng db98f8753f Try to fix LAPACK testing failures on P7.
  1. Remove the FADD insn from the GEMV Transpose code.
  2. Remove the FADD insn from GEMM and ZGEMM code.
  3. Reorder the computation of the imaginary part in ZGEMM code.
2024-07-19 02:08:19 -04:00
Martin Kroeker b9bfc8ce09 make NAN handling depend on dummy2 parameter 2024-07-17 23:29:50 +02:00
Chip Kerchner ba47c7f4f3 Vectorize reduction stage of sgemv_t. 2024-07-16 15:57:24 -05:00
Chip Kerchner cb154832f8 Vectorize SBGEMM incopy - 4x faster. 2024-07-09 13:10:03 -05:00
Martin Kroeker 2a5fe97e3b temporarily(?) disable the alpha=0 branch as it does not handle INF, NAN 2024-06-27 16:21:57 +02:00
Martin Kroeker 7f8f037a36 handle INF and NAN in input 2024-06-22 16:03:30 +02:00
Martin Kroeker f1248b849d handle INF and NAN in input 2024-06-22 15:55:29 +02:00
Rajalakshmi Srinivasaraghavan e112191b54 POWER: Fix issues in zscal to address LAPACK failures
This patch fixes the following LAPACK failures with the clang compiler on POWER.
zed.out: ZVX:   18 out of  5190 tests failed to pass the threshold
zgd.out: ZGV drivers:     25 out of   1092 tests failed to pass the threshold
zgd.out: ZGV drivers:      6 out of   1092 tests failed to pass the threshold
2024-05-22 08:00:06 -05:00
Martin Kroeker aa259b141d
Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix
Fix SAXPY regression when compiled with the OpenXL compiler.
2024-05-18 19:11:25 +02:00
Chip Kerchner 3a1417671a POWER: Fixing endianness issue in cswap/zswap kernel for AIX 2024-05-15 19:36:46 -05:00
Amrita H S 87b3d9054f Fix SAXPY regression when compiled with the OpenXL compiler.
SAXPY built with OpenXL regresses compared to SAXPY
built with gcc. The OpenXL compiler doesn't know that the
SAXPY inner kernel assembly is a 64-element loop, so
it treats the remainder loop as the main loop. It vectorizes
and interleaves the remainder into a loop of 48 elements per
iteration. With a maximum of 63 iterations, the 48-element
loop will mostly not be executed, so the 1-element
scalar loop that handles the remainder after it is probably
what mostly gets executed.

This can be fixed by adding a pragma, loop interleave_count(2),
which results in an 8-element loop.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2024-05-12 23:27:55 -05:00
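The SAXPY fix described in the entry above can be sketched as a remainder loop carrying a loop hint. The function name `saxpy_tail` is hypothetical, and the `#pragma clang loop interleave_count(2)` spelling is the standard Clang loop hint, assumed here to be the form the commit used. Compilers that are not Clang-based skip the pragma through the preprocessor guard.

```c
#include <stddef.h>

/* Hypothetical remainder loop for y += alpha * x. It is meant to run
 * after a hand-written 64-element main kernel, so n is expected to be
 * below 64 when this is called. */
static void saxpy_tail(size_t n, float alpha, const float *x, float *y)
{
#if defined(__clang__)
    /* Limit interleaving to 2 so the vectorized body covers only a few
     * elements per iteration and short remainders still reach it. */
#pragma clang loop interleave_count(2)
#endif
    for (size_t i = 0; i < n; i++) {
        y[i] += alpha * x[i];
    }
}
```

The hint changes only how the compiler schedules the loop; the arithmetic result is the same with or without it.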
Chip-Kerchner 99384933ff Revert "Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code"
This reverts commit accea15551, reversing
changes made to b925353006.
2024-03-01 07:57:39 -06:00
Martin Kroeker accea15551
Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code
Cgemm zgemm c code
2024-02-27 22:07:07 +01:00
austinpagan 87ba528d8b Changed C files to straighten out indentation. Removed commented lines from other file. 2024-02-01 18:46:07 -06:00
austinpagan ddac75e0ef Adding .C versions of CGEMM and ZGEMM 2024-02-01 12:24:25 -06:00
Chip Kerchner 2bb7ea64a1 Only vectorize 64-bit version for Power8. 2024-02-01 08:11:43 -06:00
Chip Kerchner 09bb48d1b9 Vectorize in-copy packing/copying for SGEMM - 4X faster. 2024-01-30 09:13:16 -06:00
Chip-Kerchner 058dd2a4cb Replace two vector loads with one vector pair load and fix endianness of stores - DGEMM versions. 2024-01-08 14:16:09 -06:00
barracuda156 d9653af018 KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366
2023-12-14 12:00:11 +08:00
Chip-Kerchner 4e738e561a Replace two vector loads with one vector pair load and fix endianness of stores. 2023-12-08 12:36:08 -06:00