Chip Kerchner
c8788208c8
Fixing block issue with transpose version.
2024-09-27 13:27:03 -05:00
Chip Kerchner
d7c0d87cd1
Small changes.
2024-09-26 15:21:29 -05:00
Chip Kerchner
eb6f3a05ef
Common MMA code.
2024-09-26 09:28:56 -05:00
Chip Kerchner
fb287d17fc
Common code.
2024-09-25 16:31:36 -05:00
Chip Kerchner
8ab6245771
Small change.
2024-09-24 16:50:21 -05:00
Chip Kerchner
df19375560
Almost final code for MMA.
2024-09-24 16:30:01 -05:00
Chip Kerchner
05aa63e738
More MMA BF16 GEMV code.
2024-09-24 12:54:02 -05:00
Chip Kerchner
c9ce37d527
Force vector pairs in clang.
2024-09-23 08:43:58 -05:00
Chip Kerchner
89a12fa083
MMA BF16 GEMV code.
2024-09-23 06:32:14 -05:00
Chip Kerchner
7947970f9d
Move common code.
2024-09-13 06:22:13 -05:00
Chip Kerchner
72216d28c2
Fix bug with inc_y adding results twice.
2024-09-11 08:47:32 -05:00
Chip Kerchner
2f142ee857
More common code.
2024-09-09 14:41:55 -05:00
Chip Kerchner
39fd29f1de
Minor improvement and turn off BF16 GEMV forwarding by default.
2024-09-08 18:28:31 -05:00
Chip Kerchner
8541b25e1d
Special case beta is one.
2024-09-06 14:48:48 -05:00
Chip Kerchner
76227e2948
Initial commit for vectorized BF16 GEMV. Added GEMM_GEMV_FORWARD_BF16 to enable using BF16 GEMV for one-dimensional matrices. Updated unit test to support inc_x != 1 or inc_y != 1 for GEMV.
2024-09-06 14:03:31 -05:00
Chip Kerchner
1a7b8c650d
Merge branch 'develop' into betterPowerGEMVTail
2024-08-01 14:59:12 -05:00
Martin Kroeker
f5d04318e3
Merge branch 'OpenMathLib:develop' into scalfixes
2024-07-21 13:43:43 +02:00
Martin Kroeker
73f8866ffb
make NAN handling depend on DUMMY2 parameter
2024-07-21 13:42:47 +02:00
Hong Bo Peng
db98f8753f
Try to fix LAPACK testing failures on P7.
1. Remove the FADD insn from the GEMV Transpose code.
2. Remove the FADD insn from GEMM and ZGEMM code.
3. Reorder the computation of the imaginary part in ZGEMM code.
2024-07-19 02:08:19 -04:00
Martin Kroeker
b9bfc8ce09
make NAN handling depend on dummy2 parameter
2024-07-17 23:29:50 +02:00
Chip Kerchner
ba47c7f4f3
Vectorize reduction stage of sgemv_t.
2024-07-16 15:57:24 -05:00
Chip Kerchner
cb154832f8
Vectorize SBGEMM incopy - 4x faster.
2024-07-09 13:10:03 -05:00
Martin Kroeker
2a5fe97e3b
temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN
2024-06-27 16:21:57 +02:00
Martin Kroeker
7f8f037a36
handle INF and NAN in input
2024-06-22 16:03:30 +02:00
Martin Kroeker
f1248b849d
handle INF and NAN in input
2024-06-22 15:55:29 +02:00
Rajalakshmi Srinivasaraghavan
e112191b54
POWER: Fix issues in zscal to address lapack failures
This patch fixes the following LAPACK failures with the clang compiler on POWER:
zed.out: ZVX: 18 out of 5190 tests failed to pass the threshold
zgd.out: ZGV drivers: 25 out of 1092 tests failed to pass the threshold
zgd.out: ZGV drivers: 6 out of 1092 tests failed to pass the threshold
2024-05-22 08:00:06 -05:00
Martin Kroeker
aa259b141d
Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix
Fix SAXPY regression when compiled with the OpenXL compiler.
2024-05-18 19:11:25 +02:00
Chip Kerchner
3a1417671a
POWER: Fixing endianness issue in cswap/zswap kernel for AIX
2024-05-15 19:36:46 -05:00
Amrita H S
87b3d9054f
Fix SAXPY regression when compiled with the OpenXL compiler.
SAXPY built with OpenXL regresses when compared to SAXPY
built with gcc. OpenXL compiler doesn't know that the
SAXPY inner kernel assembly is a 64 element loop and
to it the remainder loop is the main loop. It vectorizes
and interleaves the remainder to be a 48 elements per
iteration loop. With a max of 63 iterations, a 48 element
loop is mostly not going to get executed, so the 1 element
scalar loop that is the remainder after that is probably
mostly what gets executed.
This can be fixed by adding a pragma, loop interleave_count(2),
which results in an 8-element loop.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2024-05-12 23:27:55 -05:00
Chip-Kerchner
99384933ff
Revert "Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code"
This reverts commit accea15551, reversing
changes made to b925353006.
2024-03-01 07:57:39 -06:00
Martin Kroeker
accea15551
Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code
Cgemm zgemm c code
2024-02-27 22:07:07 +01:00
austinpagan
87ba528d8b
Changed C files to straighten out indentation. Removed commented lines from other file.
2024-02-01 18:46:07 -06:00
austinpagan
ddac75e0ef
Adding .C versions of CGEMM and ZGEMM
2024-02-01 12:24:25 -06:00
Chip Kerchner
2bb7ea64a1
Only vectorize 64-bit version for Power8.
2024-02-01 08:11:43 -06:00
Chip Kerchner
09bb48d1b9
Vectorize in-copy packing/copying for SGEMM - 4X faster.
2024-01-30 09:13:16 -06:00
Chip-Kerchner
058dd2a4cb
Replace two vector loads with one vector pair load and fix endianness of stores - DGEMM versions.
2024-01-08 14:16:09 -06:00
barracuda156
d9653af018
KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366
2023-12-14 12:00:11 +08:00
Chip-Kerchner
4e738e561a
Replace two vector loads with one vector pair load and fix endianness of stores.
2023-12-08 12:36:08 -06:00
Rajalakshmi Srinivasaraghavan
980f702f72
POWER: AIX: Make use of power10 optimization
POWER10 optimizations are disabled when using default AIX assembler.
As we have fixed many issues recently, enabling optimization path
for default assembler.
2023-10-19 18:48:19 -05:00
Rajalakshmi Srinivasaraghavan
82fc29a57a
POWER10: Fallback to POWER8 functions
As cgemm and zgemm kernels are not optimized for big endian, fall
back to POWER8 versions. Tested on AIX using gcc and Open XL C.
2023-10-11 17:04:42 -05:00
Martin Kroeker
8e6d93359d
Merge pull request #4196 from TiborGY/obsolete_inlines
Modernize obsolete inline order
2023-09-03 14:12:42 +02:00
Ian McInerney
79c15db348
Fix power10 gcc intrinsic check
__builtin_vsx_assemble_pair was only in GCC 10-11.2 and was replaced by
__builtin_vsx_build_pair thereafter.
2023-08-17 15:05:29 +01:00
TGY
b5ba95a6c0
Modernize obsolete inline order
2023-08-16 00:48:40 +02:00
Martin Kroeker
54d3246fc6
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:55:17 +02:00
Manjul Mohan
58b88aa5f0
POWER10: Fix compiler warnings
This patch removes the warning messages related to unused variables in
sbgemm_kernel_power10.c.
Signed-off-by: Manjul Mohan <manjul@linux.vnet.ibm.com>
2023-06-12 01:08:59 -04:00
Martin Kroeker
1688c7da43
change line endings from CRLF to LF
2022-11-16 22:24:01 +01:00
Martin Kroeker
6c118b7977
Fix DNRM2 returning INF instead of zero due to intermediate overflow
2022-07-24 17:42:31 +02:00
Martin Kroeker
c43ec53bdd
Merge pull request #3690 from RajalakshmiSR/cdotp10
POWER: Fix complex dot function failures
2022-07-19 13:59:16 +02:00
Rajalakshmi Srinivasaraghavan
a612e78a97
POWER: Fix complex dot function failures
There are some test failures in complex dot functions when compiling with gcc12.
The machine constraints used now do not update all the four elements in the
expected result array. Fixing this with a reduced level of optimization.
This is not changing any performance numbers but will be converted to C code in future.
2022-07-18 14:48:43 -05:00
Rajalakshmi Srinivasaraghavan
432fd99445
POWER10: dgemv builtin rename
Add check to use correct builtin name for older versions
of gcc10 compilers.
2022-07-18 09:48:01 -05:00