OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Chip Kerchner	c8f53b85ce	Merge remote-tracking branch 'origin/develop' into vectorizeBF16GEMV	2024-10-11 11:10:20 -05:00
Martin Kroeker	e52d9b4cf1	Merge pull request #4928 from austinpagan/czgemm_in_c CGEMM & ZGEMM using C code, Power only, P10 only.	2024-10-09 20:26:21 +02:00
Gordon Fossum	0b7fb5c791	CGEMM & ZGEMM using C code.	2024-10-09 09:42:23 -05:00
Chip Kerchner	d6bb8dcfd1	Common code.	2024-10-06 14:13:43 -05:00
Martin Kroeker	c9e92348a6	Handle inf/nan if dummy2 flag is set	2024-10-06 19:57:17 +02:00
Chip Kerchner	9ac0fb0111	Merge branch 'develop' into vectorizeBF16GEMV	2024-10-04 06:49:53 -05:00
Martin Kroeker	d714013ab9	change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds	2024-10-03 22:04:20 +02:00
Chip Kerchner	915a6d6e44	Add casting.	2024-10-03 14:08:21 -05:00
Chip Kerchner	7ec3c16d82	Remove beta from optimized functions.	2024-10-03 13:27:33 -05:00
Chip Kerchner	7cc00f68c9	Remove more duplicate.	2024-10-01 11:23:32 -05:00
Chip Kerchner	e238a68c03	Remove duplicate.	2024-10-01 11:06:23 -05:00
Chip Kerchner	32095b0cbb	Remove parameter.	2024-10-01 09:32:42 -05:00
Chip Kerchner	c8788208c8	Fixing block issue with transpose version.	2024-09-27 13:27:03 -05:00
Chip Kerchner	d7c0d87cd1	Small changes.	2024-09-26 15:21:29 -05:00
Chip Kerchner	eb6f3a05ef	Common MMA code.	2024-09-26 09:28:56 -05:00
Chip Kerchner	fb287d17fc	Common code.	2024-09-25 16:31:36 -05:00
Chip Kerchner	8ab6245771	Small change.	2024-09-24 16:50:21 -05:00
Chip Kerchner	df19375560	Almost final code for MMA.	2024-09-24 16:30:01 -05:00
Chip Kerchner	05aa63e738	More MMA BF16 GEMV code.	2024-09-24 12:54:02 -05:00
Chip Kerchner	c9ce37d527	Force vector pairs in clang.	2024-09-23 08:43:58 -05:00
Chip Kerchner	89a12fa083	MMA BF16 GEMV code.	2024-09-23 06:32:14 -05:00
Chip Kerchner	7947970f9d	Move common code.	2024-09-13 06:22:13 -05:00
Chip Kerchner	72216d28c2	Fix bug with inc_y adding results twice.	2024-09-11 08:47:32 -05:00
Chip Kerchner	2f142ee857	More common code.	2024-09-09 14:41:55 -05:00
Chip Kerchner	39fd29f1de	Minor improvement and turn off BF16 GEMV forwarding by default.	2024-09-08 18:28:31 -05:00
Chip Kerchner	8541b25e1d	Special case beta is one.	2024-09-06 14:48:48 -05:00
Chip Kerchner	76227e2948	Initial commit for vectorized BF16 GEMV. Added GEMM_GEMV_FORWARD_BF16 to enable using BF16 GEMV for one dimension matrices. Updated unit test to support inc_x != 1 or inc_y for GEMV.	2024-09-06 14:03:31 -05:00
Chip Kerchner	1a7b8c650d	Merge branch 'develop' into betterPowerGEMVTail	2024-08-01 14:59:12 -05:00
Martin Kroeker	f5d04318e3	Merge branch 'OpenMathLib:develop' into scalfixes	2024-07-21 13:43:43 +02:00
Martin Kroeker	73f8866ffb	make NAN handling depend on DUMMY2 parameter	2024-07-21 13:42:47 +02:00
Hong Bo Peng	db98f8753f	Try to fix LAPACK testing failures on P7. 1. Remove the FADD insn from the GEMV Transpose code. 2. Remove the FADD insn from GEMM and ZGEMM code. 3. Reorder the compution of the Imaginary part in ZGEMM code.	2024-07-19 02:08:19 -04:00
Martin Kroeker	b9bfc8ce09	make NAN handling depend on dummy2 parameter	2024-07-17 23:29:50 +02:00
Chip Kerchner	ba47c7f4f3	Vectorize reduction stage of sgemv_t.	2024-07-16 15:57:24 -05:00
Chip Kerchner	cb154832f8	Vectorize SBGEMM incopy - 4x faster.	2024-07-09 13:10:03 -05:00
Martin Kroeker	2a5fe97e3b	temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN	2024-06-27 16:21:57 +02:00
Martin Kroeker	7f8f037a36	handle INF and NAN in input	2024-06-22 16:03:30 +02:00
Martin Kroeker	f1248b849d	handle INF and NAN in input	2024-06-22 15:55:29 +02:00
Rajalakshmi Srinivasaraghavan	e112191b54	POWER: Fix issues in zscal to address lapack failures This patch fixes following lapack failures with clang compiler on POWER. zed.out: ZVX: 18 out of 5190 tests failed to pass the threshold zgd.out: ZGV drivers: 25 out of 1092 tests failed to pass the threshold zgd.out: ZGV drivers: 6 out of 1092 tests failed to pass the threshold	2024-05-22 08:00:06 -05:00
Martin Kroeker	aa259b141d	Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix Fix regression SAXPY when compiler with OpenXL compiler.	2024-05-18 19:11:25 +02:00
Chip Kerchner	3a1417671a	POWER: Fixing endianness issue in cswap/zswap kernel for AIX	2024-05-15 19:36:46 -05:00
Amrita H S	87b3d9054f	Fix regression SAXPY when compiler with OpenXL compiler. SAXPY built with OpenXL regresses when compared to SAXPY built with gcc. OpenXL compiler doesn't know that the SAXPY inner kernel assembly is a 64 element loop and to it the remainder loop is the main loop. It vectorizes and interleaves the remainder to be a 48 elements per iteration loop. With a max of 63 iterations, a 48 element loop is mostly not going to get executed, so the 1 element scalar loop that is the remainder after that is probably mostly what gets executed. This can be fixed by adding a pragma, loop interleave_count(2) which will result in 8 element loop. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>	2024-05-12 23:27:55 -05:00
Chip-Kerchner	99384933ff	Revert "Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code" This reverts commit `accea15551`, reversing changes made to `b925353006`.	2024-03-01 07:57:39 -06:00
Martin Kroeker	accea15551	Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code Cgemm zgemm c code	2024-02-27 22:07:07 +01:00
austinpagan	87ba528d8b	Changed C files to straighten out indentation. Removed commented lines from other file.	2024-02-01 18:46:07 -06:00
austinpagan	ddac75e0ef	Adding .C versions of CGEMM and ZGEMM	2024-02-01 12:24:25 -06:00
Chip Kerchner	2bb7ea64a1	Only vectorize 64-bit version for Power8.	2024-02-01 08:11:43 -06:00
Chip Kerchner	09bb48d1b9	Vectorize in-copy packing/copying for SGEMM - 4X faster.	2024-01-30 09:13:16 -06:00
Chip-Kerchner	058dd2a4cb	Replace two vector loads with one vector pair load and fix endianess of stores - DGEMM versions.	2024-01-08 14:16:09 -06:00
barracuda156	d9653af018	KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366	2023-12-14 12:00:11 +08:00
Chip-Kerchner	4e738e561a	Replace two vector loads with one vector pair load and fix endianess of stores.	2023-12-08 12:36:08 -06:00
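
The message for commit 87b3d9054f above describes its fix only in prose: a remainder loop follows the 64-element SAXPY assembly kernel, and a loop pragma limits how far the compiler interleaves it. A minimal sketch of that idea is below. The function name `saxpy_tail`, its signature, and the exact pragma spelling (`#pragma clang loop interleave_count(2)`, the Clang form) are assumptions for illustration, not code taken from the OpenBLAS source.

```c
#include <stddef.h>

/* Hypothetical remainder loop: processes the n leftover elements
   (at most 63 per the commit message) that a 64-element assembly
   kernel, not shown here, did not handle. */
static void saxpy_tail(size_t n, float alpha, const float *x, float *y)
{
    /* Assumed pragma spelling: ask Clang-based compilers to interleave
       only two vector iterations instead of choosing a wider unroll.
       Other compilers skip the pragma because of the guard. */
#if defined(__clang__)
#pragma clang loop interleave_count(2)
#endif
    for (size_t i = 0; i < n; i++) {
        y[i] += alpha * x[i];
    }
}
```

With 4-float vectors, an interleave count of 2 gives the 8-elements-per-iteration loop the commit message mentions, so short remainders still run mostly in vector code rather than in the scalar cleanup loop.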


296 Commits