OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Chip Kerchner	c8788208c8	Fixing block issue with transpose version.	2024-09-27 13:27:03 -05:00
Chip Kerchner	d7c0d87cd1	Small changes.	2024-09-26 15:21:29 -05:00
Chip Kerchner	eb6f3a05ef	Common MMA code.	2024-09-26 09:28:56 -05:00
Chip Kerchner	fb287d17fc	Common code.	2024-09-25 16:31:36 -05:00
Chip Kerchner	8ab6245771	Small change.	2024-09-24 16:50:21 -05:00
Chip Kerchner	df19375560	Almost final code for MMA.	2024-09-24 16:30:01 -05:00
Chip Kerchner	05aa63e738	More MMA BF16 GEMV code.	2024-09-24 12:54:02 -05:00
Chip Kerchner	c9ce37d527	Force vector pairs in clang.	2024-09-23 08:43:58 -05:00
Chip Kerchner	89a12fa083	MMA BF16 GEMV code.	2024-09-23 06:32:14 -05:00
Chip Kerchner	7947970f9d	Move common code.	2024-09-13 06:22:13 -05:00
Chip Kerchner	72216d28c2	Fix bug with inc_y adding results twice.	2024-09-11 08:47:32 -05:00
Chip Kerchner	2f142ee857	More common code.	2024-09-09 14:41:55 -05:00
Chip Kerchner	39fd29f1de	Minor improvement and turn off BF16 GEMV forwarding by default.	2024-09-08 18:28:31 -05:00
Chip Kerchner	8541b25e1d	Special case beta is one.	2024-09-06 14:48:48 -05:00
Chip Kerchner	76227e2948	Initial commit for vectorized BF16 GEMV. Added GEMM_GEMV_FORWARD_BF16 to enable using BF16 GEMV for one dimension matrices. Updated unit test to support inc_x != 1 or inc_y for GEMV.	2024-09-06 14:03:31 -05:00
Chip Kerchner	1a7b8c650d	Merge branch 'develop' into betterPowerGEMVTail	2024-08-01 14:59:12 -05:00
Martin Kroeker	f5d04318e3	Merge branch 'OpenMathLib:develop' into scalfixes	2024-07-21 13:43:43 +02:00
Martin Kroeker	73f8866ffb	make NAN handling depend on DUMMY2 parameter	2024-07-21 13:42:47 +02:00
Hong Bo Peng	db98f8753f	Try to fix LAPACK testing failures on P7. 1. Remove the FADD insn from the GEMV Transpose code. 2. Remove the FADD insn from GEMM and ZGEMM code. 3. Reorder the computation of the Imaginary part in ZGEMM code.	2024-07-19 02:08:19 -04:00
Martin Kroeker	b9bfc8ce09	make NAN handling depend on dummy2 parameter	2024-07-17 23:29:50 +02:00
Chip Kerchner	ba47c7f4f3	Vectorize reduction stage of sgemv_t.	2024-07-16 15:57:24 -05:00
Chip Kerchner	cb154832f8	Vectorize SBGEMM incopy - 4x faster.	2024-07-09 13:10:03 -05:00
Martin Kroeker	2a5fe97e3b	temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN	2024-06-27 16:21:57 +02:00
Martin Kroeker	7f8f037a36	handle INF and NAN in input	2024-06-22 16:03:30 +02:00
Martin Kroeker	f1248b849d	handle INF and NAN in input	2024-06-22 15:55:29 +02:00
Rajalakshmi Srinivasaraghavan	e112191b54	POWER: Fix issues in zscal to address lapack failures This patch fixes following lapack failures with clang compiler on POWER. zed.out: ZVX: 18 out of 5190 tests failed to pass the threshold zgd.out: ZGV drivers: 25 out of 1092 tests failed to pass the threshold zgd.out: ZGV drivers: 6 out of 1092 tests failed to pass the threshold	2024-05-22 08:00:06 -05:00
Martin Kroeker	aa259b141d	Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix Fix regression SAXPY when compiled with OpenXL compiler.	2024-05-18 19:11:25 +02:00
Chip Kerchner	3a1417671a	POWER: Fixing endianness issue in cswap/zswap kernel for AIX	2024-05-15 19:36:46 -05:00
Amrita H S	87b3d9054f	Fix regression SAXPY when compiled with OpenXL compiler. SAXPY built with OpenXL regresses when compared to SAXPY built with gcc. OpenXL compiler doesn't know that the SAXPY inner kernel assembly is a 64 element loop and to it the remainder loop is the main loop. It vectorizes and interleaves the remainder to be a 48 elements per iteration loop. With a max of 63 iterations, a 48 element loop is mostly not going to get executed, so the 1 element scalar loop that is the remainder after that is probably mostly what gets executed. This can be fixed by adding a pragma, loop interleave_count(2) which will result in 8 element loop. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>	2024-05-12 23:27:55 -05:00
Chip-Kerchner	99384933ff	Revert "Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code" This reverts commit `accea15551`, reversing changes made to `b925353006`.	2024-03-01 07:57:39 -06:00
Martin Kroeker	accea15551	Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code Cgemm zgemm c code	2024-02-27 22:07:07 +01:00
austinpagan	87ba528d8b	Changed C files to straighten out indentation. Removed commented lines from other file.	2024-02-01 18:46:07 -06:00
austinpagan	ddac75e0ef	Adding .C versions of CGEMM and ZGEMM	2024-02-01 12:24:25 -06:00
Chip Kerchner	2bb7ea64a1	Only vectorize 64-bit version for Power8.	2024-02-01 08:11:43 -06:00
Chip Kerchner	09bb48d1b9	Vectorize in-copy packing/copying for SGEMM - 4X faster.	2024-01-30 09:13:16 -06:00
Chip-Kerchner	058dd2a4cb	Replace two vector loads with one vector pair load and fix endianness of stores - DGEMM versions.	2024-01-08 14:16:09 -06:00
barracuda156	d9653af018	KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366	2023-12-14 12:00:11 +08:00
Chip-Kerchner	4e738e561a	Replace two vector loads with one vector pair load and fix endianness of stores.	2023-12-08 12:36:08 -06:00
Rajalakshmi Srinivasaraghavan	980f702f72	POWER: AIX: Make use of power10 optimization POWER10 optimizations are disabled when using default AIX assembler. As we have fixed many issues recently, enabling optimization path for default assembler.	2023-10-19 18:48:19 -05:00
Rajalakshmi Srinivasaraghavan	82fc29a57a	POWER10: Fallback to POWER8 functions As cgemm and zgemm kernels are not optimized for big endian falling back to POWER8 versions. Tested on AIX using gcc and Open XL C.	2023-10-11 17:04:42 -05:00
Martin Kroeker	8e6d93359d	Merge pull request #4196 from TiborGY/obsolete_inlines Modernize obsolete inline order	2023-09-03 14:12:42 +02:00
Ian McInerney	79c15db348	Fix power10 gcc intrinsic check __builtin_vsx_assemble_pair was only in GCC 10-11.2 and was replaced by __builtin_vsx_build_pair thereafter.	2023-08-17 15:05:29 +01:00
TGY	b5ba95a6c0	Modernize obsolete inline order	2023-08-16 00:48:40 +02:00
Martin Kroeker	54d3246fc6	Allow negative INCX (API change from version 3.10 of the reference implementation)	2023-08-10 16:55:17 +02:00
Manjul Mohan	58b88aa5f0	POWER10: Fix compiler warnings This patch removes the warning messages related to unused variables in sbgemm_kernel_power10.c. Signed-off-by: Manjul Mohan <manjul@linux.vnet.ibm.com>	2023-06-12 01:08:59 -04:00
Martin Kroeker	1688c7da43	change line endings from CRLF to LF	2022-11-16 22:24:01 +01:00
Martin Kroeker	6c118b7977	Fix DNRM2 returning INF instead of zero due to intermediate overflow	2022-07-24 17:42:31 +02:00
Martin Kroeker	c43ec53bdd	Merge pull request #3690 from RajalakshmiSR/cdotp10 POWER: Fix complex dot function failures	2022-07-19 13:59:16 +02:00
Rajalakshmi Srinivasaraghavan	a612e78a97	POWER: Fix complex dot function failures There are some test failures in complex dot functions when compiling with gcc12. The machine constraints used now do not update all the four elements in the expected result array. Fixing this with a reduced level of optimization. This is not changing any performance numbers but will be converted to C code in future.	2022-07-18 14:48:43 -05:00
Rajalakshmi Srinivasaraghavan	432fd99445	POWER10: dgemv builtin rename Add check to use correct builtin name for older versions of gcc10 compilers.	2022-07-18 09:48:01 -05:00

284 Commits