OpenBLAS

Author	SHA1	Message	Date
Qiyu8	ae0b1dea19	modify system.cmake to enable fma flag	2020-11-13 10:20:24 +08:00
Qiyu8	e0dac6b53b	fix the CI failure of target specific option mismatch	2020-11-12 20:31:03 +08:00
Qiyu8	e5c2ceb675	fix the CI failure of lack the head	2020-11-12 17:35:17 +08:00
Qiyu8	a87e537b8c	modify macro	2020-11-11 15:53:48 +08:00
Qiyu8	5bc0a7583f	only FMA3 and vector larger than 128 have positive effects.	2020-11-11 15:18:01 +08:00
Qiyu8	8c0b206d4c	Optimize the performance of rot by using universal intrinsics	2020-11-11 14:33:12 +08:00
Martin Kroeker	ff16329cb7	Merge pull request #2972 from xiegengxin/rot-intrinsic Improve the performance of rot by using AVX512 and AVX2 intrinsic	2020-11-08 22:43:00 +01:00
Martin Kroeker	110c7a6de0	Merge pull request #2979 from RajalakshmiSR/dot_power10 Optimize sdot/ddot for POWER10	2020-11-08 10:19:34 +01:00
Rajalakshmi Srinivasaraghavan	6e364981a8	Optimize sdot/ddot for POWER10 This patch makes use of new POWER10 vector pair instructions for loads and stores.	2020-11-07 15:21:58 -06:00
Martin Kroeker	b976a0bf40	Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds	2020-11-07 20:39:56 +01:00
Martin Kroeker	ff74319ea5	Merge pull request #2977 from martin-frbg/issue2976 Fix macro name used in ifdef for POWERPC/PGI	2020-11-07 14:41:34 +01:00
Martin Kroeker	28d2dfe2b3	Fix macro name used in ifdef	2020-11-07 12:17:49 +01:00
Gengxin Xie	725ffbf041	fix typo	2020-11-05 16:25:17 +08:00
Gengxin Xie	d9ba49165a	Improve the performance of rot by using AVX512 and AVX2 intrinsic	2020-11-05 15:12:36 +08:00
Rajalakshmi Srinivasaraghavan	dd7a9cc5bf	POWER10: Change dgemm unroll factors Changing the unroll factors for dgemm to 8 shows improved performance with POWER10 MMA feature. Also made some minor changes in sgemm for edge cases.	2020-10-31 18:28:57 -05:00
Rajalakshmi Srinivasaraghavan	b435491885	Optimize caxpy for POWER10 This patch makes use of new POWER10 vector pair instructions for loads and stores.	2020-10-29 14:57:51 -05:00
Chen, Guobing	a7b1f9b1bb	Implementation of BF16 based gemv 1. Add a new API -- sbgemv to support bfloat16 based gemv 2. Implement a generic kernel for sbgemv 3. Implement an avx512-bf16 based kernel for sbgemv Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-10-29 02:08:23 +08:00
Martin Kroeker	67f39ad813	Merge pull request #2939 from thrasibule/Makefile_cleanup reuse variables defined in Makefile.system	2020-10-28 09:38:40 +01:00
Rajalakshmi Srinivasaraghavan	c24ba8b1dd	Optimize saxpy for POWER10 This patch makes use of new POWER10 vector pair instructions for loads and stores.	2020-10-26 13:24:59 -05:00
Martin Kroeker	6f9460f0f6	Merge pull request #2937 from martin-frbg/pwr-buffersz Increase and unify BUFFERSIZE on POWER;fix gcc inline warning	2020-10-23 07:15:32 +02:00
Guillaume Horel	1917a4e7b8	reuse variables defined in Makefile.system	2020-10-22 22:04:25 -04:00
Martin Kroeker	34c3c407ef	label always_inline function as inline to silence a gcc warning	2020-10-22 22:14:26 +02:00
Martin Kroeker	2e48d560ba	Fix compiler version check	2020-10-22 16:23:29 +02:00
Rajalakshmi Srinivasaraghavan	ad745c0bae	Optimize scopy/ccopy for POWER10 This patch makes use of new POWER10 vector pair instructions for loads and stores. Also reorganized all variants of copy functions to make use of same kernel.	2020-10-21 09:53:45 -05:00
İsmail Dönmez	4a1d00f589	Fix build with -Werror=return-type dgemm_tcopy_16_skylakex.c CNAME function should return an int, add a return 0 similar to other files.	2020-10-21 08:43:39 +02:00
Bart Oldeman	b073d759d0	x86_64: clobber all xmm registers after vzeroupper As observed using GCC 10 using -march=native -ftree-vectorize on Knights Landing, it is now smart enough to find clobbers inside non-inlined static functions. In particular, sgemv counted on a kernel to preserve the whole %ymm2 register (since it was not in the clobber list), but the top part was destroyed by vzeroupper. This caused many tests to fail. This patch makes sure all xmm (and ymm/zmm by extension) registers are listed as clobbered to avoid this happening, as most kernels already did correctly in fact.	2020-10-20 02:16:47 +00:00
Martin Kroeker	dc6e44c3f8	Merge pull request #2916 from martin-frbg/issue2911 Clean up duplicate definitions in POWER8 kernels and fix power10 option passing	2020-10-19 23:33:31 +02:00
Martin Kroeker	a61c086408	Fix spurious trailing whitespace in comment	2020-10-19 09:12:12 +02:00
Bart Oldeman	03e781b766	sgemm_direct_skylakex: fix `75eeb26` regression. The `#if defined(SKYLAKEX) || defined (COOPERLAKE)` from that commit was before #include "common.h" so caused the compiled function to be empty, returning garbage results for qualifying sgemm's on those architectures. Closes #2914	2020-10-18 19:58:07 +00:00
Martin Kroeker	f1a4071d8c	Clean up STACKSIZE redefinition	2020-10-18 19:41:43 +02:00
Martin Kroeker	97cf10062f	Clean up STACKSIZE redefinition	2020-10-18 19:39:18 +02:00
Martin Kroeker	17e288e18d	Clean up STACKSIZE redefinition	2020-10-18 19:37:04 +02:00
Martin Kroeker	c1422f3e46	Clean up STACKSIZE redefinition	2020-10-18 19:31:01 +02:00
Martin Kroeker	d85b24e103	Clean up STACKSIZE redefinition	2020-10-18 19:29:45 +02:00
Martin Kroeker	df70667043	fix core list for sse/sse2	2020-10-16 09:55:48 +02:00
Martin Kroeker	f071d1207a	add sse2	2020-10-15 22:10:32 +02:00
Martin Kroeker	dc6cefd2f5	Expressly enable -msse for 32bit DYNAMIC_ARCH kernels	2020-10-15 20:16:15 +02:00
Martin Kroeker	c339c40c01	Silence a redefinition warning	2020-10-15 19:08:12 +02:00
Martin Kroeker	10379fc83b	Use ifdef instead of if	2020-10-15 19:05:37 +02:00
Martin Kroeker	4c25910da0	Merge pull request #2896 from martin-frbg/intrin-double Add compiler flag for SSE4 where available	2020-10-15 11:12:35 +02:00
Martin Kroeker	ae6ac83991	Revert "add double precision SSE"	2020-10-15 08:37:02 +02:00
Qiyu8	4fac91ef37	adapt arm platform	2020-10-15 11:08:10 +08:00
Qiyu8	bfdf4b56da	Add double precision universal intrinsics for X86/ARM	2020-10-15 10:29:42 +08:00
Martin Kroeker	ebf0470fc2	add sse4.1 for DYNAMIC_ARCH kernels	2020-10-14 20:34:33 +02:00
Martin Kroeker	c9c3ae07af	Add double precision operations	2020-10-14 18:10:45 +02:00
Martin Kroeker	756802df61	Merge pull request #2890 from martin-frbg/s-d-sum Revert special handling of Windows xNRM2 and enable C+intrinsics kern…	2020-10-14 09:02:03 +02:00
Rajalakshmi Srinivasaraghavan	0826d68f93	POWER10: Change the packing format for bfloat16 As the new MMA instructions need the inputs in 4x2 order for bfloat16, changing the format in copy/packing code. This avoids permute instructions in the gemm kernel inner loop.	2020-10-13 16:05:10 -05:00
Rajalakshmi Srinivasaraghavan	b5d30b390d	Fix build issues with bfloat16 This patch fixes compilation errors due to recent renaming from SH to SB with BUILD_BFLOAT16.	2020-10-13 11:00:22 -05:00
Martin Kroeker	fecedc9c69	Add -mssse3	2020-10-13 11:55:41 +02:00
Martin Kroeker	0eacbca85f	Add Haswell and Zen to temporary sse3 whitelist	2020-10-13 11:42:39 +02:00
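
The regression described in commit 03e781b766 comes down to preprocessor ordering: a guard tested before the header that defines its macro evaluates as false, so the guarded body is silently compiled out. The following is a minimal sketch of that effect only, not the OpenBLAS source. HYPOTHETICAL_TARGET and both helper functions are made-up names for illustration, and the plain #define stands in for whatever #include "common.h" provides in the real file.

```c
/* Guard tested BEFORE the target macro exists: the test is false,
 * which is how a guarded function body can end up empty. */
#if defined(HYPOTHETICAL_TARGET)
#define EARLY_GUARD_ACTIVE 1
#else
#define EARLY_GUARD_ACTIVE 0
#endif

/* Stand-in (assumption) for the header that defines the target macro. */
#define HYPOTHETICAL_TARGET 1

/* The same guard tested AFTER the definition: the test is now true. */
#if defined(HYPOTHETICAL_TARGET)
#define LATE_GUARD_ACTIVE 1
#else
#define LATE_GUARD_ACTIVE 0
#endif

/* Hypothetical helpers that report which guard was taken. */
int early_guard_active(void) { return EARLY_GUARD_ACTIVE; }
int late_guard_active(void)  { return LATE_GUARD_ACTIVE; }
```

Calling early_guard_active() reports that the early guard was skipped, while late_guard_active() reports that the later one was taken. This is consistent with the commit message, which attributes the empty function to the guard appearing before the include.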


1554 Commits