OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Wangyang Guo	3dc6052c7e	initial support for Sapphire Rapids platform	2021-10-12 01:30:40 -07:00
Wangyang Guo	ee5ca8a328	x86_64: BFLOAT16: fix build warning	2021-09-28 18:30:06 +08:00
Martin Kroeker	8dfa61a61c	Initialize abs_mask1 with itself to silence a gcc warning	2021-09-15 22:11:35 +02:00
Martin Kroeker	99aa10b3ff	Initialize abs_mask1 with itself to silence a gcc warning. Actual initialization is via _mm_cmpeq_epi8, which I've seen claimed to be the fastest way to set an xmm register to all 1s	2021-09-15 22:10:43 +02:00
Martin Kroeker	ce036a2fc0	Add casts	2021-09-14 21:41:53 +02:00
Martin Kroeker	af8843875a	Merge pull request #3376 from martin-frbg/issue3370 Fix a few harmless compiler warnings	2021-09-12 00:01:31 +02:00
Martin Kroeker	0925dfe2c9	One instance of kernel_4x1 is used even on SKX	2021-09-11 15:30:19 +02:00
Martin Kroeker	7d873a329f	Add ifdefs around conditionally used functions	2021-09-11 14:38:47 +02:00
Martin Kroeker	d17238599b	Add casts	2021-09-11 13:38:28 +02:00
Wangyang Guo	59a1114d03	sbgemm: cooperlake: tuning for small matrix	2021-09-07 21:30:46 +08:00
Wangyang Guo	682d66555d	sbgemm: cooperlake: implement ncopy_16	2021-09-07 21:30:46 +08:00
Wangyang Guo	beccb83b16	sbgemm: cooperlake: add n24 kernel for tcopy_4	2021-09-07 21:30:46 +08:00
Wangyang Guo	5fcacad32b	sbgemm: cooperlake: implement tcopy_4	2021-09-07 21:30:46 +08:00
Wangyang Guo	bb1c4fa5bd	sbgemm: cooperlake: prefetch A & B	2021-09-07 21:30:46 +08:00
Wangyang Guo	7a2d1601ec	sbgemm: cooperlake: unroll core loop by 2	2021-09-07 21:30:46 +08:00
Wangyang Guo	45fdf951b6	sbgemm: cooperlake: reorder ptr increase for performance	2021-09-07 21:30:46 +08:00
Wangyang Guo	cece3541ab	sbgemm: cooperlake: fix bug in m64n12	2021-09-07 21:30:46 +08:00
Wangyang Guo	9df0953cde	sbgemm: cooperlake: kernel works for NN	2021-09-07 21:30:45 +08:00
Wangyang Guo	2ec9f3a8aa	sbgemm: cooperlake: change kernel size to 16x4	2021-09-07 21:30:45 +08:00
Wangyang Guo	ef8f5fecc8	sbgemm: cooperlake: implement sbgemm_tcopy_32	2021-09-07 21:30:45 +08:00
Wangyang Guo	4c294336e6	sbgemm: cooperlake: add dummy source files	2021-09-07 21:30:45 +08:00
Wangyang Guo	619588fbab	sbgemm: remove unnecessary b0 files	2021-08-30 17:55:01 +08:00
Wangyang Guo	f39301935c	sbgemm: cooperlake: make sure hot buffer aligned to 64	2021-08-30 17:40:30 +08:00
Wangyang Guo	7d27b182fc	sbgemm: cooperlake: enable SBGEMM by small matrix path	2021-08-30 17:40:30 +08:00
Martin Kroeker	bec9d9f63d	Merge pull request #3335 from guowangy/small-matrix-latest Add GEMM optimization for small matrix and single/double kernel for skylakex	2021-08-29 22:33:33 +02:00
Wangyang Guo	dbbb39199f	sgemv: skylakex: fix build warning	2021-08-25 07:13:00 +00:00
Wangyang Guo	e9acb46431	sgemv: skylakex: bug fix for sgemv_t kernel in corner case	2021-08-25 07:07:27 +00:00
Wangyang Guo	f9dba63c28	Small Matrix: skylakex: remove unnecessary b0 source files	2021-08-13 03:28:44 +00:00
Wangyang Guo	44d0032f3b	Small Matrix: skylakex: fix build error in old compiler	2021-08-05 04:43:47 +00:00
Chen, Guobing	5d86becdae	Add all SBGEMM kernels for IA AVX512-BF16 based platforms Added all SBGEMM kernels including NN/NT/TN/TT for both ColMajor and RowMajor, based on AVX512-BF16 ISA set on IA. Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2021-08-05 11:11:29 +08:00
Wangyang Guo	fa777f5517	Small Matrix: skylakex: add DGEMM_SMALL_M_PERMIT and tune for TN kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	8592c21af4	Small Matrix: skylakex: dgemm nn: fix typo in idx load	2021-08-02 07:06:54 +00:00
Wangyang Guo	3e79f6d89a	Small Matrix: skylakex: add dgemm tn kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	323d7da4f7	Small Matrix: skylakex: add dgemm tt kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	f57fc932ac	Small Matrix: skylakex: add dgemm nt kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	91ec21202b	Small Matrix: skylakex: add dgemm nn kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	72e070539c	Small Matrix: skylakex: add sgemm tt kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	02c6e764f2	Small Matrix: skylakex: add SGEMM_SMALL_M_PERMIT and tune for TN kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	642c393879	Small Matrix: skylakex: add sgemm tn kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	ae3f5c737c	Small Matrix: skylakex: sgemm nt: optimize for M < 12	2021-08-02 07:06:54 +00:00
Wangyang Guo	0d72d75bf9	Small Matrix: skylakex: add sgemm nt kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	ca7682e3a3	Small Matrix: skylakex: sgemm nn: fix n6 conflicts with n4	2021-08-02 07:06:54 +00:00
Wangyang Guo	9967e61abb	Small Matrix: skylakex: sgemm nn: fix error when beta not zero	2021-08-02 07:06:54 +00:00
Wangyang Guo	a87736346f	Small Matrix: skylakex: sgemm nn: add n6 to improve performance	2021-08-02 07:06:54 +00:00
Wangyang Guo	4c9d9940fd	Small Matrix: skylakex: sgemm nn: reduce store 4 N at a time	2021-08-02 07:06:54 +00:00
Wangyang Guo	13b32f69b7	Small Matrix: skylakex: sgemm nn: reduce store 4 M at a time	2021-08-02 07:06:54 +00:00
Wangyang Guo	3d8c6d9607	Small Matrix: skylakex: sgemm nn: clean up unused code	2021-08-02 07:06:54 +00:00
Wangyang Guo	49b61a3f30	Small Matrix: skylakex: sgemm_nn: optimize for M <= 8	2021-08-02 07:06:54 +00:00
Wangyang Guo	f88470323b	Optimize M < 16 using AVX512 mask	2021-08-02 07:06:54 +00:00
Wangyang Guo	9186456a12	small matrix: SkylakeX: add SGEMM NN kernel	2021-08-02 07:06:54 +00:00
Martin Kroeker	5b4b385ecf	Temporarily disable the SkylakeX sgemv_t microkernel due to LAPACK testsuite failures	2021-07-14 20:50:14 +02:00
Ma, Yu	706a08d4a0	Optimized sgemv_t for small N based on AVX512	2021-06-08 15:08:28 -04:00
Martin Kroeker	5f677e782e	Merge pull request #3196 from guowangy/skylakex-gemm-batch-k GEMM: skylake: improve the performance when m is small	2021-05-22 19:25:28 +02:00
Martin Kroeker	02087a62e7	Merge pull request #3205 from intelmy/sgemv_n_opt optimize on sgemv_n for small n	2021-05-17 17:49:01 +02:00
Martin Kroeker	8b90e5f202	Drop redundant inclusion of complex.h	2021-05-14 15:06:44 +02:00
Martin Kroeker	c0ca63ea46	Fix missing conditionals for non-SKX kernels	2021-05-05 14:55:36 +02:00
pnp	3d4ccd2a13	fix for build error	2021-04-30 12:25:33 -04:00
pnp	c59652f0ce	optimize on sgemv_n for small n	2021-04-30 12:14:58 -04:00
Wangyang Guo	aa7b3dc3db	GEMM: skylake: improve the performance when m is small	2021-04-28 13:56:06 +00:00
Martin Kroeker	3d511f0e66	replace spurious avx512 requirement with fma check	2021-04-26 21:55:30 +02:00
Martin Kroeker	2dfb24730d	Use "old" compute(24) function with clang due to register limitations	2021-04-06 19:58:32 +02:00
Martin Kroeker	7b8f580941	Merge pull request #3156 from martin-frbg/omatcopy_d Move x86_64 DOMATCOPY_RT back to the C implementation	2021-03-19 15:22:48 +01:00
Martin Kroeker	0f5e86a0d9	Remove premature entry for DOMATCOPY_RT	2021-03-18 21:53:50 +01:00
Martin Kroeker	7b294a99fd	Move common.h back to the top of the file so that SKYLAKEX (from config.h) is defined in time	2021-03-18 21:28:19 +01:00
Martin Kroeker	0934568d9c	Move includes under the ifdef for compilers w/o intrinsics support	2021-03-12 12:42:05 +01:00
Martin Kroeker	a9f6f7ad39	Remove spurious AVX512 requirement and add AVX2/FMA3 guard	2021-03-06 14:35:49 +01:00
Martin Kroeker	292d1af1a0	Update omatcopy_rt.c	2021-02-24 09:34:14 +01:00
Martin Kroeker	325b398e3c	Update omatcopy_rt.c	2021-02-24 09:13:12 +01:00
Martin Kroeker	6f5667b4d4	Enable optimized S/D OMATCOPY_RT	2021-02-24 09:03:41 +01:00
Martin Kroeker	cceeee7806	Add optimized omatcopy_rt	2021-02-24 09:00:54 +01:00
Martin Kroeker	47691c031f	Use Haswell optimizations for Zen as well	2021-02-11 09:26:15 +01:00
Martin Kroeker	ce7ddd8921	Use Haswell optimizations for Zen as well	2021-02-11 09:25:36 +01:00
Martin Kroeker	950c047b49	Use Haswell optimizations for Zen as well	2021-02-11 09:24:51 +01:00
Martin Kroeker	46509953a9	Use Haswell optimizations for Zen as well	2021-02-11 09:24:16 +01:00
Martin Kroeker	db348dcff2	Enable optimized srot/drot kernels from Haswell	2021-02-11 09:23:05 +01:00
Martin Kroeker	69a5558203	Merge pull request #3059 from Guobing-Chen/BF16_gemm Initial code for Cooperlake BF16 GEMM kernel	2021-01-23 19:08:05 +01:00
Alex Henrie	202fc9e8ed	Fix uninitialized argument value in dasum_k	2021-01-14 19:40:31 -07:00
Chen, Guobing	b0beb0b1ca	Initial code for Cooperlake BF16 GEMM kernel	2021-01-11 02:15:21 +08:00
Martin Kroeker	114eb159a4	Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA	2020-12-19 22:15:58 +01:00
Martin Kroeker	441c08c9ff	Merge pull request #3016 from xiegengxin/complex-asum Improve the performance of zasum and casum with AVX512 intrinsic	2020-12-04 22:07:16 +01:00
Gengxin Xie	0cb7a403b2	fix error declare function blas_level1_thread_with_return_value	2020-12-02 09:51:52 +08:00
Gengxin Xie	b766c1e9bb	Improve the performance of zasum and casum with AVX512 intrinsic	2020-12-01 16:49:26 +08:00
Martin Kroeker	f1bf040b25	Merge pull request #2988 from xiegengxin/smp-asum Improve the performance of dasum and sasum when SMP is defined	2020-11-22 12:24:13 +01:00
Gengxin Xie	d6e7e05bb3	Improve the performance of dasum and sasum when SMP is defined	2020-11-13 14:20:52 +08:00
Qiyu8	a87e537b8c	modify macro	2020-11-11 15:53:48 +08:00
Qiyu8	5bc0a7583f	only FMA3 and vector larger than 128 have positive effects.	2020-11-11 15:18:01 +08:00
Qiyu8	8c0b206d4c	Optimize the performance of rot by using universal intrinsics	2020-11-11 14:33:12 +08:00
Martin Kroeker	ff16329cb7	Merge pull request #2972 from xiegengxin/rot-intrinsic Improve the performance of rot by using AVX512 and AVX2 intrinsic	2020-11-08 22:43:00 +01:00
Gengxin Xie	725ffbf041	fix typo	2020-11-05 16:25:17 +08:00
Gengxin Xie	d9ba49165a	Improve the performance of rot by using AVX512 and AVX2 intrinsic	2020-11-05 15:12:36 +08:00
Chen, Guobing	a7b1f9b1bb	Implementation of BF16 based gemv 1. Add a new API -- sbgemv to support bfloat16 based gemv 2. Implement a generic kernel for sbgemv 3. Implement an avx512-bf16 based kernel for sbgemv Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-10-29 02:08:23 +08:00
İsmail Dönmez	4a1d00f589	Fix build with -Werror=return-type dgemm_tcopy_16_skylakex.c CNAME function should return an int, add a return 0 similar to other files.	2020-10-21 08:43:39 +02:00
Bart Oldeman	b073d759d0	x86_64: clobber all xmm registers after vzeroupper As observed using GCC 10 using -march=native -ftree-vectorize on Knights Landing, it is now smart enough to find clobbers inside non-inlined static functions. In particular, sgemv counted on a kernel to preserve the whole %ymm2 register (since it was not in the clobber list), but the top part was destroyed by vzeroupper. This caused many tests to fail. This patch makes sure all xmm (and ymm/zmm by extension) registers are listed as clobbered to avoid this happening, as most kernels already did correctly in fact.	2020-10-20 02:16:47 +00:00
Bart Oldeman	03e781b766	sgemm_direct_skylakex: fix `75eeb26` regression. The `#if defined(SKYLAKEX) || defined (COOPERLAKE)` from that commit was before #include "common.h" so caused the compiled function to be empty, returning garbage results for qualifying sgemm's on those architectures. Closes #2914	2020-10-18 19:58:07 +00:00
Martin Kroeker	c339c40c01	Silence a redefinition warning	2020-10-15 19:08:12 +02:00
Qiyu8	bfdf4b56da	Add double precision universal intrinsics for X86/ARM	2020-10-15 10:29:42 +08:00
Martin Kroeker	756802df61	Merge pull request #2890 from martin-frbg/s-d-sum Revert special handling of Windows xNRM2 and enable C+intrinsics kern…	2020-10-14 09:02:03 +02:00
Martin Kroeker	8d2df7d066	Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM	2020-10-13 00:14:29 +02:00
Martin Kroeker	08929430cd	Merge pull request #2886 from martin-frbg/issue_2767 Rename "HALF" precision functions (sh prefix) to "BFLOAT16" with "sb" prefix	2020-10-13 00:04:35 +02:00
Martin Kroeker	0c84ffe05f	Merge pull request #2881 from mattip/fninit add fninit to reset fpu registers before assembler routines	2020-10-12 23:50:41 +02:00
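
Commit 03e781b766 above describes a preprocessor guard that was evaluated before the header defining its macro had been included, which left the compiled function empty. The following is a minimal sketch of that class of pitfall, not the OpenBLAS code itself. `DEMO_SKYLAKEX` and both helper functions are hypothetical names introduced only for illustration, and a plain `#define` stands in for including a configuration header.

```c
/* Sketch of the include-order pitfall. DEMO_SKYLAKEX is a hypothetical
   stand-in for a macro that a configuration header would define. */

/* Guard evaluated BEFORE the macro exists: the protected branch is skipped. */
#if defined(DEMO_SKYLAKEX)
#define EARLY_GUARD_TAKEN 1
#else
#define EARLY_GUARD_TAKEN 0
#endif

/* Stand-in for including the header that defines the macro. */
#define DEMO_SKYLAKEX 1

/* The same guard evaluated AFTER the definition: the branch is compiled. */
#if defined(DEMO_SKYLAKEX)
#define LATE_GUARD_TAKEN 1
#else
#define LATE_GUARD_TAKEN 0
#endif

/* Report which branch each guard selected at compile time. */
int early_guard_taken(void) { return EARLY_GUARD_TAKEN; }
int late_guard_taken(void)  { return LATE_GUARD_TAKEN; }
```

In this sketch the early guard reports 0 and the late guard reports 1, even though both test the same macro. The preprocessor works strictly top to bottom, so moving the include above the guard is enough to restore the intended branch.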

757 Commits