OpenBLAS

Author	SHA1	Message	Date
Wangyang Guo	f9dba63c28	Small Matrix: skylakex: remove unnecessary b0 source files	2021-08-13 03:28:44 +00:00
Wangyang Guo	fa777f5517	Small Matrix: skylakex: add DGEMM_SMALL_M_PERMIT and tune for TN kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	3e79f6d89a	Small Matrix: skylakex: add dgemm tn kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	323d7da4f7	Small Matrix: skylakex: add dgemm tt kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	f57fc932ac	Small Matrix: skylakex: add dgemm nt kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	91ec21202b	Small Matrix: skylakex: add dgemm nn kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	72e070539c	Small Matrix: skylakex: add sgemm tt kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	02c6e764f2	Small Matrix: skylakex: add SGEMM_SMALL_M_PERMIT and tune for TN kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	642c393879	Small Matrix: skylakex: add sgemm tn kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	0d72d75bf9	Small Matrix: skylakex: add sgemm nt kernel	2021-08-02 07:06:54 +00:00
Wangyang Guo	9186456a12	small matrix: SkylakeX: add SGEMM NN kernel	2021-08-02 07:06:54 +00:00
Gengxin Xie	b766c1e9bb	Improve the performance of zasum and casum with AVX512 intrinsic	2020-12-01 16:49:26 +08:00
wjc404	086d87a302	AVX512 dgemm tcopy_16 function	2020-06-20 00:07:43 +08:00
wjc404	b8307768e2	Add files via upload	2020-03-21 05:42:10 +08:00
wjc404	62b9608986	Update KERNEL.SKYLAKEX	2020-03-17 12:52:55 +08:00
wjc404	f566787e6e	Update KERNEL.SKYLAKEX	2020-02-16 22:58:44 +08:00
wjc404	081b188529	Update KERNEL.SKYLAKEX	2020-02-03 21:38:08 +08:00
wjc404	3a100b2797	Update KERNEL.SKYLAKEX	2020-01-09 13:48:41 +08:00
wjc404	819e852ae7	AVX512 CGEMM & ZGEMM kernels 96-99% 1-thread performance of MKL2018	2019-11-11 20:04:52 +08:00
wjc404	1df9a2013d	new sgemm kernel for skylakex	2019-11-02 00:00:48 +08:00
wjc404	6ff013bae0	native support for icopy_4 90% MKL 1-thread performance.	2019-10-19 03:54:44 +08:00
wjc404	844629af57	Add files via upload	2019-10-16 02:00:34 +08:00
Martin Kroeker	b1561ecc68	Disable DGEMMINCOPY as well for now #1955	2019-05-05 15:52:01 +02:00
Martin Kroeker	7ed8431527	Disable the SkyLakeX DGEMMITCOPY kernel as well, as a stopgap measure for https://github.com/numpy/numpy/issues/13401 as mentioned in #1955	2019-05-04 22:54:41 +02:00
Martin Kroeker	e608d4f7fe	Disable the AVX512 DGEMM kernel (again) due to as-yet unresolved errors seen in #1955 and #2029	2019-03-13 22:10:28 +01:00
Arjan van de Ven	55b244ca0d	enable the SGEMM/SKX C based kernel. In QA the final bug was found, so now the skylakex sgemm C based kernel can be activated....	2018-10-12 09:30:35 +00:00
Arjan van de Ven	32bec8afbb	add a skylakex optimized dgemm beta function	2018-10-06 16:36:26 +00:00
Arjan van de Ven	d74dc39b0f	Add optimized *copy versions for skylakex. Add optimized n/t copy versions for skylakex; in the patch the tcopy is also rewritten using intrinsics; the ncopy file will be worked on in a future commit	2018-10-06 13:51:44 +00:00
Arjan van de Ven	45fe8cb0c5	Create an AVX512 enabled version of DGEMM. This patch adds dgemm_kernel_4x8_skylakex.c, which is: dgemm_kernel_4x8_haswell.s converted to C + intrinsics; 8x8 support added; 8x8 kernel implemented using AVX512. Performance is a work in progress, but already shows a 10% - 20% increase for a wide range of matrix sizes.	2018-10-03 14:45:25 +00:00
Martin Kroeker	6e54b0a027	Disable the 16x2 DTRMM kernel on SkylakeX as well	2018-06-30 17:31:06 +02:00
Martin Kroeker	f0a8dc2eec	Disable the AVX512 DGEMM kernel for now due to #1643	2018-06-30 11:34:48 +02:00
Arjan van de Ven	89372e0993	Use AVX512 also for DGEMM. This required switching to the generic gemm_beta code (which is faster anyway on SKX) for both DGEMM and SGEMM. Performance for the not-retuned version is in the 30% range	2018-06-03 22:17:27 +00:00
Arjan van de Ven	99c7bba8e4	Initial support for SkylakeX / AVX512. This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server) target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set, which brings 2 basic things: 1) 512 bit wide SIMD (2x width of AVX2); 2) 32 SIMD registers (2x the number on AVX2). This initial patch only contains a trivial transformation of the Haswell SGEMM kernel to AVX512VL; more will follow later, but this patch aims to get the infrastructure in place for this "later". Full performance tuning has not been done yet; with more registers and wider SIMD it's in theory possible to retune the kernels, but even without that there's an interesting enough performance increase (30-40% range) with just this change.	2018-06-03 07:58:52 +00:00

33 Commits