
33 Commits

Author SHA1 Message Date
Wangyang Guo
f9dba63c28 Small Matrix: skylakex: remove unnecessary b0 source files 2021-08-13 03:28:44 +00:00
Wangyang Guo
fa777f5517 Small Matrix: skylakex: add DGEMM_SMALL_M_PERMIT and tune for TN kernel 2021-08-02 07:06:54 +00:00
Wangyang Guo
3e79f6d89a Small Matrix: skylakex: add dgemm tn kernel 2021-08-02 07:06:54 +00:00
Wangyang Guo
323d7da4f7 Small Matrix: skylakex: add dgemm tt kernel 2021-08-02 07:06:54 +00:00
Wangyang Guo
f57fc932ac Small Matrix: skylakex: add dgemm nt kernel 2021-08-02 07:06:54 +00:00
Wangyang Guo
91ec21202b Small Matrix: skylakex: add dgemm nn kernel 2021-08-02 07:06:54 +00:00
Wangyang Guo
72e070539c Small Matrix: skylakex: add sgemm tt kernel 2021-08-02 07:06:54 +00:00
Wangyang Guo
02c6e764f2 Small Matrix: skylakex: add SGEMM_SMALL_M_PERMIT and tune for TN kernel 2021-08-02 07:06:54 +00:00
Wangyang Guo
642c393879 Small Matrix: skylakex: add sgemm tn kernel 2021-08-02 07:06:54 +00:00
Wangyang Guo
0d72d75bf9 Small Matrix: skylakex: add sgemm nt kernel 2021-08-02 07:06:54 +00:00
Wangyang Guo
9186456a12 small matrix: SkylakeX: add SGEMM NN kernel 2021-08-02 07:06:54 +00:00
Gengxin Xie
b766c1e9bb Improve the performance of zasum and casum with AVX512 intrinsic 2020-12-01 16:49:26 +08:00
wjc404
086d87a302 AVX512 dgemm tcopy_16 function 2020-06-20 00:07:43 +08:00
wjc404
b8307768e2 Add files via upload 2020-03-21 05:42:10 +08:00
wjc404
62b9608986 Update KERNEL.SKYLAKEX 2020-03-17 12:52:55 +08:00
wjc404
f566787e6e Update KERNEL.SKYLAKEX 2020-02-16 22:58:44 +08:00
wjc404
081b188529 Update KERNEL.SKYLAKEX 2020-02-03 21:38:08 +08:00
wjc404
3a100b2797 Update KERNEL.SKYLAKEX 2020-01-09 13:48:41 +08:00
wjc404
819e852ae7 AVX512 CGEMM & ZGEMM kernels
96-99% 1-thread performance of MKL2018
2019-11-11 20:04:52 +08:00
wjc404
1df9a2013d new sgemm kernel for skylakex 2019-11-02 00:00:48 +08:00
wjc404
6ff013bae0 native support for icopy_4
90% MKL 1-thread performance.
2019-10-19 03:54:44 +08:00
wjc404
844629af57 Add files via upload 2019-10-16 02:00:34 +08:00
Martin Kroeker
b1561ecc68 Disable DGEMMINCOPY as well for now
#1955
2019-05-05 15:52:01 +02:00
Martin Kroeker
7ed8431527 Disable the SkyLakeX DGEMMITCOPY kernel as well
as a stopgap measure for https://github.com/numpy/numpy/issues/13401 as mentioned in #1955
2019-05-04 22:54:41 +02:00
Martin Kroeker
e608d4f7fe Disable the AVX512 DGEMM kernel (again)
Due to as yet unresolved errors seen in #1955 and #2029
2019-03-13 22:10:28 +01:00
Arjan van de Ven
55b244ca0d enable the SGEMM/SKX C based kernel
In QA the final bug was found, so now the skylakex sgemm C-based kernel can
be activated.
2018-10-12 09:30:35 +00:00
Arjan van de Ven
32bec8afbb add a skylakex optimized dgemm beta function 2018-10-06 16:36:26 +00:00
Arjan van de Ven
d74dc39b0f Add optimized *copy versions for skylakex
Add optimized n/t copy versions for skylakex; in the patch the
tcopy is also rewritten using intrinsics; the ncopy file
will be worked on in a future commit
2018-10-06 13:51:44 +00:00
Arjan van de Ven
45fe8cb0c5 Create an AVX512-enabled version of DGEMM
This patch adds dgemm_kernel_4x8_skylakex.c which is
* dgemm_kernel_4x8_haswell.s converted to C + intrinsics
* 8x8 support added
* 8x8 kernel implemented using AVX512

Performance is a work in progress, but already shows a 10% - 20%
increase for a wide range of matrix sizes.
2018-10-03 14:45:25 +00:00
Martin Kroeker
6e54b0a027 Disable the 16x2 DTRMM kernel on SkylakeX as well 2018-06-30 17:31:06 +02:00
Martin Kroeker
f0a8dc2eec Disable the AVX512 DGEMM kernel for now
due to #1643
2018-06-30 11:34:48 +02:00
Arjan van de Ven
89372e0993 Use AVX512 also for DGEMM
this required switching to the generic gemm_beta code (which is faster anyway on SKX)
for both DGEMM and SGEMM

Performance for the not-retuned version is in the 30% range
2018-06-03 22:17:27 +00:00
Arjan van de Ven
99c7bba8e4 Initial support for SkylakeX / AVX512
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)

This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".

Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
2018-06-03 07:58:52 +00:00