Martin Kroeker | 2a62d2df96 | Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 | 2023-07-26 19:39:11 +02:00
Wangyang Guo | 225683218c | Small Matrix: use proper inline asm input constraint for AVX512 mask | 2022-02-28 03:22:31 +00:00
Mosè Giordano | abbc947edb | Fix compilation of Skylake AVX512 kernels with GCC 6 | 2022-02-23 22:51:59 +00:00
Martin Kroeker | 73ffabe6ba | Guard uses of _mm512_reduce_add_p? | 2022-02-23 20:06:14 +01:00
Wangyang Guo | ca7682e3a3 | Small Matrix: skylakex: sgemm nn: fix n6 conflicts with n4 | 2021-08-02 07:06:54 +00:00
Wangyang Guo | 9967e61abb | Small Matrix: skylakex: sgemm nn: fix error when beta not zero | 2021-08-02 07:06:54 +00:00
Wangyang Guo | a87736346f | Small Matrix: skylakex: sgemm nn: add n6 to improve performance | 2021-08-02 07:06:54 +00:00
Wangyang Guo | 4c9d9940fd | Small Matrix: skylakex: sgemm nn: reduce store 4 N at a time | 2021-08-02 07:06:54 +00:00
Wangyang Guo | 13b32f69b7 | Small Matrix: skylakex: sgemm nn: reduce store 4 M at a time | 2021-08-02 07:06:54 +00:00
Wangyang Guo | 3d8c6d9607 | Small Matrix: skylakex: sgemm nn: clean up unused code | 2021-08-02 07:06:54 +00:00
Wangyang Guo | 49b61a3f30 | Small Matrix: skylakex: sgemm_nn: optimize for M <= 8 | 2021-08-02 07:06:54 +00:00
Wangyang Guo | f88470323b | Optimize M < 16 using AVX512 mask | 2021-08-02 07:06:54 +00:00
Wangyang Guo | 9186456a12 | small matrix: SkylakeX: add SGEMM NN kernel | 2021-08-02 07:06:54 +00:00