Caroline Newcombe
|
5cc1111383
|
fix unsafe read of Y in assembly kernel
|
2022-03-11 11:56:33 -06:00 |
Wangyang Guo
|
225683218c
|
Small Matrix: use proper inline asm input constraint for AVX512 mask
|
2022-02-28 03:22:31 +00:00 |
Martin Kroeker
|
9c626e466e
|
really fix definition of SHUFFLE_MAGIC_NO
|
2022-02-25 15:36:02 +01:00 |
Martin Kroeker
|
9d7429406f
|
Declare SHUFFLE_MAGIC_NO as const to placate clang
|
2022-02-25 10:05:36 +01:00 |
Martin Kroeker
|
522f809825
|
Merge pull request #3542 from martin-frbg/issue3540
Fix compilation for CooperLake on Windows/clang
|
2022-02-24 00:00:00 +01:00 |
Mosè Giordano
|
abbc947edb
|
Fix compilation of Skylake AVX512 kernels with GCC 6
|
2022-02-23 22:51:59 +00:00 |
Martin Kroeker
|
c62f8e2c01
|
Prevent compiler attempts to use k0 as mask register
|
2022-02-23 20:12:20 +01:00 |
Martin Kroeker
|
80eb581c83
|
Fix non-portable u_int64_t
|
2022-02-23 20:10:59 +01:00 |
Martin Kroeker
|
73ffabe6ba
|
Guard uses of _mm512_reduce_add_p?
|
2022-02-23 20:06:14 +01:00 |
Martin Kroeker
|
7b146e590c
|
fix function typecast
|
2021-12-24 20:01:52 +01:00 |
Martin Kroeker
|
e9a0e52201
|
fix function typecast
|
2021-12-24 20:00:50 +01:00 |
Martin Kroeker
|
d1ee6ff73f
|
fix function typecasts
|
2021-12-21 18:45:28 +01:00 |
Martin Kroeker
|
5378046abd
|
roll back DGEMM kernels to 4x8 when compiling for DYNAMIC_ARCH
|
2021-12-06 19:43:54 +01:00 |
Caroline Newcombe
|
feeb8283a5
|
Fix unsafe read during final iteration of zsymv_L_sse2.S
|
2021-11-19 14:29:32 -06:00 |
Wangyang Guo
|
63a103ba6e
|
sbgemm: spr: disable small matrix path by default
|
2021-10-17 19:08:03 -07:00 |
Wangyang Guo
|
82194ea9d2
|
sbgemm: spr: implement otcopy_16
|
2021-10-17 19:08:03 -07:00 |
Wangyang Guo
|
8632380a96
|
sbgemm: spr: reuse ncopy_16 from cooperlake as incopy
|
2021-10-17 19:08:03 -07:00 |
Wangyang Guo
|
6bc8204ce5
|
sbgemm: spr: optimization for tmp_c buffer
|
2021-10-17 19:08:03 -07:00 |
Wangyang Guo
|
f018aa342a
|
sbgemm: spr: kernel handle alpha != 1.0
|
2021-10-17 19:08:03 -07:00 |
Wangyang Guo
|
a52456b168
|
sbgemm: spr: oncopy: use tile load/store instead
|
2021-10-17 19:08:03 -07:00 |
Wangyang Guo
|
f2485352a6
|
sbgemm: spr: only load A once in tail_k handling
|
2021-10-17 19:08:03 -07:00 |
Wangyang Guo
|
9ab33228bb
|
sbgemm: spr: process k2 and odd k at the same time
|
2021-10-17 19:08:03 -07:00 |
Wangyang Guo
|
10d52646e2
|
sbgemm: spr: oncopy: avoid handling too much pointer at a time
|
2021-10-17 19:08:03 -07:00 |
Wangyang Guo
|
88154ed02d
|
sbgemm: spr: reduce tile conf loading by separate tail k handling
|
2021-10-17 19:08:03 -07:00 |
Wangyang Guo
|
a70bfb52d5
|
sbgemm: spr: kernel works for NN case when alpha is 1.0
|
2021-10-17 19:08:03 -07:00 |
Wangyang Guo
|
6051c86741
|
sbgemm: spr: kernel works for m32 in NN case
|
2021-10-17 19:08:03 -07:00 |
Wangyang Guo
|
d0b253ac6e
|
sbgemm: spr: implement oncopy_16
|
2021-10-17 19:08:03 -07:00 |
Wangyang Guo
|
1d48b7cb16
|
sbgemm: spr: add dummy source files
|
2021-10-17 19:08:03 -07:00 |
Wangyang Guo
|
3dc6052c7e
|
initial support for Sapphire Rapids platform
|
2021-10-12 01:30:40 -07:00 |
Wangyang Guo
|
ee5ca8a328
|
x86_64: BFLOAT16: fix build warning
|
2021-09-28 18:30:06 +08:00 |
Martin Kroeker
|
8dfa61a61c
|
Initialize abs_mask1 with itself to silence a gcc warning
|
2021-09-15 22:11:35 +02:00 |
Martin Kroeker
|
99aa10b3ff
|
Initialize abs_mask1 with itself to silence a gcc warning
actual initialization is via the _mm_cmpeq_epi8, which I've seen claimed to be the fastest way to set an xmm register to all 1s
|
2021-09-15 22:10:43 +02:00 |
Martin Kroeker
|
ce036a2fc0
|
Add casts
|
2021-09-14 21:41:53 +02:00 |
Martin Kroeker
|
af8843875a
|
Merge pull request #3376 from martin-frbg/issue3370
Fix a few harmless compiler warnings
|
2021-09-12 00:01:31 +02:00 |
Martin Kroeker
|
0925dfe2c9
|
One instance of kernel_4x1 is used even on SKX
|
2021-09-11 15:30:19 +02:00 |
Martin Kroeker
|
7d873a329f
|
Add ifdefs around conditionally used functions
|
2021-09-11 14:38:47 +02:00 |
Martin Kroeker
|
d17238599b
|
Add casts
|
2021-09-11 13:38:28 +02:00 |
Wangyang Guo
|
59a1114d03
|
sbgemm: cooperlake: tuning for small matrix
|
2021-09-07 21:30:46 +08:00 |
Wangyang Guo
|
682d66555d
|
sbgemm: cooperlake: implement ncopy_16
|
2021-09-07 21:30:46 +08:00 |
Wangyang Guo
|
beccb83b16
|
sbgemm: cooperlake: add n24 kernel for tcopy_4
|
2021-09-07 21:30:46 +08:00 |
Wangyang Guo
|
5fcacad32b
|
sbgemm: cooperlake: implement tcopy_4
|
2021-09-07 21:30:46 +08:00 |
Wangyang Guo
|
bb1c4fa5bd
|
sbgemm: cooperlake: prefetch A & B
|
2021-09-07 21:30:46 +08:00 |
Wangyang Guo
|
7a2d1601ec
|
sbgemm: cooperlake: unroll core loop by 2
|
2021-09-07 21:30:46 +08:00 |
Wangyang Guo
|
45fdf951b6
|
sbgemm: cooperlake: reorder ptr increase for performance
|
2021-09-07 21:30:46 +08:00 |
Wangyang Guo
|
cece3541ab
|
sbgemm: cooperlake: fix bug in m64n12
|
2021-09-07 21:30:46 +08:00 |
Wangyang Guo
|
9df0953cde
|
sbgemm: cooperlake: kernel works for NN
|
2021-09-07 21:30:45 +08:00 |
Wangyang Guo
|
2ec9f3a8aa
|
sbgemm: cooperlake: change kernel size to 16x4
|
2021-09-07 21:30:45 +08:00 |
Wangyang Guo
|
ef8f5fecc8
|
sbgemm: cooperlake: implement sbgemm_tcopy_32
|
2021-09-07 21:30:45 +08:00 |
Wangyang Guo
|
4c294336e6
|
sbgemm: cooperlake: add dummy source files
|
2021-09-07 21:30:45 +08:00 |
Wangyang Guo
|
619588fbab
|
sbgemm: remove unnecessary b0 files
|
2021-08-30 17:55:01 +08:00 |