Wangyang Guo
|
fee5abd84b
|
Small Matrix: support cmake build
|
2021-08-04 08:50:15 +00:00 |
Wangyang Guo
|
478d1086c1
|
Small Matrix: support DYNAMIC_ARCH build
|
2021-08-04 03:12:41 +00:00 |
Wangyang Guo
|
6b58bca18b
|
Small Matrix: disable low performance default kernel
|
2021-08-03 06:49:03 +00:00 |
Wangyang Guo
|
fa777f5517
|
Small Matrix: skylakex: add DGEMM_SMALL_M_PERMIT and tune for TN kernel
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
8592c21af4
|
Small Matrix: skylakex: dgemm nn: fix typo in idx load
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
3e79f6d89a
|
Small Matrix: skylakex: add dgemm tn kernel
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
323d7da4f7
|
Small Matrix: skylakex: add dgemm tt kernel
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
f57fc932ac
|
Small Matrix: skylakex: add dgemm nt kernel
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
91ec21202b
|
Small Matrix: skylakex: add dgemm nn kernel
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
72e070539c
|
Small Matrix: skylakex: add sgemm tt kernel
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
02c6e764f2
|
Small Matrix: skylakex: add SGEMM_SMALL_M_PERMIT and tune for TN kernel
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
5dc7c3c8e5
|
Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrices case
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
642c393879
|
Small Matrix: skylakex: add sgemm tn kernel
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
ae3f5c737c
|
Small Matrix: skylakex: sgemm nt: optimize for M < 12
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
0d72d75bf9
|
Small Matrix: skylakex: add sgemm nt kernel
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
ca7682e3a3
|
Small Matrix: skylakex: sgemm nn: fix n6 conflicts with n4
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
9967e61abb
|
Small Matrix: skylakex: sgemm nn: fix error when beta not zero
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
a87736346f
|
Small Matrix: skylakex: sgemm nn: add n6 to improve performance
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
4c9d9940fd
|
Small Matrix: skylakex: sgemm nn: reduce store 4 N at a time
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
13b32f69b7
|
Small Matrix: skylakex: sgemm nn: reduce store 4 M at a time
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
3d8c6d9607
|
Small Matrix: skylakex: sgemm nn: clean up unused code
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
49b61a3f30
|
Small Matrix: skylakex: sgemm_nn: optimize for M <= 8
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
f88470323b
|
Optimize M < 16 using AVX512 mask
|
2021-08-02 07:06:54 +00:00 |
Wangyang Guo
|
9186456a12
|
small matrix: SkylakeX: add SGEMM NN kernel
|
2021-08-02 07:06:54 +00:00 |
Xianyi Zhang
|
6022e5629c
|
Refs #2587 fix small matrix c/zgemm bug.
|
2021-08-02 07:06:54 +00:00 |
Xianyi Zhang
|
57ed58cefe
|
Refs #2587 Add small matrix optimization reference kernel for c/zgemm.
|
2021-08-02 07:06:54 +00:00 |
Xianyi Zhang
|
17d32a4a82
|
Change a1b0 gemm to b0 gemm.
|
2021-08-02 07:06:54 +00:00 |
Xianyi Zhang
|
59cb5de46b
|
Refs #2587 Fix typos.
|
2021-08-02 07:06:54 +00:00 |
Xianyi Zhang
|
be3349405d
|
Add alpha=1.0 beta=0.0 for small gemm.
|
2021-08-02 07:01:47 +00:00 |
Xianyi Zhang
|
0a2077901c
|
Add small matrix optimization kernel interface.
make SMALL_MATRIX_OPT=1
|
2021-08-02 07:01:47 +00:00 |
gxw
|
0b8f7c8c10
|
Add cmake support for LOONGARCH64
|
2021-08-02 10:00:41 +08:00 |
gxw
|
af0a69f355
|
Add support for LOONGARCH64
|
2021-07-27 15:29:12 +08:00 |
Martin Kroeker
|
49bbf330ca
|
Empirical workaround for numpy SVD NaN problem from issue 3318
|
2021-07-18 22:19:19 +02:00 |
Martin Kroeker
|
5b4b385ecf
|
Temporarily disable the SkylakeX sgemv_t microkernel due to LAPACK testsuite failures
|
2021-07-14 20:50:14 +02:00 |
User User-User
|
39ef0880ae
|
copy conf
|
2021-06-19 21:49:58 +02:00 |
Martin Kroeker
|
c4b464cac6
|
Merge pull request #3273 from austinpagan/sbgemm_gcc10_fix
Power10: Fix for SBGEMM
|
2021-06-15 22:58:48 +02:00 |
Gordon Fossum
|
e6dd44d989
|
Power10: Fix for SBGEMM
While testing bfloat16 sbgemm kernel, there are some failures for odd value inputs due to updating result for
additional bytes.
|
2021-06-15 13:07:47 -05:00 |
Gilles Gouaillardet
|
9d292d37b2
|
arm64: add the missing d9 register to the clobber list
Refs. numpy/numpy#18422
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
|
2021-06-14 17:01:28 +09:00 |
Martin Kroeker
|
2e8ff4a781
|
Merge pull request #3266 from martin-frbg/powerparam
Remove spurious casts from PPC parameters and fix compilation for older targets
|
2021-06-10 18:05:47 +02:00 |
Martin Kroeker
|
dbba381dc3
|
Merge pull request #3260 from intelmy/sgemv_t_opt
Optimized sgemv_t for small N based on AVX512
|
2021-06-10 16:08:24 +02:00 |
Martin Kroeker
|
efdbdd8f82
|
Add prefetch values for power3
|
2021-06-10 11:20:29 +02:00 |
Martin Kroeker
|
3906ef3b0f
|
Add prefetch values for power3
|
2021-06-10 11:19:40 +02:00 |
Martin Kroeker
|
8adf0971d8
|
Add prefetch values for power3
|
2021-06-10 11:18:22 +02:00 |
Martin Kroeker
|
08e2e60762
|
Add prefetch values for power3
|
2021-06-10 11:17:33 +02:00 |
Martin Kroeker
|
fb9e678235
|
Fix caxpy/zaxpy for big-endian
|
2021-06-10 11:15:48 +02:00 |
Martin Kroeker
|
dc4fcb48df
|
Fix inverted conditional for caxpy/zaxpy
|
2021-06-10 11:14:03 +02:00 |
Martin Kroeker
|
7a48247761
|
fix c/zrot and sgemv for POWER5
|
2021-06-10 11:11:56 +02:00 |
Rajalakshmi Srinivasaraghavan
|
cbb70438df
|
POWER10: Fixes for sbgemm kernel
While testing bfloat16 sbgemm kernel, there are some failures
for odd value inputs due to array access beyond the boundary.
|
2021-06-09 12:20:09 -05:00 |
Ma, Yu
|
706a08d4a0
|
Optimized sgemv_t for small N based on AVX512
|
2021-06-08 15:08:28 -04:00 |
Zhaofeng Li
|
590be3fae3
|
riscv64: Add Makefile
|
2021-06-07 22:55:56 +00:00 |